[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

Michael McCandless (JIRA) Mon, 27 Jun 2011 06:40:12 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055544#comment-13055544
 ]


Michael McCandless commented on LUCENE-1536:
--------------------------------------------

bq. My question: Do we really need to make the delDocs inverse in this issue?

I agree: let's split the inversion of delDocs/skipDocs into a new issue, do 
that first, and then come back to this issue.  There's still more work to do 
here, e.g. the bits should be stored inverted too (and the sparse encoding 
"flipped").

bq.  The method name getNotDeletedDocs() should also be getVisibleDocs() or 
similar [I don't like double negation].

+1 for getVisibleDocs -- I also don't like double negation!

bq. In general, reversing the delDocs might be a good idea, but we should do it 
separately and as a hard cutover (not allowing both variants to be implemented 
by IndexReader & Co.).

I agree it must be a hard cutover: no more getDelDocs, and getVisibleDocs is 
abstract in IndexReader.

bq. About the impls: FieldCacheRangeFilter can also implement getBits() 
directly, as FieldCache is random access. It should just return its own Bits 
impl for the DocIdSet that checks the filtering in get(index).

Ahh, right: FCRF has no trouble being random access, and it can re-use the 
already created matchDoc in the subclasses.
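
As a rough illustration of the idea above (not code from the attached 
patches), a range filter backed by a random-access array of per-document 
values could expose its matches as a Bits-style view that evaluates the range 
check inside get(index). The Bits interface below is a local stand-in declared 
only to keep the sketch self-contained, and RangeBitsSketch, rangeBits, and 
the int[] values array are hypothetical names:

```java
// Local stand-in for a Bits-style random-access interface (an assumption,
// declared here only so the sketch compiles on its own).
interface Bits {
    boolean get(int index);
    int length();
}

// Hypothetical helper: wraps per-document values in a random-access view.
final class RangeBitsSketch {
    // values[doc] plays the role of a FieldCache-style array of field values.
    static Bits rangeBits(final int[] values, final int lower, final int upper) {
        return new Bits() {
            @Override
            public boolean get(int doc) {
                // The filtering decision is made per lookup; no iterator
                // or precomputed bit set is needed.
                int v = values[doc];
                return v >= lower && v <= upper;
            }

            @Override
            public int length() {
                return values.length;
            }
        };
    }
}
```

Because the check runs on each lookup, nothing is materialized up front; the 
cost is one array read and two comparisons per get(doc).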

> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
>                 Key: LUCENE-1536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
>     10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
>     means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
>     AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
>     95, 98, 99, 99.99999 (filter is non-null but all bits are set),
>     100 (filter=null, control)).
>   * Method high means I use random-access filter API in
>     IndexSearcher's main loop.  Method low means I use random-access
>     filter API down in SegmentTermDocs (just like deleted docs
>     today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
>     "high" (ie in IndexSearcher's search loop).

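To make the iterator vs. random-access distinction in the description above 
concrete, here is a minimal sketch of the two ways of applying a filter to a 
query's matching docs. It is not the benchmark code and does not use Lucene 
classes: FilterApplySketch, viaIterator, and viaRandomAccess are hypothetical 
names, doc IDs are plain sorted int arrays, and java.util.BitSet stands in for 
the random-access filter.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class FilterApplySketch {
    // Iterator style: query and filter are both sorted doc ID streams,
    // and the loop advances whichever stream is behind until they agree.
    static List<Integer> viaIterator(int[] queryDocs, int[] filterDocs) {
        List<Integer> hits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < queryDocs.length && j < filterDocs.length) {
            if (queryDocs[i] == filterDocs[j]) {
                hits.add(queryDocs[i]);
                i++;
                j++;
            } else if (queryDocs[i] < filterDocs[j]) {
                i++;
            } else {
                j++;
            }
        }
        return hits;
    }

    // Random-access style: only the query is iterated; each candidate doc
    // is checked against the filter with a single bit lookup.
    static List<Integer> viaRandomAccess(int[] queryDocs, BitSet filter) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : queryDocs) {
            if (filter.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

Both methods return the same hits; they differ only in how the filter is 
consulted, which is the cost the benchmark described above compares.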