[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

Michael McCandless (JIRA) Sun, 18 Jan 2009 03:13:27 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664948#action_12664948
 ]


Michael McCandless commented on LUCENE-1483:
--------------------------------------------


bq. As we call next on MultiTermDocs it will get a TermDocs for each Reader and 
call seek to get to the Term. The seek appears pretty slow, and we do it for 
the number of Readers x the number of Terms to be loaded.

Right -- the uninverting we do to populate the FieldCache is very
costly through MultiReader for String fields whose values are mostly
unique (e.g. a title field, or a "primary key" id field).

Enum type fields (like country) don't have this problem (1.0 sec vs
0.6 sec to populate FieldCache through MultiReader for the 100 segment
index).

With this change we sidestep the problem within Lucene's core, but
apps that load the FieldCache directly for the MultiReader will still
hit it.
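
To make the Readers x Terms cost concrete, here is a rough toy model in
plain Java. It does not use Lucene's API: segments are modelled as sets
of terms, the class and method names are made up for illustration, and
the 1000 documents per segment is an assumed figure (only the 100
segment count comes from the benchmark above). It counts one
positioning operation per (term, segment) pair when loading through a
multi-reader, and one per term a segment actually contains when loading
segment by segment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: not Lucene code, all names here are hypothetical.
public class SeekCountSketch {

    // Each segment holds docsPerSegment ids that occur in no other segment.
    static List<Set<String>> uniqueKeySegments(int numSegments, int docsPerSegment) {
        List<Set<String>> segments = new ArrayList<>();
        for (int s = 0; s < numSegments; s++) {
            Set<String> terms = new HashSet<>();
            for (int d = 0; d < docsPerSegment; d++) {
                terms.add("id-" + s + "-" + d);
            }
            segments.add(terms);
        }
        return segments;
    }

    // Every segment holds the same small set of enum-like values.
    static List<Set<String>> enumSegments(int numSegments, int numValues) {
        List<Set<String>> segments = new ArrayList<>();
        for (int s = 0; s < numSegments; s++) {
            Set<String> terms = new HashSet<>();
            for (int v = 0; v < numValues; v++) {
                terms.add("country-" + v);
            }
            segments.add(terms);
        }
        return segments;
    }

    // Multi-reader model: every distinct term is looked up in every segment.
    static long operationsThroughMultiReader(List<Set<String>> segments) {
        Set<String> allTerms = new HashSet<>();
        for (Set<String> terms : segments) {
            allTerms.addAll(terms);
        }
        return (long) allTerms.size() * segments.size();
    }

    // Per-segment model: each segment only visits the terms it contains.
    static long operationsPerSegment(List<Set<String>> segments) {
        long total = 0;
        for (Set<String> terms : segments) {
            total += terms.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Set<String>> ids = uniqueKeySegments(100, 1000);
        List<Set<String>> countries = enumSegments(100, 5);
        System.out.println("unique ids, multi-reader: " + operationsThroughMultiReader(ids));
        System.out.println("unique ids, per-segment:  " + operationsPerSegment(ids));
        System.out.println("enum field, multi-reader: " + operationsThroughMultiReader(countries));
        System.out.println("enum field, per-segment:  " + operationsPerSegment(countries));
    }
}
```

Under these assumptions the unique-id field needs 10,000,000 operations
through the multi-reader versus 100,000 per segment, while the enum
field needs 500 either way. The model ignores the actual cost of each
seek, so it only illustrates the scaling, not the measured timings.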

Once we have column stride fields (LUCENE-1231) it should then be far
faster to load the FieldCache for unique String fields.

bq. While there is a big difference between searching a single segment vs 
multisegments for these things, we already knew about that - that's why you 
optimize.

Right, but for realtime search you don't have the luxury of
optimizing. On a many-segment index, this patch makes warming after
reopen much faster for apps that use FieldCache with mostly unique
String fields.


> Change IndexSearcher multisegment searches to search each individual segment 
> using a single HitCollector
> --------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483-partial.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> sortBench.py, sortCollate.py
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing 
> for individual segment reloading on reopen.

