[jira] Commented: (LUCENE-2335) optimization: when sorting by field, if index has one segment and field values are not needed, do not load String[] into field cache

Toke Eskildsen (JIRA) Tue, 23 Mar 2010 00:12:52 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848589#action_12848589
 ]


Toke Eskildsen commented on LUCENE-2335:
----------------------------------------

I can see that I misread your previous answer regarding stored 
fields. Let's just forget it so as not to confuse the issue further.

As for facets, they are equivalent to sorting in that resolving the actual 
Strings can be delayed until the very end. I'll try to contain myself on the 
facet subject and focus on sorting, though.

I have spent some time tinkering with the problem of spanning multiple segments 
and it seems to me that generating a "global" list of sorted ordinals should be 
feasible without too much overhead. Basically we want to preserve sequential 
access as much as possible, so merging sorted ordinals from segments will 
benefit from a read-ahead cache. By letting the reader deliver ordinals through 
an iterator, it is free to implement such a cache when necessary. I envision 
the signature to be something like
{code}
Iterator<OrdinalTerm> getOrdinalTerms(
      String persistenceKey, Comparator<Object> comparator, String field,
      boolean collectDocIDs) throws IOException;
{code}
where OrdinalTerm contains ordinal, Term and docID.
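
To illustrate the merge step, here is a minimal sketch of how per-segment 
sorted terms could be combined into global ordinals using only sequential 
iteration. It is not Lucene API: the names GlobalOrdinalMerge, Cursor and merge 
are hypothetical, plain String lists stand in for what the reader's iterator 
would deliver, and terms are assumed to be non-null and already sorted within 
each segment.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene code: k-way merge of per-segment sorted terms.
public class GlobalOrdinalMerge {

    /** One segment's sequential term stream plus its current head term. */
    static final class Cursor {
        final int segment;
        final Iterator<String> terms;
        String head;
        int localOrd = -1;

        Cursor(int segment, Iterator<String> terms) {
            this.segment = segment;
            this.terms = terms;
        }

        /** Moves to the next term; returns false when the segment is exhausted. */
        boolean advance() {
            if (!terms.hasNext()) return false;
            head = terms.next();
            localOrd++;
            return true;
        }
    }

    /**
     * Returns localToGlobal[segment][localOrdinal] = globalOrdinal.
     * Equal terms in different segments receive the same global ordinal.
     */
    static int[][] merge(List<List<String>> segments, Comparator<String> cmp) {
        int[][] localToGlobal = new int[segments.size()][];
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> cmp.compare(a.head, b.head));
        for (int s = 0; s < segments.size(); s++) {
            localToGlobal[s] = new int[segments.get(s).size()];
            Cursor c = new Cursor(s, segments.get(s).iterator());
            if (c.advance()) queue.add(c);
        }
        int globalOrd = -1;
        String previous = null;
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();
            // A new global ordinal starts whenever the term differs from the last one.
            if (previous == null || cmp.compare(previous, c.head) != 0) {
                globalOrd++;
                previous = c.head;
            }
            localToGlobal[c.segment][c.localOrd] = globalOrd;
            if (c.advance()) queue.add(c);
        }
        return localToGlobal;
    }
}
```

Each segment is read strictly front to back, which is the access pattern a 
read-ahead cache behind the iterator would serve well.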

The beauty of all this is that the mapping is from docID->sortedOrdinal index 
(which it has to be for fast comparison), so keeping the possibility of 
resolving the Strings after the sort (fillFields=true) is free in terms of 
storage space and processing time.
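
As a rough illustration of that point, the sketch below sorts hits by 
comparing ints from a docID->ordinal array and only looks up Strings for the 
top hits afterwards. The names OrdinalSortSketch, topTerms, docToOrd and 
termsByOrd are hypothetical, and the in-memory String[] merely stands in for 
wherever the terms would actually be resolved from.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene code: sort by ordinal, resolve Strings last.
public class OrdinalSortSketch {

    /**
     * Sorts the hit docIDs by their sorted ordinal and resolves the terms
     * for the first n hits only.
     */
    static String[] topTerms(int[] docToOrd, String[] termsByOrd,
                             Integer[] hits, int n) {
        Integer[] sorted = hits.clone();
        // Comparison touches only ints; no String is loaded during the sort.
        Arrays.sort(sorted, (a, b) -> Integer.compare(docToOrd[a], docToOrd[b]));
        int count = Math.min(n, sorted.length);
        String[] result = new String[count];
        for (int i = 0; i < count; i++) {
            // Late resolution: one lookup per displayed hit.
            result[i] = termsByOrd[docToOrd[sorted[i]]];
        }
        return result;
    }
}
```

The same docToOrd array serves both the fillFields=true and fillFields=false 
cases; only the final lookup loop differs.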

I hope to have a patch out soon for SegmentReader so that it is possible to 
perform a sorted search "the Lucene way" rather than the hack I use in my proof 
of concept. However, vacation starts Friday...

> optimization: when sorting by field, if index has one segment and field 
> values are not needed, do not load String[] into field cache
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2335
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>
> Spinoff from java-dev thread "Sorting with little memory: A suggestion", 
> started by Toke Eskildsen.
> When sorting by SortField.STRING we currently ask FieldCache for a 
> StringIndex on that field.
> This can consume tons of RAM when the values are mostly unique (e.g. a title 
> field), as it populates both int[] ords as well as String[] values.
> But, if the index is only one segment, and the search sets fillFields=false, 
> we don't need the String[] values, just the int[] ords.  If the app needs to 
> show the fields it can pull them (for the 1 page) from stored fields.
> This can be a potent optimization -- a lot of RAM saved -- for optimized 
> indexes.
> When fixing this we must take care to share the int[] ords if some queries do 
> fillFields=true and some =false... ie, FieldCache will be called twice and it 
> should share the int[] ords across those invocations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
