[
https://issues.apache.org/jira/browse/LUCENE-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848589#action_12848589
]
Toke Eskildsen commented on LUCENE-2335:
----------------------------------------
I can see that I misread your previous answer regarding stored fields. Let's
just forget it, so as not to confuse the issue further.
As for facets, they are equivalent to sorting in the sense that resolving the
actual Strings can be delayed until the very end. I'll try to contain myself
on the facet subject and focus on sorting though.
I have spent some time tinkering with the problem of spanning multiple segments
and it seems to me that the generation of a "global" list of sorted ordinals
should be feasible without too much overhead. Basically we want to preserve
sequential access as much as possible, so merging sorted ordinals from segments
will benefit from a read-ahead cache. By letting the reader deliver ordinals
through an iterator, it is free to implement such a cache when necessary. I envision
the signature to be something like
{code}
Iterator<OrdinalTerm> getOrdinalTerms(
String persistenceKey, Comparator<Object> comparator, String field,
boolean collectDocIDs) throws IOException;
{code}
where OrdinalTerm contains ordinal, Term and docID.
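To make the merge idea concrete, here is a minimal sketch of how per-segment iterators could be combined into one global ordinal list. This is not the patch: Entry is a simplified stand-in for OrdinalTerm, the class and method names are hypothetical, docIDs are assumed to be already rebased to global IDs, and reserving ordinal 0 for documents without a value is an assumption of the sketch.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: k-way merge of per-segment iterators, each of which
// delivers its terms in sorted order, into a global docID -> ordinal array.
final class GlobalOrdinalMerge {

    /** Simplified stand-in for OrdinalTerm: a term and the doc that holds it. */
    static final class Entry {
        final String term;
        final int docID; // assumed to be a global docID already
        Entry(String term, int docID) { this.term = term; this.docID = docID; }
    }

    /** The current entry of one segment plus the rest of its iterator. */
    private static final class Head {
        Entry current;
        final Iterator<Entry> rest;
        Head(Iterator<Entry> it) { this.rest = it; this.current = it.next(); }
    }

    /**
     * Returns ords where ords[docID] is the rank of the document's term in the
     * global sorted order. Equal terms share an ordinal; 0 means "no value".
     */
    static int[] merge(List<Iterator<Entry>> segments, Comparator<String> cmp, int maxDoc) {
        PriorityQueue<Head> queue =
            new PriorityQueue<>((a, b) -> cmp.compare(a.current.term, b.current.term));
        for (Iterator<Entry> it : segments) {
            if (it.hasNext()) queue.add(new Head(it));
        }
        int[] ords = new int[maxDoc];
        int ordinal = 0;
        String previous = null;
        while (!queue.isEmpty()) {
            Head head = queue.poll(); // smallest remaining term across all segments
            Entry e = head.current;
            if (previous == null || cmp.compare(previous, e.term) != 0) {
                ordinal++; // a new distinct term gets the next ordinal
                previous = e.term;
            }
            ords[e.docID] = ordinal;
            if (head.rest.hasNext()) { // advance this segment sequentially
                head.current = head.rest.next();
                queue.add(head);
            }
        }
        return ords;
    }
}
```

Each segment is only ever read forward, so a reader is free to put a read-ahead cache behind its iterator without the merge code knowing about it.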
The beauty of all this is that the mapping is from docID->sortedOrdinal index
(which it has to be for fast comparison), so keeping the possibility of
resolving the Strings after the sort (fillFields=true) is free in terms of
storage space and processing time.
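A small sketch of why this is cheap, under assumptions: the ords array is a docID->sortedOrdinal mapping as described above, and the names OrdinalSort, topDocs, resolve and termByOrdinal are hypothetical. Comparison during the sort touches only ints, and Strings are looked up afterwards for the returned hits only.

```java
import java.util.Arrays;
import java.util.function.IntFunction;

// Hypothetical sketch: sort hits by ordinal, resolve Strings only at the end.
final class OrdinalSort {

    /** Returns up to topN docIDs from hits, ordered by ordinal, ties by docID. */
    static int[] topDocs(int[] hits, int[] ords, int topN) {
        Integer[] boxed = new Integer[hits.length];
        for (int i = 0; i < hits.length; i++) boxed[i] = hits[i];
        Arrays.sort(boxed, (a, b) -> {
            int c = Integer.compare(ords[a], ords[b]); // int compare, no Strings
            return c != 0 ? c : Integer.compare(a, b);
        });
        int[] top = new int[Math.min(topN, boxed.length)];
        for (int i = 0; i < top.length; i++) top[i] = boxed[i];
        return top;
    }

    /** Resolves the sort values for the final hits via a caller-supplied lookup. */
    static String[] resolve(int[] top, int[] ords, IntFunction<String> termByOrdinal) {
        String[] values = new String[top.length];
        for (int i = 0; i < top.length; i++) {
            values[i] = termByOrdinal.apply(ords[top[i]]);
        }
        return values;
    }
}
```

The lookup passed to resolve could read terms from the index on demand, so no String[] has to be held in memory for the sort itself.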
I hope to have a patch out soon for SegmentReader so that it is possible to
perform a sorted search "the Lucene way" rather than the hack I use in my proof
of concept. However, vacation starts Friday...
> optimization: when sorting by field, if index has one segment and field
> values are not needed, do not load String[] into field cache
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-2335
> URL: https://issues.apache.org/jira/browse/LUCENE-2335
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Michael McCandless
> Priority: Minor
> Fix For: 3.1
>
>
> Spinoff from java-dev thread "Sorting with little memory: A suggestion",
> started by Toke Eskildsen.
> When sorting by SortField.STRING we currently ask FieldCache for a
> StringIndex on that field.
> This can consume tons of RAM when the values are mostly unique (e.g. a title
> field), as it populates both int[] ords as well as String[] values.
> But, if the index is only one segment, and the search sets fillFields=false,
> we don't need the String[] values, just the int[] ords. If the app needs to
> show the fields it can pull them (for the 1 page) from stored fields.
> This can be a potent optimization -- a lot of RAM saved -- for optimized
> indexes.
> When fixing this we must take care to share the int[] ords if some queries do
> fillFields=true and some =false... ie, FieldCache will be called twice and it
> should share the int[] ords across those invocations.