[jira] [Commented] (LUCENE-4764) Faster but more RAM/Disk consuming DocValuesFormat for facets

Shai Erera (JIRA) Tue, 12 Feb 2013 01:23:15 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576482#comment-13576482
 ]


Shai Erera commented on LUCENE-4764:
------------------------------------

Facets42Codec has a nocommit about handling multiple category lists, as well 
as the case where the default field has been changed. Currently (in the patch), 
the field is hard-coded to "$facets", which won't work if e.g. the app indexed 
categories into a different field.

Talking with Mike about it yesterday, I thought that what needs to be done is 
for the codec to receive the FacetIndexingParams, build a HashSet<String> of 
all fields that hold facets, and then use it in .getDocValuesFormatForField.
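
A rough sketch of that idea, with stand-in types rather than the real Lucene 
classes (FacetFieldFormatSelector, formatNameForField and the two format names 
are made up for illustration; in the codec this logic would sit behind 
.getDocValuesFormatForField, and the set would be derived from the 
FacetIndexingParams):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the per-field choice a codec would make.
// It is NOT the Lucene API; it only models "facet field -> facets format".
class FacetFieldFormatSelector {
    // Placeholder names for the two DocValuesFormats being chosen between.
    static final String DEFAULT_FORMAT = "DefaultDV";
    static final String FACETS_FORMAT = "FacetsDV";

    private final Set<String> facetFields;

    // The caller passes in every field that holds facets (assumed to come
    // from the indexing params; here it is just a plain set of names).
    FacetFieldFormatSelector(Set<String> facetFields) {
        this.facetFields =
            Collections.unmodifiableSet(new HashSet<String>(facetFields));
    }

    // Plays the role of getDocValuesFormatForField: facet fields get the
    // faster, more RAM-hungry format; every other field keeps the default.
    String formatNameForField(String field) {
        return facetFields.contains(field) ? FACETS_FORMAT : DEFAULT_FORMAT;
    }
}
```

With a set of {"$facets", "$author"}, both of those fields map to the facets 
format and any other field falls back to the default.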

However, I realized later that this is not doable, since Codecs must have a 
default constructor and, because of how they are initialized, they cannot rely 
on anything passed to them in the ctor (e.g. when they are initialized by a 
reader?). Is that true? I looked at a few Codec impls, and it looks like none 
relies on anything passed to it in the ctor.
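
To illustrate the concern, here is a simplified model that assumes the reader 
side creates the codec through its no-arg constructor. It is not Lucene's 
actual lookup code, and ConfiguredCodecSketch and ReflectiveLookupSketch are 
made-up names:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical codec-like class with a default ctor and a configuring ctor.
class ConfiguredCodecSketch {
    final Set<String> facetFields;

    // The only ctor a reflective, name-based lookup can call.
    public ConfiguredCodecSketch() {
        this(new HashSet<String>());
    }

    // What the indexing side would like to use.
    public ConfiguredCodecSketch(Set<String> facetFields) {
        this.facetFields = new HashSet<String>(facetFields);
    }
}

class ReflectiveLookupSketch {
    // Assumed model of the read path: the implementation is instantiated
    // without arguments, so nothing given to the other ctor is available.
    static ConfiguredCodecSketch instantiate() {
        try {
            return ConfiguredCodecSketch.class
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable default ctor", e);
        }
    }
}
```

An instance built with the configuring ctor knows its facet fields, while the 
one returned by instantiate() has an empty set, which is why the HashSet 
approach would not survive on the read path under this assumption.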

If so, perhaps we should also override the FieldInfosFormat and use it to 
detect which fields are "facet" fields? E.g. they would be a subset of all 
fields that have BinaryDV. But that's not distinguishing enough ... and we 
cannot add a DVType, so we cannot distinguish BINARY from FACETS_BINARY even if 
we wanted to make a different BinaryDV extension ...

Crazy, but can we write a boolean {{hasFacets}} to FieldInfo? Is that supported 
if we e.g. extend (I realize, many) classes?
                
> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
>                 Key: LUCENE-4764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.

