[ https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576482#comment-13576482 ]
Shai Erera commented on LUCENE-4764:
------------------------------------

Facets42Codec has a nocommit about handling multiple category lists, as well as the case where the default field has been changed. Currently (in the patch) it hard-codes the field to "$facets", but that won't work if, e.g., the app indexed categories into a different field.

Talking with Mike about it yesterday, I thought that what needs to be done is for the codec to receive the FacetIndexingParams, build a HashSet<String> of all fields that hold facets, and then use it in .getDocValuesFormatForField.

However, I realized later that this is not doable, since Codecs must have a default constructor and, because of how they are initialized, they cannot rely on anything passed to them in the ctor (e.g. when they are initialized by a reader?). Is that true? I looked at a few Codec impls, and it looks like none of them relies on anything passed to it in the ctor.

If so, perhaps we should also override the FieldInfosFormat and use it to detect which fields are "facet" fields? E.g. they would be a subset of all fields that have BinaryDV. But that's not distinguishing enough ... and we cannot add a DVType, so we cannot distinguish BINARY from FACETS_BINARY even if we wanted to make a different BinaryDV extension ...

Crazy, but can we write a boolean {{hasFacets}} to FieldInfo? Is that supported if we e.g. extend (I realize, many) classes?

> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
>                 Key: LUCENE-4764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.
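
To make the constructor concern in the comment above concrete, here is a minimal, self-contained Java sketch. It does not use Lucene at all: SketchFacetsCodec, docValuesFormatNameForField, the format names "facets-format" / "default-format" and the field "$customFacets" are all hypothetical stand-ins. The assumption being illustrated is that a reader re-creates the codec through its no-arg constructor, so a field set handed to a writer-side constructor never reaches it.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a codec; this is NOT a Lucene class.
class SketchFacetsCodec {
    private final Set<String> facetFields;

    // No-arg constructor: assumed to be the only one a by-name lookup can call.
    public SketchFacetsCodec() {
        this(Collections.singleton("$facets"));
    }

    // Writer-side constructor taking the facet fields (e.g. derived from indexing params).
    public SketchFacetsCodec(Set<String> facetFields) {
        this.facetFields = new HashSet<>(facetFields);
    }

    // Plays the role of getDocValuesFormatForField, but returns a placeholder name.
    public String docValuesFormatNameForField(String field) {
        return facetFields.contains(field) ? "facets-format" : "default-format";
    }
}

class CodecCtorSketch {
    // Simulates reader-side instantiation: only the no-arg constructor is available.
    static SketchFacetsCodec instantiateByDefaultCtor() {
        try {
            return SketchFacetsCodec.class.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Set<String> fields = new HashSet<>();
        fields.add("$facets");
        fields.add("$customFacets"); // hypothetical non-default facet field

        SketchFacetsCodec writerSide = new SketchFacetsCodec(fields);
        SketchFacetsCodec readerSide = instantiateByDefaultCtor();

        // The writer-side instance knows about the custom field; the reader-side one does not.
        System.out.println(writerSide.docValuesFormatNameForField("$customFacets"));
        System.out.println(readerSide.docValuesFormatNameForField("$customFacets"));
    }
}
```

If that assumption holds, the two instances disagree about "$customFacets", which is why the comment goes on to look for a per-field marker that is persisted with the index rather than passed in at construction time.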