Re: Big number of values for facets

Michael McCandless Fri, 26 Apr 2013 09:00:45 -0700

This means a single document requires more than 32 KB to store all of
its ordinals ... so that document must have like at least 6K facets?


Are you sure this isn't a bug in your app?  That's an insanely high
number of facets for one document ...

Mike McCandless

http://blog.mikemccandless.com
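
The "at least 6K" estimate above can be checked with a little arithmetic.
This is only a rough sketch. It assumes each facet ordinal is written as a
variable-length int of at most 5 bytes, which is an assumption about the
encoding and not something stated in this thread. The class and method
names are made up for illustration.

```java
class OrdinalBudget {
    // Limit reported in the exception: one binary DocValues value must be <= 32766 bytes.
    static final int MAX_DV_BYTES = 32766;

    // Bytes needed to write a non-negative int as a variable-length int
    // (7 payload bits per byte, so at most 5 bytes for a 32-bit value).
    static int vIntLength(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    // Smallest number of ordinals whose total size exceeds the limit,
    // if every ordinal takes bytesPerOrdinal bytes.
    static int minOrdinalsToOverflow(int bytesPerOrdinal) {
        return MAX_DV_BYTES / bytesPerOrdinal + 1;
    }

    public static void main(String[] args) {
        int worstCase = vIntLength(Integer.MAX_VALUE);
        System.out.println("worst-case bytes per ordinal: " + worstCase);        // 5
        System.out.println("ordinals needed to overflow: "
                + minOrdinalsToOverflow(worstCase));                              // 6554
    }
}
```

At 5 bytes per ordinal the limit is first exceeded at 6554 ordinals. If the
ordinals take fewer bytes each (small values, or delta encoding), the
document would need even more of them to hit the limit, so 6K is a lower
bound under these assumptions.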


On Fri, Apr 26, 2013 at 11:22 AM, Nicola Buso <nb...@ebi.ac.uk> wrote:
> Hi Shai,
>
> I can't say right now how many of these entries I have; I need to trace
> them, but I expect they are exceptions, around 10 entries and no more.
>
> Can I enable partitions document by document? Should I activate
> partitions if I reach a threshold just for these exceptions?
>
>
> Nicola.
>
> On Fri, 2013-04-26 at 18:04 +0300, Shai Erera wrote:
>> Hi Nicola,
>>
>> I think this limit denotes the number of bytes you can write in a single DV
>> value, so the number of facets you actually index is much smaller than
>> that. Do you know how many categories are indexed for that one document?
>>
>> Also, do you expect to index a large number of facets for most documents,
>> or is this one extreme example?
>>
>> Basically I think you can achieve that by enabling partitions. Partitions
>> let you split the category space into smaller sets, so that each DV value
>> contains fewer values. RAM consumption during search is also lower, since
>> FacetArrays is allocated at the size of the partition and not the size of
>> the taxonomy. But you also incur a search performance loss, because
>> counting a certain dimension requires traversing multiple DV fields.
>>
>> To enable partitions you need to override the partition size in
>> FacetIndexingParams. You can experiment with the value.
>>
>> I am interested, though, in understanding the general scenario. Perhaps
>> this can be solved some other way...
>>
>> Shai
>> On Apr 26, 2013 5:44 PM, "Nicola Buso" <nb...@ebi.ac.uk> wrote:
>>
>> > Hi all,
>> >
>> > I'm having a problem indexing a document with a large number of
>> > values for one facet.
>> >
>> > Caused by: java.lang.IllegalArgumentException: DocValuesField "$facets" is too large, must be <= 32766
>> >         at org.apache.lucene.index.BinaryDocValuesWriter.addValue(BinaryDocValuesWriter.java:57)
>> >         at org.apache.lucene.index.DocValuesProcessor.addBinaryField(DocValuesProcessor.java:111)
>> >         at org.apache.lucene.index.DocValuesProcessor.addField(DocValuesProcessor.java:57)
>> >         at org.apache.lucene.index.TwoStoredFieldsConsumers.addField(TwoStoredFieldsConsumers.java:36)
>> >         at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:242)
>> >         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:256)
>> >         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
>> >         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
>> >
>> >
>> > It's obviously hard to present such a large number of facet values to
>> > the user, and it is also hard to decide which of these values to drop
>> > so that the document can be stored in the index.
>> >
>> > Do you have any suggestions on how to get past this limit? Is it
>> > possible?
>> >
>> >
>> >
>> > Nicola
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >
>> >
>
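
For reference, the partitioning idea described in the quoted reply can be
pictured with a small model. This is not Lucene code. It is a simplified
sketch that assumes a category ordinal belongs to partition
ordinal / partitionSize and is stored relative to the start of that
partition. The class name, method names, and sample ordinals are all
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartitionSketch {
    // Index of the partition that holds a given category ordinal.
    static int partitionOf(int ordinal, int partitionSize) {
        return ordinal / partitionSize;
    }

    // Group one document's ordinals by partition, keeping each ordinal
    // relative to the start of its partition. Each map entry stands for
    // one (smaller) per-partition value written for the document.
    static Map<Integer, List<Integer>> split(int[] ordinals, int partitionSize) {
        Map<Integer, List<Integer>> byPartition = new TreeMap<>();
        for (int ordinal : ordinals) {
            byPartition
                .computeIfAbsent(partitionOf(ordinal, partitionSize), p -> new ArrayList<>())
                .add(ordinal % partitionSize);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        int[] docOrdinals = {3, 950, 1001, 1999, 2500};
        // prints {0=[3, 950], 1=[1, 999], 2=[500]}
        System.out.println(split(docOrdinals, 1000));
    }
}
```

In this model a counting array only needs partitionSize slots instead of
one slot per category in the taxonomy, and counting across the whole
taxonomy means visiting every partition in turn, which matches the
trade-off described in the quoted reply. Note also that a document whose
ordinals all fall into the same partition gets no smaller under this
scheme.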
