Unfortunately, partitions are enabled globally, not per document, and you
cannot activate them as you go - it's a setting you need to enable before
you index. At least, that's how they currently work; we can think of
better ways to do it.

Also, partitions were not designed to work around that limitation, but
rather to reduce RAM consumption for large taxonomies. I.e., when facets
were stored in payloads we didn't have that limitation, and frankly, I
didn't know DocValues limited the value size at all...

The problem is that even if you choose to enable partitions, you need to
determine a safe partition size. E.g., if you have a total of 1M categories
and you set the partition size to 100K, 10 DV fields will be created. But
there's no guarantee that a single document's categories won't all fall
into one partition, in which case that partition's DV value would still
exceed the limit. You'd then want to set the partition size to, say, 5K,
but that means 200 DV fields to process during search - bad performance!
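
To make the tradeoff concrete, here is a rough, untested sketch of how the
partition size would be set. It assumes the 4.x facet API, where
FacetIndexingParams exposes getPartitionSize(); the exact package name may
differ between releases, and the numbers are just the example figures above:

import org.apache.lucene.facet.params.FacetIndexingParams;

// Sketch only: split a 1M-category taxonomy into partitions of 100K
// ordinals. Each partition gets its own DV field, so roughly
// totalCategories / partitionSize fields are created (10 in this example).
// Smaller partitions keep each per-document DV value smaller, but mean
// more fields to traverse when counting a dimension.
final int partitionSize = 100000;

FacetIndexingParams fip = new FacetIndexingParams() {
  @Override
  public int getPartitionSize() {
    return partitionSize; // default is Integer.MAX_VALUE, i.e. no partitioning
  }
};

// fip is then passed to FacetFields at indexing time, and the same params
// must be used at search time so both sides agree on the layout.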

I'm not near the code at the moment, but I think partitions are applied
globally, to all category lists. Perhaps we can modify the code to apply
partitions per CLP (CategoryListParams). That way you could index just the
problematic dimension in a different category list, so that only that
dimension suffers during search while the rest are processed regularly?
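
For illustration, something along these lines is what I mean by a different
category list. This is an untested sketch, assuming the 4.x
PerDimensionIndexingParams / CategoryListParams API (package names may vary
slightly between releases); "HugeDim" and "$huge_dim" are just placeholders
for your problematic dimension and its field:

import java.util.Collections;

import org.apache.lucene.facet.params.CategoryListParams;
import org.apache.lucene.facet.params.PerDimensionIndexingParams;
import org.apache.lucene.facet.taxonomy.CategoryPath;

// Sketch only: map the problematic dimension to its own category list,
// i.e. its own DV field ("$huge_dim"), while every other dimension stays
// in the default "$facets" field. If partitions could be applied per
// category list, only this field would need to be partitioned.
CategoryListParams hugeDimClp = new CategoryListParams("$huge_dim");

PerDimensionIndexingParams fip = new PerDimensionIndexingParams(
    Collections.singletonMap(new CategoryPath("HugeDim"), hugeDimClp));

// fip would then be passed to FacetFields at indexing time and to the
// matching FacetSearchParams at search time.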

Still, can you share some info about this dimension? What sort of
categories does it cover, such that documents have thousands of values?

The reason I ask is that the only scenario I've seen where partitions came
in handy was, IMO, an abuse of the facet module ... :-)

Shai
On Apr 26, 2013 6:04 PM, "Shai Erera" <ser...@gmail.com> wrote:

> Hi Nicola,
>
> I think this limit denotes the number of bytes you can write in a single
> DV value, so the actual number of facets you can index is much smaller. Do
> you know how many categories are indexed for that one document?
>
> Also, do you expect to index a large number of facets for most documents,
> or is this one extreme example?
>
> Basically I think you can achieve that by enabling partitions. Partitions
> let you split the categories space into smaller sets, so that each DV value
> contains fewer values, and the RAM consumption during search is also lower,
> since FacetArrays is allocated at the size of the partition rather than the
> whole taxonomy. But you also incur a search performance loss, because
> counting a certain dimension requires traversing multiple DV fields.
>
> To enable partitions you need to override FacetIndexingParams' partition
> size. You can try playing with it.
>
> I am interested, though, in understanding the general scenario. Perhaps
> this can be solved some other way...
>
> Shai
> On Apr 26, 2013 5:44 PM, "Nicola Buso" <nb...@ebi.ac.uk> wrote:
>
>> Hi all,
>>
>> I'm encountering a problem indexing a document with a large number of
>> values for one facet.
>>
>> Caused by: java.lang.IllegalArgumentException: DocValuesField "$facets"
>> is too large, must be <= 32766
>>         at org.apache.lucene.index.BinaryDocValuesWriter.addValue(BinaryDocValuesWriter.java:57)
>>         at org.apache.lucene.index.DocValuesProcessor.addBinaryField(DocValuesProcessor.java:111)
>>         at org.apache.lucene.index.DocValuesProcessor.addField(DocValuesProcessor.java:57)
>>         at org.apache.lucene.index.TwoStoredFieldsConsumers.addField(TwoStoredFieldsConsumers.java:36)
>>         at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:242)
>>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:256)
>>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
>>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
>>
>>
>> It's obviously hard to present such a large number of facets to the user,
>> and it's also hard to decide which of these values to skip so that the
>> document can be stored in the index.
>>
>> Do you have any suggestions on how to overcome this limit? Is it
>> possible?
>>
>>
>>
>> Nicola
>>
>>
