Toke Eskildsen created SOLR-11240:
-------------------------------------
Summary: Raise UnInvertedField internal limit
Key: SOLR-11240
URL: https://issues.apache.org/jira/browse/SOLR-11240
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Components: faceting
Affects Versions: 6.6, 5.5.4, master (8.0)
Reporter: Toke Eskildsen
Assignee: Toke Eskildsen
Priority: Minor
Fix For: master (8.0), 6.6, 5.5.4
{{UnInvertedField}} has, via {{DocTermOrds}}, an internal limit of 2^24 bytes
on the byte arrays that hold term ordinals. For String faceting on
high-cardinality Text fields, exceeding this limit triggers an exception with
the message "Too many values for UnInvertedField". A search for that phrase
shows that the exception is encountered in the wild.
The limitation is due to the packing being a combination of values and
pointers: if the values (term ordinals) for a given document ID fit in an
integer, they are stored directly. If the value of the first 8 bits in the
integer is 1, this signals that the following 3 bytes (24 bits) are a pointer
into a byte array, which limits the array size to 16M (2^24).
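The signalling described above can be illustrated with a minimal sketch. This
is not the {{DocTermOrds}} source: the class and method names are hypothetical,
and it assumes that "the first 8 bits" means the low 8 bits of the integer.

```java
// Hypothetical sketch of the current scheme as described in this issue.
// Assumption: the flag lives in the low 8 bits, the offset in the upper 24.
class LowByteFlagSketch {
    static final int POINTER_FLAG = 1;            // low 8 bits == 1 marks a pointer
    static final int MAX_POINTER = (1 << 24) - 1; // only 24 bits are left for the offset

    // Packs a byte-array offset into an int, failing once 24 bits are exhausted.
    static int encodePointer(int offset) {
        if (offset < 0 || offset > MAX_POINTER) {
            throw new IllegalStateException("Too many values for UnInvertedField");
        }
        return (offset << 8) | POINTER_FLAG;
    }

    static boolean isPointer(int code) {
        return (code & 0xFF) == POINTER_FLAG;
    }

    static int decodePointer(int code) {
        return code >>> 8; // unsigned shift: the top bit of the offset may be set
    }

    public static void main(String[] args) {
        int code = encodePointer(MAX_POINTER);
        System.out.println(isPointer(code) + " " + decodePointer(code));
        try {
            encodePointer(1 << 24); // one past the largest addressable offset
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, any offset of 2^24 or more cannot be represented, which is the
point at which the exception is raised.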
Solution: Because the values are packed as vInts, bit 31 (the last bit) of the
integer will never be 1 if the integer contains values. This means that this
bit can be used to signal whether the preceding bits should be parsed as
values or as a pointer. The effective pointer range is thus 2^31, which
matches the array-length limit in Java. Changing the signalling mechanism does
not affect space requirements and should not affect performance.
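A minimal sketch of the proposed signalling, under stated assumptions: the
names are hypothetical, and {{packInline}} uses a simplified vInt layout (7-bit
groups, most significant first, continuation bit on all but the last byte,
first byte in the low bits) that may differ in detail from {{DocTermOrds}}.

```java
// Hypothetical sketch of the proposed scheme: bit 31 marks a pointer.
class HighBitFlagSketch {
    static final int POINTER_FLAG = 1 << 31; // 0x80000000

    static int encodePointer(int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        return offset | POINTER_FLAG;
    }

    static boolean isPointer(int code) {
        return code < 0; // bit 31 set means the int is negative
    }

    static int decodePointer(int code) {
        return code & 0x7FFFFFFF; // the remaining 31 bits are the offset
    }

    // Packs vInt-encoded values into one int (simplified, illustrative layout).
    static int packInline(int... values) {
        int packed = 0;
        int shift = 0;
        for (int v : values) {
            int groups = 1; // number of 7-bit groups needed for v
            for (int t = v >>> 7; t != 0; t >>>= 7) {
                groups++;
            }
            for (int g = groups - 1; g >= 0; g--) {
                if (shift > 24) {
                    throw new IllegalArgumentException("values do not fit in an int");
                }
                int b = (v >>> (7 * g)) & 0x7F;
                if (g > 0) {
                    b |= 0x80; // continuation bit on every byte except the last
                }
                packed |= b << shift;
                shift += 8;
            }
        }
        return packed;
    }

    public static void main(String[] args) {
        int inline = packInline(127, 127, 127, 127); // fills all four bytes
        int pointer = encodePointer(Integer.MAX_VALUE);
        System.out.println(isPointer(inline) + " " + isPointer(pointer));
        System.out.println(decodePointer(pointer) == Integer.MAX_VALUE);
    }
}
```

In this layout the byte occupying bits 24 to 31 is either unused (zero) or the
final byte of a vInt, so its continuation bit is clear and inline values never
collide with the pointer flag.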
Note that this is only a roughly 100-fold increase (2^31 / 2^24 = 128) over the
2^24 limit, not an elimination: performing uninverted Text field faceting on
100M documents with 5K terms each will still raise an exception.
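The arithmetic behind this note can be checked with a short sketch. It assumes,
as a simplification not stated in this issue, that every term ordinal costs at
least one byte and that all ordinals land in a single byte array; the names are
hypothetical.

```java
// Hypothetical back-of-the-envelope check of the limits discussed above.
class LimitArithmeticSketch {
    static final long OLD_LIMIT = 1L << 24; // 16,777,216 bytes
    static final long NEW_LIMIT = 1L << 31; // 2,147,483,648 bytes

    // Lower bound on bytes needed, assuming at least one byte per term ordinal.
    static long minBytes(long docs, long termsPerDoc) {
        return Math.multiplyExact(docs, termsPerDoc);
    }

    public static void main(String[] args) {
        System.out.println("increase factor: " + (NEW_LIMIT / OLD_LIMIT));
        long needed = minBytes(100_000_000L, 5_000L);
        System.out.println("bytes needed (lower bound): " + needed);
        System.out.println("exceeds new limit: " + (needed > NEW_LIMIT));
    }
}
```

Under these assumptions the example corpus needs at least 5 * 10^11 bytes,
well beyond 2^31.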