[ https://issues.apache.org/jira/browse/SOLR-11240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136777#comment-16136777 ]
ASF subversion and git services commented on SOLR-11240:
--------------------------------------------------------

Commit 85b89d15a89802d3bf6fbeac6bd55286028dc8e0 in lucene-solr's branch refs/heads/master from [~toke]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=85b89d1 ]

SOLR-11240: Raise UnInvertedField internal limit


> Raise UnInvertedField internal limit
> ------------------------------------
>
>                 Key: SOLR-11240
>                 URL: https://issues.apache.org/jira/browse/SOLR-11240
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: faceting
>    Affects Versions: 5.5.4, 6.6
>            Reporter: Toke Eskildsen
>            Assignee: Toke Eskildsen
>            Priority: Minor
>              Labels: easyfix
>             Fix For: master (8.0)
>
>         Attachments: SOLR-11240.patch, SOLR-11240.patch, SOLR-11240.patch
>
>
> {{UnInvertedField}}, via {{DocTermOrds}}, has an internal limit of 2^24
> bytes for the byte-arrays holding term ordinals. For String faceting on
> high-cardinality Text fields, this can trigger an exception with the message
> "Too many values for UnInvertedField". A search for that phrase shows that
> the exception is encountered in the wild.
> The limit is due to the packing being a combination of values and
> pointers: if the values (term ordinals) for a given document-ID fit in an
> integer, they are stored directly. If the lowest 8 bits of the integer
> have the value 1, this signals that the remaining 3 bytes (24 bits) are a
> pointer into a byte-array, which limits the array size to 16M (2^24).
> Solution: because the values are packed as vInts, bit 31 (the highest bit)
> of the integer is never 1 when the integer contains values. This bit can
> therefore be used to signal whether the remaining 31 bits should be parsed
> as values or as a pointer. The effective pointer size is thus 2^31, which
> matches the array-length limit in Java.
> Changing the signalling mechanism does not affect space requirements and
> should not affect performance.
> Note that this is only a roughly 100-fold (2^31 / 2^24 = 128) increase over
> the 2^24 limit, not an elimination: performing uninverted Text field faceting
> on 100M documents with 5K terms each will still raise an exception.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
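To make the two signalling schemes described in the issue concrete, here is a minimal Java sketch. It is not the DocTermOrds code: the class OrdPointerPacking and all of its methods (packInline, oldEncodePointer, newEncodePointer and so on) are hypothetical. The sketch models only the bit layouts from the description and assumes that non-negative values are written as vInts, lowest byte first. It shows that an int holding complete vInts never has bit 31 set. It also contrasts the old pointer format (lowest byte == 1, 24-bit offset) with a bit-31 flag (31-bit offset).

```java
import java.util.OptionalInt;

// Hypothetical illustration of the two signalling schemes described in SOLR-11240.
// This is NOT the actual Solr/Lucene DocTermOrds implementation.
final class OrdPointerPacking {

    // Old scheme: lowest byte == 1 marks a pointer; the upper 24 bits hold the offset.
    static final int OLD_POINTER_LIMIT = 1 << 24; // 16,777,216

    // New scheme: bit 31 marks a pointer; the lower 31 bits hold the offset.
    static final int POINTER_FLAG = 0x80000000;
    static final int OFFSET_MASK = 0x7FFFFFFF;

    // Packs non-negative values as vInts into the 4 bytes of an int, lowest byte first.
    // Returns empty if they do not fit; a real implementation would then write them
    // to the byte-array and store a pointer instead.
    static OptionalInt packInline(int[] values) {
        int packed = 0;
        int pos = 0; // next free byte position, 0..3
        for (int v : values) {
            do {
                if (pos == 4) {
                    return OptionalInt.empty();
                }
                int b = v & 0x7F;
                v >>>= 7;
                if (v != 0) {
                    b |= 0x80; // continuation bit: more bytes follow for this value
                }
                packed |= b << (8 * pos);
                pos++;
            } while (v != 0);
        }
        // The last byte written ends a vInt, so its high bit is 0, and unused bytes
        // stay 0. Either way, bit 31 of a successfully packed int is never set.
        return OptionalInt.of(packed);
    }

    static int oldEncodePointer(int offset) {
        if (offset < 0 || offset >= OLD_POINTER_LIMIT) {
            // Illustrative stand-in for the "Too many values" failure.
            throw new IllegalStateException("Too many values for UnInvertedField");
        }
        return (offset << 8) | 1;
    }

    static boolean oldIsPointer(int packed) {
        return (packed & 0xFF) == 1;
    }

    static int oldDecodePointer(int packed) {
        return packed >>> 8;
    }

    static int newEncodePointer(int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        return offset | POINTER_FLAG;
    }

    static boolean newIsPointer(int packed) {
        return (packed & POINTER_FLAG) != 0;
    }

    static int newDecodePointer(int packed) {
        return packed & OFFSET_MASK;
    }

    public static void main(String[] args) {
        int inline = packInline(new int[] {5, 300}).getAsInt();
        System.out.println(newIsPointer(inline)); // false

        int offset = OLD_POINTER_LIMIT + 7; // just past the old 16M limit
        int ptr = newEncodePointer(offset);
        System.out.println(newIsPointer(ptr) + " " + newDecodePointer(ptr)); // true 16777223

        try {
            oldEncodePointer(offset);
        } catch (IllegalStateException e) {
            System.out.println("old scheme: " + e.getMessage());
        }
    }
}
```

Under the old scheme, an inline int whose lowest byte happened to be 1 would be mistaken for a pointer, so the real code has to rule that case out. This sketch does not model that. The bit-31 flag needs no such guard because of the vInt property that packInline relies on.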