[
https://issues.apache.org/jira/browse/LUCENE-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672215#comment-13672215
]
Robert Muir commented on LUCENE-5025:
-------------------------------------
I think the bug is in PagedGrowableWriter.resize().
Currently, if you resize one whose last page isn't full, it tries to access
values that don't exist, because it always copies pageSize() values from every page.
I think it should instead be something like:
{code}
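for (int i = 0; i < numCommonPages; ++i) {
  ...
  final int valuesToCopy;
  // if it's the last page, it might be only partially filled;
  // a remainder of 0 means size() is a multiple of pageSize(), so the page is full
  final int lastPageValues = (int) (size() % pageSize());
  if (i == subWriters.length - 1 && lastPageValues != 0) {
    valuesToCopy = lastPageValues;
  } else {
    valuesToCopy = pageSize();
  }
  PackedInts.copy(subWriters[i], 0, newWriter.subWriters[i], 0,
      valuesToCopy, copyBuffer);
}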
{code}
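To make the failure mode concrete, here is a minimal, self-contained sketch of resizing a paged array. It assumes a hypothetical PagedLongArray whose pages are plain long[] arrays (standing in for the packed sub-writers of PagedGrowableWriter) and uses System.arraycopy in place of PackedInts.copy. Taking the minimum of the two page lengths copies only values that exist in both pages. That covers a partially filled last page whether the array grows or shrinks, and it also sidesteps the case where size() is an exact multiple of pageSize() and a remainder-based count would come out as 0.

```java
/**
 * Hypothetical stand-in for PagedGrowableWriter: values live in fixed-size
 * long[] pages, and only the last page may be partially filled.
 */
final class PagedLongArray {
  final long size;
  final int pageSize;
  final long[][] pages;

  PagedLongArray(long size, int pageSize) {
    this.size = size;
    this.pageSize = pageSize;
    int numPages = (int) ((size + pageSize - 1) / pageSize);
    pages = new long[numPages][];
    for (int i = 0; i < numPages; ++i) {
      // the last page holds only the remaining values
      // (a full page when size is a multiple of pageSize)
      long remaining = size - (long) i * pageSize;
      pages[i] = new long[(int) Math.min(pageSize, remaining)];
    }
  }

  long get(long index) {
    return pages[(int) (index / pageSize)][(int) (index % pageSize)];
  }

  void set(long index, long value) {
    pages[(int) (index / pageSize)][(int) (index % pageSize)] = value;
  }

  /** Returns a copy holding newSize values, preserving the common prefix. */
  PagedLongArray resize(long newSize) {
    PagedLongArray copy = new PagedLongArray(newSize, pageSize);
    int numCommonPages = Math.min(pages.length, copy.pages.length);
    for (int i = 0; i < numCommonPages; ++i) {
      // Copy only the values both pages actually hold instead of a blind
      // pageSize values: either page may be a partially filled last page.
      int valuesToCopy = Math.min(pages[i].length, copy.pages[i].length);
      System.arraycopy(pages[i], 0, copy.pages[i], 0, valuesToCopy);
    }
    return copy;
  }
}
```

For example, resizing a 10-value array with 4-value pages to 13 values copies 4, 4 and 2 values from the three old pages; copying a full pageSize() values from the last page would read past the 2 values it actually stores.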
> Allow more than 2.1B "tail nodes" when building FST
> ---------------------------------------------------
>
> Key: LUCENE-5025
> URL: https://issues.apache.org/jira/browse/LUCENE-5025
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/FSTs
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 5.0, 4.4
>
> Attachments: LUCENE-5025.patch
>
>
> We recently relaxed some of the limits for big FSTs, but there is
> one more limit I think we should fix. For example, Aaron hit it while
> building the world's biggest FST:
> http://aaron.blog.archive.org/2013/05/29/worlds-biggest-fst/
> The issue is NodeHash, which currently uses a GrowableWriter (a packed
> ints implementation that can grow both the number of bits and the number
> of values): it's indexed by int, not long.
> This is a hash table that's used to share suffixes, so we need random
> get/put on a long index of long values, i.e. this is logically a long[].
> I think one simple way to do this is to make a "paged"
> GrowableWriter...
> Along with this we'd need to fix the hash codes to be long, not
> int.
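A "paged" long[] addressed by a long index, and a hash table whose hash codes and slot indices are long, can be sketched as below. This is a minimal illustration under stated assumptions, not Lucene's implementation: PagedLongs and LongHashSet are hypothetical names, pages are plain long[] rather than packed ints, the page size is a power of two so the page number and in-page offset come from a shift and a mask, and the set stores nonzero long values with 0 marking an empty slot. Unlike NodeHash, which hashes a node's contents, this set hashes the stored value directly; the point is only that the hash code and the slot index both stay long, so the table is not capped at Integer.MAX_VALUE slots.

```java
/**
 * Hypothetical paged long[]: a long index is split into a page number (high
 * bits) and an offset within the page (low bits), so the logical array is not
 * limited to Integer.MAX_VALUE entries even though each page is a Java array.
 */
final class PagedLongs {
  private final int pageShift; // log2(page size)
  private final long pageMask; // page size - 1
  private final long size;
  private final long[][] pages;

  PagedLongs(long size, int pageShift) {
    this.size = size;
    this.pageShift = pageShift;
    this.pageMask = (1L << pageShift) - 1;
    // number of pages, rounding up; throws if the page count overflows an int
    int numPages = Math.toIntExact((size + pageMask) >>> pageShift);
    pages = new long[numPages][];
    for (int i = 0; i < numPages; ++i) {
      // only the last page may be shorter than a full page
      long remaining = size - ((long) i << pageShift);
      pages[i] = new long[(int) Math.min(1L << pageShift, remaining)];
    }
  }

  long size() {
    return size;
  }

  long get(long index) {
    return pages[(int) (index >>> pageShift)][(int) (index & pageMask)];
  }

  void set(long index, long value) {
    pages[(int) (index >>> pageShift)][(int) (index & pageMask)] = value;
  }
}

/**
 * Hypothetical fixed-capacity open-addressing set of nonzero long values.
 * Both the hash code and the slot index are longs; 0 marks an empty slot.
 */
final class LongHashSet {
  private final PagedLongs table;
  private final long mask; // capacity - 1 (capacity is a power of two)

  LongHashSet(int log2Capacity) {
    table = new PagedLongs(1L << log2Capacity, 10);
    mask = (1L << log2Capacity) - 1;
  }

  /** 64-bit mix (the MurmurHash3 fmix64 finalizer). */
  static long hash(long v) {
    v ^= v >>> 33;
    v *= 0xff51afd7ed558ccdL;
    v ^= v >>> 33;
    v *= 0xc4ceb9fe1a85ec53L;
    v ^= v >>> 33;
    return v;
  }

  /** Adds value if absent; returns true if it was added, false if already present. */
  boolean add(long value) {
    if (value == 0) {
      throw new IllegalArgumentException("0 is reserved for empty slots");
    }
    long slot = hash(value) & mask;
    for (long probes = 0; probes <= mask; ++probes) {
      long cur = table.get(slot);
      if (cur == 0) {
        table.set(slot, value);
        return true;
      }
      if (cur == value) {
        return false;
      }
      slot = (slot + 1) & mask; // linear probing, wrapping around
    }
    throw new IllegalStateException("table is full");
  }
}
```

The fixed capacity and linear probing keep the sketch short; a real table would also need to grow and rehash as it fills.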