[ https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164531#comment-13164531 ]
James Dyer commented on LUCENE-3298:
------------------------------------
Carlos,
I'm not sure how much help this is, but you might be able to eke out a little
more performance by tightening RewritablePagedBytes.copyBytes(). Note that it
currently copies the From-Bytes into a temp array and then writes them back to
the FST at the To-Bytes location. Note also that in the one place this gets
called, it used to be a simple System.arraycopy(). So if you can make it copy
in place, that might claw back some of the performance loss. Beyond this, a
different pair of eyes might find more ways to optimize. In the end, though,
you will likely never make it perform quite as well as the simple array.
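To make "copy in place" concrete, here is a rough sketch. It is NOT the
RewritablePagedBytes class from the patch: the class InPlacePagedCopy and its
methods are hypothetical, and the page layout (2^pageBits bytes per page,
addressed by long) is an assumption. It does the copy with one
System.arraycopy() per page-sized chunk and no temp array, choosing the copy
direction so that overlapping source and destination ranges come out the same
as System.arraycopy() would give on a single flat array.

```java
/**
 * Hypothetical paged byte store used only to illustrate an in-place copy.
 * Pages are fixed-size byte[] blocks; a long address is split into
 * (page index, offset within page).
 */
public class InPlacePagedCopy {
  private final int pageBits;
  private final int pageSize;
  private final int pageMask;
  private final byte[][] pages;

  public InPlacePagedCopy(int pageBits, int numPages) {
    this.pageBits = pageBits;
    this.pageSize = 1 << pageBits;
    this.pageMask = pageSize - 1;
    this.pages = new byte[numPages][pageSize];
  }

  public byte get(long pos) {
    return pages[(int) (pos >>> pageBits)][(int) (pos & pageMask)];
  }

  public void set(long pos, byte b) {
    pages[(int) (pos >>> pageBits)][(int) (pos & pageMask)] = b;
  }

  /** Copies len bytes from src to dest without a temp array; safe for overlapping ranges. */
  public void copyBytes(long src, long dest, int len) {
    if (src == dest || len == 0) {
      return;
    }
    if (dest < src || dest >= src + len) {
      // Forward copy: writes never reach source bytes that are still unread.
      long s = src, d = dest;
      int remaining = len;
      while (remaining > 0) {
        int sOff = (int) (s & pageMask);
        int dOff = (int) (d & pageMask);
        // Largest chunk that stays inside both the current source and destination page.
        int chunk = Math.min(remaining, Math.min(pageSize - sOff, pageSize - dOff));
        System.arraycopy(pages[(int) (s >>> pageBits)], sOff,
                         pages[(int) (d >>> pageBits)], dOff, chunk);
        s += chunk;
        d += chunk;
        remaining -= chunk;
      }
    } else {
      // Backward copy: dest overlaps the tail of src, so start from the end.
      long sEnd = src + len, dEnd = dest + len; // exclusive end positions
      int remaining = len;
      while (remaining > 0) {
        // Number of bytes in the current page up to and including the last byte to copy.
        int sAvail = (int) ((sEnd - 1) & pageMask) + 1;
        int dAvail = (int) ((dEnd - 1) & pageMask) + 1;
        int chunk = Math.min(remaining, Math.min(sAvail, dAvail));
        System.arraycopy(pages[(int) ((sEnd - 1) >>> pageBits)], sAvail - chunk,
                         pages[(int) ((dEnd - 1) >>> pageBits)], dAvail - chunk, chunk);
        sEnd -= chunk;
        dEnd -= chunk;
        remaining -= chunk;
      }
    }
  }
}
```

The per-page chunking is what lets each copy stay a plain System.arraycopy()
call, which is roughly what the old single-array code path had.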
Also, it sounds as if you've done some work to sync this with the current
trunk. If so, would you mind uploading the updated patch?
Also, if you end up using this, be sure to test it thoroughly. I implemented
this one just to gain a little familiarity with the code, and I do not claim
any sort of expertise in this area, so beware! But all of the regular unit
tests did pass for me. I meant to run Test2BPostings against this but wasn't
able to get it set up. If I remember correctly, this issue came up originally
because someone wanted to run Test2BPostings with the memory codec and it was
going past the limit.
> FST has hard limit max size of 2.1 GB
> -------------------------------------
>
> Key: LUCENE-3298
> URL: https://issues.apache.org/jira/browse/LUCENE-3298
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/FSTs
> Reporter: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-3298.patch
>
>
> The FST uses a single contiguous byte[] under the hood, which in Java is
> indexed by int, so we cannot grow this over Integer.MAX_VALUE. It also
> internally encodes references to this array as vInt.
> We could switch this to a paged byte[] and make the FST far larger.
> But I think this is low priority... I'm not going to work on it any time soon.
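For what "switch this to a paged byte[]" implies, here is a small sketch under
stated assumptions. The class LongAddressing, its methods, and the 2^15-byte
page size are all hypothetical. A long address is split into a page index and
an offset within the page, so no single array has to exceed Integer.MAX_VALUE.
Because references past 2 GB no longer fit in an int, they would also need a
wider encoding than vInt. The vLong writer/reader below uses 7 bits per byte
with the high bit meaning "more bytes follow", the same format Lucene's
DataOutput.writeVLong() produces, but this is not Lucene code.

```java
import java.io.ByteArrayOutputStream;

public class LongAddressing {
  static final int PAGE_BITS = 15; // hypothetical page size: 32 KB
  static final long PAGE_MASK = (1L << PAGE_BITS) - 1;

  /** Which byte[] page holds this address. */
  static int pageIndex(long address) {
    return (int) (address >>> PAGE_BITS);
  }

  /** Position of this address inside its page. */
  static int pageOffset(long address) {
    return (int) (address & PAGE_MASK);
  }

  /** Writes a non-negative long, 7 bits per byte, low bits first; high bit set = more bytes follow. */
  static void writeVLong(ByteArrayOutputStream out, long value) {
    if (value < 0) {
      throw new IllegalArgumentException("negative value: " + value);
    }
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.write((int) value);
  }

  /** Reads a vLong starting at pos[0] and advances pos[0] past it. */
  static long readVLong(byte[] buf, int[] pos) {
    long value = 0;
    int shift = 0;
    byte b;
    do {
      b = buf[pos[0]++];
      value |= (long) (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return value;
  }
}
```

For example, address 3,000,000,000 (past Integer.MAX_VALUE) maps to page 91552
at offset 24064, and its vLong encoding takes 5 bytes.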
--
This message is automatically generated by JIRA.