[ 
https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164531#comment-13164531
 ] 

James Dyer commented on LUCENE-3298:
------------------------------------

Carlos,

I'm not sure how much help this is, but you might be able to eke out a little 
more performance if you can tighten RewritablePagedBytes.copyBytes().  You'll 
note it currently moves the From-Bytes into a temp array, then writes that back 
to the fst at the To-Bytes location.  Note also that the one place this gets 
called used to be a simple "System.arraycopy".  So if you can make it copy 
in-place, that might claw back some of the performance loss.  Beyond this, a 
different pair of eyes might find more ways to optimize.  In the end, though, 
you will likely never make it perform quite as well as the simple array.
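
To illustrate the idea, here is a rough sketch of an in-place copy over a 
paged byte[][].  It is not the code from the patch: the class name 
PagedBytesSketch, the constructor arguments, and the method signatures are all 
assumptions made up for this example.  It copies page-sized chunks directly 
with System.arraycopy and picks the copy direction so that overlapping ranges 
behave the same way they would in a single flat array.

```java
/** Hypothetical paged byte store for illustration; not RewritablePagedBytes from the patch. */
final class PagedBytesSketch {
    private final int pageBits;
    private final int pageSize;
    private final int pageMask;
    private final byte[][] pages;

    /** pageBits and capacity are assumed parameters; each page holds 2^pageBits bytes. */
    PagedBytesSketch(int pageBits, long capacity) {
        this.pageBits = pageBits;
        this.pageSize = 1 << pageBits;
        this.pageMask = pageSize - 1;
        int numPages = (int) ((capacity + pageMask) >>> pageBits);
        this.pages = new byte[numPages][pageSize];
    }

    byte get(long pos) {
        return pages[(int) (pos >>> pageBits)][(int) (pos & pageMask)];
    }

    void set(long pos, byte b) {
        pages[(int) (pos >>> pageBits)][(int) (pos & pageMask)] = b;
    }

    /** Copies len bytes from src to dest with no temp array; overlap-safe. */
    void copyBytes(long src, long dest, int len) {
        if (src == dest || len <= 0) {
            return;
        }
        if (dest < src || dest >= src + len) {
            // Forward copy: safe when dest is below src or the ranges are disjoint.
            while (len > 0) {
                int srcOff = (int) (src & pageMask);
                int destOff = (int) (dest & pageMask);
                // Largest run that stays inside both the source and destination page.
                int chunk = Math.min(len, Math.min(pageSize - srcOff, pageSize - destOff));
                System.arraycopy(pages[(int) (src >>> pageBits)], srcOff,
                                 pages[(int) (dest >>> pageBits)], destOff, chunk);
                src += chunk;
                dest += chunk;
                len -= chunk;
            }
        } else {
            // dest lies inside the source range: copy from the end backwards so
            // bytes that still need to be read are never overwritten first.
            long srcEnd = src + len;   // exclusive end positions
            long destEnd = dest + len;
            while (len > 0) {
                int srcEndOff = (int) (srcEnd & pageMask);
                if (srcEndOff == 0) {
                    srcEndOff = pageSize; // end sits on a page boundary
                }
                int destEndOff = (int) (destEnd & pageMask);
                if (destEndOff == 0) {
                    destEndOff = pageSize;
                }
                int chunk = Math.min(len, Math.min(srcEndOff, destEndOff));
                srcEnd -= chunk;
                destEnd -= chunk;
                System.arraycopy(pages[(int) (srcEnd >>> pageBits)], (int) (srcEnd & pageMask),
                                 pages[(int) (destEnd >>> pageBits)], (int) (destEnd & pageMask), chunk);
                len -= chunk;
            }
        }
    }
}
```

Each chunk stays within one source page and one destination page, so the 
per-call overhead is a handful of shifts and masks plus one System.arraycopy 
per page crossing.  Whether that actually recovers the lost performance would 
need to be measured against the patch.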

Also, it sounds as if you may have already done the work to sync this with the 
current trunk.  If so, would you mind uploading the updated patch?

Also, if you end up using this, be sure to test thoroughly.  I implemented this 
one just to gain a little familiarity with the code and I do not claim any sort 
of expertise in this area, so beware!  But all of the regular unit tests did 
pass for me.  I had been meaning to run test2bpostings against this but 
wasn't able to get it set up.  If I remember correctly, this issue originally 
came up because someone wanted to run test2bpostings with memorycodec and it 
was going past the limit.
                
> FST has hard limit max size of 2.1 GB
> -------------------------------------
>
>                 Key: LUCENE-3298
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3298
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-3298.patch
>
>
> The FST uses a single contiguous byte[] under the hood, which in Java is 
> indexed by int, so we cannot grow this over Integer.MAX_VALUE.  It also 
> internally encodes references to this array as vInt.
> We could switch this to a paged byte[] and make the limit far larger.
> But I think this is low priority... I'm not going to work on it any time soon.
