[
https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164479#comment-13164479
]
Carlos González-Cadenas commented on LUCENE-3298:
-------------------------------------------------
Thanks for the presentation. It's very interesting.
Now that we've invested very significant time with this approach, we'd like to
stick a little bit more with it and see where we can get to. The FST approach,
given that it is much lower level, will give us more control over the
functionality down the road, which will definitely prove beneficial mid-term.
If space requirements demand more infix compression for the permutations, we
can consider replacing the FST with an LZTrie.
Re: next steps, you commented above that you may consider including this patch
into the codebase when you have people that have the need. We obviously would
be very interested in this patch getting into trunk.
In terms of performance, James mentions a 20% performance loss on a
32-bit machine. We're seeing less degradation on a 64-bit machine,
around 10-15% depending on the specific FST and query. If you or
James see any way to optimize it, let us know; we can give a hand if
you point us to the potential paths to make it more efficient.
> FST has hard limit max size of 2.1 GB
> -------------------------------------
>
> Key: LUCENE-3298
> URL: https://issues.apache.org/jira/browse/LUCENE-3298
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/FSTs
> Reporter: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-3298.patch
>
>
> The FST uses a single contiguous byte[] under the hood, which in java is
> indexed by int so we cannot grow this over Integer.MAX_VALUE. It also
> internally encodes references to this array as vInt.
> We could switch this to a paged byte[] and make the limit far larger.
> But I think this is low priority... I'm not going to work on it any time soon.
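
For readers unfamiliar with the "paged byte[]" idea in the issue description, here is a minimal sketch of how a long-indexed byte store can be built from fixed-size pages. It is only an illustration: the class name PagedByteArraySketch, its methods, and the choice of a power-of-two page size are assumptions made for this example, not the code in the attached patch or any existing Lucene class.

```java
// Hypothetical illustration of a paged byte store; not Lucene's implementation.
final class PagedByteArraySketch {
    private final int pageBits;   // log2 of the page size
    private final int pageMask;   // pageSize - 1, extracts the offset within a page
    private final long size;      // total number of addressable bytes
    private final byte[][] pages; // each page is an ordinary int-indexed byte[]

    PagedByteArraySketch(long size, int pageBits) {
        if (size < 0 || pageBits < 1 || pageBits > 30) {
            throw new IllegalArgumentException("bad size or pageBits");
        }
        this.size = size;
        this.pageBits = pageBits;
        int pageSize = 1 << pageBits;
        this.pageMask = pageSize - 1;
        // Round up so a partially filled last page is still allocated.
        long pageCount = (size + pageSize - 1) >>> pageBits;
        if (pageCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many pages for this page size");
        }
        pages = new byte[(int) pageCount][];
        for (int i = 0; i < pages.length; i++) {
            long remaining = size - ((long) i << pageBits);
            pages[i] = new byte[(int) Math.min(pageSize, remaining)];
        }
    }

    long size() {
        return size;
    }

    byte get(long index) {
        checkIndex(index);
        // High bits select the page, low bits select the offset inside it.
        return pages[(int) (index >>> pageBits)][(int) (index & pageMask)];
    }

    void set(long index, byte value) {
        checkIndex(index);
        pages[(int) (index >>> pageBits)][(int) (index & pageMask)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }
}
```

Each lookup costs one extra shift, mask, and array dereference compared with a flat byte[], and references into the store would need a long-capable encoding rather than vInt.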