[ 
https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162158#comment-13162158
 ] 

Carlos González-Cadenas commented on LUCENE-3298:
-------------------------------------------------

We've also tried removing some of the outputs to see how much they affect 
the total automaton size, but the difference is small. So the size seems to 
be driven mostly by the huge number of sentences. 

I said before that the sentences are quite repetitive, but to be more precise, 
it is the prefixes of the sentences that are quite repetitive. For example:

hotels with jacuzzi in barcelona
hotels with jacuzzi in madrid
hotels with jacuzzi in berlin
...

Early on we tested with the FST visualization tool, and it seemed to do 
a good job (i.e. placing the outputs on the right nodes and leveraging shared 
prefixes).
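
To make the prefix-sharing point concrete, here is a minimal sketch that 
counts how many nodes a plain character trie needs for the three example 
sentences, compared with the total number of characters. This is not 
Lucene's FST code: the class PrefixSharing and its methods are hypothetical 
names used only for illustration, and a plain trie shares prefixes only, 
whereas a minimized FST can also share common suffixes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; this is NOT Lucene's FST implementation. */
final class PrefixSharing {

    /** One trie node; each outgoing edge is labelled with a single char. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
    }

    /** Builds a character trie and returns the number of nodes created (root excluded). */
    static int trieNodeCount(String... sentences) {
        Node root = new Node();
        int created = 0;
        for (String sentence : sentences) {
            Node current = root;
            for (int i = 0; i < sentence.length(); i++) {
                char c = sentence.charAt(i);
                Node next = current.children.get(c);
                if (next == null) {
                    // No shared prefix beyond this point: a new node is needed.
                    next = new Node();
                    current.children.put(c, next);
                    created++;
                }
                current = next;
            }
        }
        return created;
    }

    /** Total characters across all sentences, i.e. the cost with no sharing at all. */
    static int totalChars(String... sentences) {
        int total = 0;
        for (String sentence : sentences) {
            total += sentence.length();
        }
        return total;
    }

    public static void main(String[] args) {
        String[] sentences = {
            "hotels with jacuzzi in barcelona",
            "hotels with jacuzzi in madrid",
            "hotels with jacuzzi in berlin",
        };
        System.out.println("characters: " + totalChars(sentences));
        System.out.println("trie nodes: " + trieNodeCount(sentences));
    }
}
```

For the three sentences above this reports 90 characters but only 43 trie 
nodes, because the 23-character prefix "hotels with jacuzzi in " is stored 
once. The saving per sentence is bounded by the prefix length, so a very 
large number of distinct endings can still add up to a large structure.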

                
> FST has hard limit max size of 2.1 GB
> -------------------------------------
>
>                 Key: LUCENE-3298
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3298
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-3298.patch
>
>
> The FST uses a single contiguous byte[] under the hood, which in Java is 
> indexed by int, so we cannot grow it beyond Integer.MAX_VALUE.  It also 
> internally encodes references to this array as vInt.
> We could switch this to a paged byte[] and make the limit far larger.
> But I think this is low priority... I'm not going to work on it any time soon.
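
As an illustration of the "paged byte[]" idea mentioned in the issue 
description, the sketch below splits one logical byte array into fixed-size 
pages so that positions can be addressed with a long instead of an int. It 
is only a sketch under stated assumptions, not the attached patch and not 
Lucene's code: the class name PagedBytes, the pageBits parameter and the 
lazy page allocation are all hypothetical choices made for the example.

```java
/** Hypothetical paged byte store addressed by long; NOT Lucene's actual implementation. */
final class PagedBytes {
    private final int pageBits;   // log2 of the page size (assumed parameter)
    private final int pageMask;   // low bits selecting the offset inside a page
    private final long length;    // logical length, may exceed Integer.MAX_VALUE
    private final byte[][] pages; // pages are allocated lazily on first write

    PagedBytes(long length, int pageBits) {
        if (length < 0 || pageBits < 1 || pageBits > 30) {
            throw new IllegalArgumentException("bad length or pageBits");
        }
        long pageSize = 1L << pageBits;
        long pageCount = (length + pageSize - 1) >>> pageBits;
        if (pageCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many pages for this page size");
        }
        this.length = length;
        this.pageBits = pageBits;
        this.pageMask = (1 << pageBits) - 1;
        this.pages = new byte[(int) pageCount][];
    }

    /** Reads the byte at a long position; unwritten positions read as 0. */
    byte get(long pos) {
        checkPosition(pos);
        byte[] page = pages[(int) (pos >>> pageBits)];
        return page == null ? 0 : page[(int) (pos & pageMask)];
    }

    /** Writes one byte, allocating the containing page if needed. */
    void set(long pos, byte value) {
        checkPosition(pos);
        int pageIndex = (int) (pos >>> pageBits);
        if (pages[pageIndex] == null) {
            pages[pageIndex] = new byte[1 << pageBits];
        }
        pages[pageIndex][(int) (pos & pageMask)] = value;
    }

    private void checkPosition(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("position " + pos);
        }
    }

    public static void main(String[] args) {
        // 3 GiB logical size with 1 MiB pages; only the touched page is allocated.
        PagedBytes bytes = new PagedBytes(3L << 30, 20);
        bytes.set(3_000_000_000L, (byte) 42);
        System.out.println(bytes.get(3_000_000_000L));
    }
}
```

Note that paging only lifts the array-index limit. As the description says, 
references into the array are also encoded as vInt, so those would need a 
wider encoding as well before positions past Integer.MAX_VALUE could be 
stored.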



