[ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051684#comment-13051684
 ] 

Michael McCandless commented on LUCENE-3206:
--------------------------------------------

OK, these results make sense!  UTF32 (vInt labels) is more compact than UTF8, 
if you disable array'd arcs.  These wiki terms are from the en export, right?  
So the differences are due to the smallish number of random terms that are not 
English... the gap should be larger if we used non-English content.
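
To make the size argument concrete, here is a rough sketch that counts, per 
term, how many labels a byte-per-label (UTF8) FST would see versus a 
code-point-per-label (UTF32) one, and how many bytes the vInt-encoded 
code-point labels would take.  It does not use the FST package at all: 
LabelSizeSketch and its methods are made-up names, the vInt length follows the 
usual 7-bits-per-byte scheme, and it ignores per-arc overhead (flags, targets, 
outputs) and any prefix/suffix sharing, so treat the numbers as an 
illustration only.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: estimates label counts and label bytes.
public class LabelSizeSketch {

    // Bytes needed to write a non-negative int as a vInt (7 payload bits per byte).
    static int vIntLength(int value) {
        int length = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    // Bytes needed to encode one Unicode code point in UTF-8.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Returns {utf8Labels, utf32Labels, utf32LabelBytes} for one term.
    // With one byte per label, utf8Labels is also the UTF8 label byte count.
    static int[] labelCosts(String term) {
        int utf8Labels = 0;
        int utf32Labels = 0;
        int utf32LabelBytes = 0;
        for (int i = 0; i < term.length(); ) {
            int cp = term.codePointAt(i);
            utf8Labels += utf8Length(cp);      // one label per UTF-8 byte
            utf32Labels++;                     // one label per code point
            utf32LabelBytes += vIntLength(cp); // that label written as a vInt
            i += Character.charCount(cp);
        }
        return new int[] {utf8Labels, utf32Labels, utf32LabelBytes};
    }

    public static void main(String[] args) {
        // An ASCII term and a Devanagari term (U+0915 U+093E).
        for (String term : new String[] {"lucene", "\u0915\u093E"}) {
            System.out.println(Arrays.toString(labelCosts(term)));
        }
    }
}
```

For pure ASCII terms the three numbers are identical, which fits the small 
difference seen on the en export; for code points from U+0800 up to U+3FFF, 
and for supplementary characters, the vInt label is one byte shorter than the 
UTF-8 sequence and the term needs fewer labels overall.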

I wonder how lookup time would compare... I think UTF32 should be faster?

And yes, for truly binary terms (e.g. collated fields, and maybe eventually 
numeric fields, but not yet because they still avoid the 8th bit, I think) I 
think we want to keep BYTE1.

We need some good use cases of FSTs during analysis... there we are free to 
make the alphabet non-byte (vs the index, where terms are a BytesRef).

> FST package API refactoring
> ---------------------------
>
>                 Key: LUCENE-3206
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3206
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.2
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3206.patch
>
>
> The current API is still marked @experimental, so I think there's still time 
> to fiddle with it. I've been using the current API for some time and I do 
> have some ideas for improvement. This is a placeholder for these -- I'll post 
> a patch once I have a working proof of concept.


        
