[ https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051725#comment-13051725 ]

Dawid Weiss commented on LUCENE-3206:
-------------------------------------

UTF-32 is essentially a direct code point representation, so there are no
surrogate pairs (as in UTF-16) and no special multi-byte encoding of higher
code points (as in UTF-8). I don't know which sort order Lucene uses
internally: is it byte-by-byte UTF-8 order or decoded code point order? If it
is code point order, then there is no problem, because that order should be
preserved.
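To make the distinction concrete, here is a small, self-contained Java sketch (plain JDK code, not Lucene's; the class and method names are hypothetical). It compares a BMP character above the surrogate range (U+FF21) with a supplementary character (U+1F600) under three orders: UTF-16 code unit order (Java's String.compareTo), unsigned UTF-8 byte order, and decoded code point order (the UTF-32 view).

```java
import java.nio.charset.StandardCharsets;

public class SortOrderDemo {
    // Lexicographic comparison of UTF-8 bytes, treating each byte as unsigned (0..255).
    static int compareUtf8Bytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(x[i] & 0xFF, y[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(x.length, y.length);
    }

    // Lexicographic comparison of decoded code points (what a UTF-32 sequence holds).
    static int compareCodePoints(String a, String b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(x[i], y[i]);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(x.length, y.length);
    }

    // Java's String.compareTo compares UTF-16 code units (chars), not code points.
    static int compareUtf16Units(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        String bmp = "\uFF21";        // U+FF21, a BMP character above the surrogate range
        String supp = "\uD83D\uDE00"; // U+1F600, stored in UTF-16 as a surrogate pair
        System.out.println("UTF-16 code units: " + Integer.signum(compareUtf16Units(bmp, supp)));
        System.out.println("UTF-8 bytes:       " + Integer.signum(compareUtf8Bytes(bmp, supp)));
        System.out.println("Code points:       " + Integer.signum(compareCodePoints(bmp, supp)));
    }
}
```

Running main prints 1 for the UTF-16 comparison and -1 for the other two. UTF-8 is designed so that unsigned byte-wise comparison matches code point order, so those two agree; UTF-16 code unit order diverges only when supplementary characters are compared with BMP characters at or above U+E000.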

I'll stick to BYTE1/BYTE4 inputs for now, then, and try to push this patch
forward in my spare time.

> FST package API refactoring
> ---------------------------
>
>                 Key: LUCENE-3206
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3206
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.2
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3206.patch
>
>
> The current API is still marked @experimental, so I think there's still time 
> to fiddle with it. I've been using the current API for some time and I do 
> have some ideas for improvement. This is a placeholder for these -- I'll post 
> a patch once I have a working proof of concept.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
