[ https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051725#comment-13051725 ]
Dawid Weiss commented on LUCENE-3206:
-------------------------------------

UTF-32 is basically a code point representation. There are no surrogates (as in UTF-16) and no special multi-byte encoding of higher code points (as in UTF-8). I don't know what sort order is used inside Lucene: is it UTF-8 byte-by-byte values or decoded code points? If it is code point order, there is no problem, because that order should be preserved. I'll stick to BYTE1/BYTE4 inputs for now and try to push this patch forward in my spare time.

> FST package API refactoring
> ---------------------------
>
>                 Key: LUCENE-3206
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3206
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>    Affects Versions: 3.2
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3206.patch
>
>
> The current API is still marked @experimental, so I think there's still time
> to fiddle with it. I've been using the current API for some time and I do
> have some ideas for improvement. This is a placeholder for these; I'll post
> a patch once I have a working proof of concept.
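The question of whether "UTF-8 byte order" and "code point order" differ can be checked with a small JDK-only sketch. It does not use any Lucene API and makes no claim about which order Lucene actually uses; the class and method names (`EncodingOrderDemo`, `compareUtf8`, `compareUtf32`, `compareUtf16`) are hypothetical. It compares U+FFFD (a BMP character) with U+1F600 (a supplementary character) under three orders. Comparing UTF-8 bytes as unsigned values agrees with code point (UTF-32) order, while Java's `String.compareTo`, which compares UTF-16 code units, puts the surrogate pair first.

```java
import java.nio.charset.StandardCharsets;

public class EncodingOrderDemo {

    // Lexicographic comparison of UTF-8 bytes, treating each byte as unsigned (0..255).
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    // Lexicographic comparison of code points: the UTF-32 view, one int per character, no surrogates.
    static int compareUtf32(String a, String b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(x[i], y[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.length, y.length);
    }

    // String.compareTo compares UTF-16 code units (chars), so a surrogate pair sorts by its surrogate values.
    static int compareUtf16(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        String bmp = "\uFFFD";                                         // U+FFFD: 1 UTF-16 unit, 3 UTF-8 bytes
        String supplementary = new String(Character.toChars(0x1F600)); // U+1F600: surrogates D83D DE00, 4 UTF-8 bytes

        System.out.println("UTF-16 units: " + bmp.length() + " vs " + supplementary.length());
        System.out.println("UTF-8 bytes:  " + bmp.getBytes(StandardCharsets.UTF_8).length
                + " vs " + supplementary.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("code points:  " + bmp.codePointCount(0, bmp.length())
                + " vs " + supplementary.codePointCount(0, supplementary.length()));

        // Code point order and unsigned UTF-8 byte order agree (-1); UTF-16 code unit order disagrees (1).
        System.out.println("UTF-32 order: " + Integer.signum(compareUtf32(bmp, supplementary)));
        System.out.println("UTF-8 order:  " + Integer.signum(compareUtf8(bmp, supplementary)));
        System.out.println("UTF-16 order: " + Integer.signum(compareUtf16(bmp, supplementary)));
    }
}
```

The agreement between the first two orders is a general property of UTF-8: its lead bytes grow with the code point range, so unsigned byte order matches code point order. UTF-16 lacks this property because surrogates (0xD800 to 0xDFFF) sort below BMP characters in the 0xE000 to 0xFFFF range.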