[
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051475#comment-13051475
]
Dawid Weiss commented on LUCENE-3206:
-------------------------------------
bq. this could be a non-negligible increase in FST size for the non-ascii case
I think?
I don't know. If non-ASCII text is encoded as UTF8 in the BytesRef, then
storing full Unicode code points on transitions shouldn't take much more
space (in fact it may create fewer states/transitions, because each multibyte
UTF8 sequence requires multiple transitions)? We would need to check this, of
course. I'm also assuming input sequences ARE text, which in general may not be
the case... I think I'll leave BYTE1/BYTE4 as an option for now and see if I
can improve on it once I have a working test suite.
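A rough sketch of the counting argument, using only the JDK and no Lucene
classes: it compares how many transition labels a single key needs when each
UTF8 byte is a label versus when each Unicode code point is a label. The class
and method names (LabelCount, byteLabels, codePointLabels) are hypothetical and
exist only for this illustration. The sketch does not measure FST size, which
also depends on prefix/suffix sharing and on how wide each stored label is.

```java
import java.nio.charset.StandardCharsets;

public class LabelCount {
    // Labels per key when every UTF-8 byte becomes one transition label.
    static int byteLabels(String key) {
        return key.getBytes(StandardCharsets.UTF_8).length;
    }

    // Labels per key when every Unicode code point becomes one transition label.
    static int codePointLabels(String key) {
        return key.codePointCount(0, key.length());
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only:
        // "turtle", the Polish word for turtle, and U+1D54F (a supplementary character).
        String[] keys = { "turtle", "\u017C\u00F3\u0142w", "\uD835\uDD4F" };
        for (String key : keys) {
            System.out.println(byteLabels(key) + " byte labels, "
                + codePointLabels(key) + " code point labels");
        }
    }
}
```

For pure ASCII keys the two counts are equal; they diverge only for multibyte
characters, so the trade-off is fewer but wider labels against more but
narrower ones.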
bq. I think SimpleText codec is a good example? Also
VariableGapTermsIndexReader, and MemoryCodec? Each of these use the
BytesRefFSTEnum, I believe.
I wasn't clear -- I can find the places where they're used, but I wanted to
clarify the nature of the stored keys and values (are they UTF8 text, UTF16,
Unicode code points, random bytes?). I can go through the code, but you're
probably a faster source of information on this one. Robert, if you're reading
this -- is there anything you envision being stored as transition labels?
> FST package API refactoring
> ---------------------------
>
> Key: LUCENE-3206
> URL: https://issues.apache.org/jira/browse/LUCENE-3206
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/FSTs
> Affects Versions: 3.2
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3206.patch
>
>
> The current API is still marked @experimental, so I think there's still time
> to fiddle with it. I've been using the current API for some time and I do
> have some ideas for improvement. This is a placeholder for these -- I'll post
> a patch once I have a working proof of concept.