[ https://issues.apache.org/jira/browse/LUCENE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853007#action_12853007 ]

Robert Muir commented on LUCENE-2265:
-------------------------------------

{quote}
So? You aren't making some generic automaton library, are you?
Get these user regexes/wildcards/etc, convert them to utf-8, build utf-8 
automaton, run it against lucene data. 
{quote}

This just pushes the complexity into the parsers. And yes, it makes sense to support high-level (char[]) operations with automaton too, such as analysis.

I encourage you to take a look at the existing code. In general, a lot of parsers (see wildcard and regex) are implemented with primitive automata like 'makeAnyChar'. A 'makeAnyByte' makes no sense.
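To make the 'makeAnyChar' point concrete, here is a small Java sketch. The class `WildcardNfa` and its methods are hypothetical, not Lucene's automaton API: it compiles a wildcard pattern into an NFA over abstract int symbols, where '?' is an "any one symbol" edge. Fed Unicode code points, '?' matches 'é'. Fed the UTF-8 bytes of the same term, the same '?' consumes only the first of the two bytes (0xC3 0xA9) and the match fails, which is why an "any byte" primitive does not fit a wildcard parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical sketch (not Lucene's API): a wildcard pattern compiled to an NFA
// over an abstract alphabet of int symbols, simulated with a set of live states.
final class WildcardNfa {
    static final int ANY = -1;  // '?': exactly one symbol, analogous to makeAnyChar
    static final int STAR = -2; // '*': zero or more symbols

    // State i means "the first i tokens have been matched"; state tokens.length accepts.
    private final int[] tokens;

    WildcardNfa(int[] tokens) {
        this.tokens = tokens;
    }

    /** Compiles a wildcard string into tokens over Unicode code points. */
    static WildcardNfa compileCodePoints(String pattern) {
        int[] tokens = pattern.codePoints()
                .map(cp -> cp == '?' ? ANY : cp == '*' ? STAR : cp)
                .toArray();
        return new WildcardNfa(tokens);
    }

    /** Adds epsilon moves: a '*' token may also match nothing. */
    private void closure(BitSet states) {
        for (int i = 0; i < tokens.length; i++) {
            if (states.get(i) && tokens[i] == STAR) {
                states.set(i + 1);
            }
        }
    }

    /** Runs the NFA over a sequence of symbols (code points or bytes). */
    boolean run(int[] symbols) {
        BitSet current = new BitSet();
        current.set(0);
        closure(current);
        for (int sym : symbols) {
            BitSet next = new BitSet();
            for (int i = current.nextSetBit(0); i >= 0; i = current.nextSetBit(i + 1)) {
                if (i == tokens.length) {
                    continue; // the accepting state has no outgoing edges
                }
                int t = tokens[i];
                if (t == STAR) {
                    next.set(i); // '*' loops on any symbol
                } else if (t == ANY || t == sym) {
                    next.set(i + 1);
                }
            }
            closure(next);
            if (next.isEmpty()) {
                return false; // no live states left: reject early
            }
            current = next;
        }
        return current.get(tokens.length);
    }

    /** Widens a term's UTF-8 bytes to unsigned ints so they can be fed as symbols. */
    static int[] utf8Symbols(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[b.length];
        for (int i = 0; i < b.length; i++) {
            out[i] = b[i] & 0xFF;
        }
        return out;
    }

    public static void main(String[] args) {
        WildcardNfa nfa = compileCodePoints("caf?");
        String term = "caf\u00e9"; // "café": 4 code points, 5 UTF-8 bytes
        System.out.println(nfa.run(term.codePoints().toArray())); // true
        System.out.println(nfa.run(utf8Symbols(term)));           // false: '?' ate only 0xC3
    }
}
```

The same compiled pattern is reused for both runs on purpose: the only thing that changes is the alphabet, and that alone is enough to break the meaning of '?'.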

So it's generic in the sense that fuzzy, regex, wildcard: all of our users are defined on Unicode characters. High-level operations such as parsing, intersection, and union belong in UTF-16 or UTF-32 space, not with bytes.
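As an illustration of why intersection and union are natural at the character level, the following sketch builds the product of two DFAs whose transitions are labeled with code points. `CodePointDfa`, `Product`, and `Dfas` are hypothetical names for this example, not Lucene classes, and the product is computed lazily by pairing state numbers rather than materialized as a new transition table.

```java
// Hypothetical minimal DFA over Unicode code points (not Lucene's Automaton class).
interface CodePointDfa {
    int numStates();
    int start();
    int step(int state, int codePoint); // -1 means the dead state
    boolean isAccept(int state);

    /** Runs the DFA over a String one code point at a time. */
    default boolean run(String s) {
        int state = start();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            state = step(state, cp);
            if (state < 0) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return isAccept(state);
    }
}

/** Lazy product construction: a state pair (a, b) is encoded as a * width + b. */
final class Product {
    enum Op { INTERSECTION, UNION }

    static CodePointDfa of(CodePointDfa a, CodePointDfa b, Op op) {
        // One extra index per side stands for "that side is dead", which union needs.
        int deadA = a.numStates();
        int deadB = b.numStates();
        int width = deadB + 1;
        return new CodePointDfa() {
            public int numStates() { return (deadA + 1) * width; }
            public int start() { return a.start() * width + b.start(); }
            public int step(int state, int cp) {
                int sa = state / width, sb = state % width;
                int na = sa == deadA ? deadA : orDead(a.step(sa, cp), deadA);
                int nb = sb == deadB ? deadB : orDead(b.step(sb, cp), deadB);
                boolean dead = op == Op.INTERSECTION
                        ? (na == deadA || nb == deadB)
                        : (na == deadA && nb == deadB);
                return dead ? -1 : na * width + nb;
            }
            public boolean isAccept(int state) {
                int sa = state / width, sb = state % width;
                boolean accA = sa != deadA && a.isAccept(sa);
                boolean accB = sb != deadB && b.isAccept(sb);
                return op == Op.INTERSECTION ? accA && accB : accA || accB;
            }
        };
    }

    private static int orDead(int s, int dead) {
        return s < 0 ? dead : s;
    }
}

/** Two tiny example DFAs. */
final class Dfas {
    /** Accepts exactly the given word; state i = first i code points matched. */
    static CodePointDfa literal(String word) {
        int[] cps = word.codePoints().toArray();
        return new CodePointDfa() {
            public int numStates() { return cps.length + 1; }
            public int start() { return 0; }
            public int step(int s, int cp) { return s < cps.length && cps[s] == cp ? s + 1 : -1; }
            public boolean isAccept(int s) { return s == cps.length; }
        };
    }

    /** Accepts any string of exactly n code points: n "any char" edges in a row. */
    static CodePointDfa anyChars(int n) {
        return new CodePointDfa() {
            public int numStates() { return n + 1; }
            public int start() { return 0; }
            public int step(int s, int cp) { return s < n ? s + 1 : -1; }
            public boolean isAccept(int s) { return s == n; }
        };
    }

    public static void main(String[] args) {
        String term = "\u65e5\u672c"; // "日本": 2 code points, 6 UTF-8 bytes
        CodePointDfa twoChars = Product.of(literal(term), anyChars(2), Product.Op.INTERSECTION);
        System.out.println(twoChars.run(term)); // true
        CodePointDfa either = Product.of(literal("a"), literal("\u65e5"), Product.Op.UNION);
        System.out.println(either.run("\u65e5") + " " + either.run("b")); // true false
    }
}
```

Intersecting `literal("日本")` with `anyChars(2)` accepts 日本 because the term is two characters long, even though it is six bytes in UTF-8. Expressing the same "exactly two characters" constraint with byte-labeled edges would require spelling out every valid UTF-8 sequence length.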

Bytes are an implementation detail, and we shouldn't operate on UTF-8 except behind the scenes.
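One way to keep the automaton defined on code points while still consuming flex's UTF-8 byte[] "behind the scenes" is to decode one code point at a time and step the DFA immediately, so a term can be rejected without converting the whole thing to char[] first. This is a sketch under stated assumptions, not the attached LUCENE-2265.patch: `CharLevelDfa`, `Utf8Run`, and `PrefixDfa` are hypothetical names, the term is assumed to arrive as an offset/length slice of a byte[], and the bytes are assumed to be well-formed UTF-8 (only lead and continuation bytes are checked). A different approach, translating the code point automaton into an equivalent one with byte-labeled transitions, would avoid decoding entirely; this sketch does not attempt that.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical DFA over Unicode code points; the names are illustrative, not Lucene's.
interface CharLevelDfa {
    int start();
    int step(int state, int codePoint); // -1 means the dead state
    boolean isAccept(int state);
}

final class Utf8Run {
    /**
     * Steps a code point DFA directly over a UTF-8 slice, decoding one code point
     * at a time, so a term can be rejected before the rest of it is decoded.
     * Assumes well-formed UTF-8; only lead and continuation bytes are checked.
     */
    static boolean run(CharLevelDfa dfa, byte[] bytes, int offset, int length) {
        int state = dfa.start();
        int i = offset;
        int end = offset + length;
        while (i < end) {
            int b0 = bytes[i] & 0xFF;
            int cp;
            int n;
            if (b0 < 0x80) { cp = b0; n = 1; }
            else if (b0 >= 0xC2 && b0 < 0xE0) { cp = b0 & 0x1F; n = 2; }
            else if (b0 >= 0xE0 && b0 < 0xF0) { cp = b0 & 0x0F; n = 3; }
            else if (b0 >= 0xF0 && b0 < 0xF5) { cp = b0 & 0x07; n = 4; }
            else { throw new IllegalArgumentException("bad UTF-8 lead byte at " + i); }
            if (i + n > end) {
                throw new IllegalArgumentException("truncated UTF-8 sequence at " + i);
            }
            for (int k = 1; k < n; k++) {
                int b = bytes[i + k] & 0xFF;
                if ((b & 0xC0) != 0x80) {
                    throw new IllegalArgumentException("bad continuation byte at " + (i + k));
                }
                cp = (cp << 6) | (b & 0x3F); // append 6 payload bits
            }
            state = dfa.step(state, cp);
            if (state < 0) {
                return false; // dead state: no need to decode the remaining bytes
            }
            i += n;
        }
        return dfa.isAccept(state);
    }
}

/** Example DFA: accepts any term starting with a prefix, and counts steps taken. */
final class PrefixDfa implements CharLevelDfa {
    private final int[] prefix;
    int steps; // number of code points fed in, for the demo

    PrefixDfa(String p) { prefix = p.codePoints().toArray(); }

    public int start() { return 0; }

    public int step(int s, int cp) {
        steps++;
        if (s == prefix.length) {
            return s; // prefix already matched: accept anything after it
        }
        return prefix[s] == cp ? s + 1 : -1;
    }

    public boolean isAccept(int s) { return s == prefix.length; }

    public static void main(String[] args) {
        byte[] term = "caf\u00e9 au lait".getBytes(StandardCharsets.UTF_8);
        PrefixDfa hit = new PrefixDfa("caf\u00e9");
        PrefixDfa miss = new PrefixDfa("tea");
        System.out.println(Utf8Run.run(hit, term, 0, term.length));  // true
        System.out.println(Utf8Run.run(miss, term, 0, term.length)); // false
        System.out.println(miss.steps);                              // 1
    }
}
```

The `miss` case shows the payoff described in the issue: the DFA dies on the first code point, so the remaining bytes of the term are never decoded, while parsing, union, and intersection can all stay in code point space.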

> improve automaton performance by running on byte[]
> --------------------------------------------------
>
>                 Key: LUCENE-2265
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2265
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: Flex Branch
>            Reporter: Robert Muir
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: LUCENE-2265.patch
>
>
> Currently, when enumerating terms, automaton must convert entire terms from
> flex's native UTF-8 byte[] to char[] first, then step each char through the
> state machine.
> We can make this more efficient by allowing the state machine to run on
> byte[], so it can return true/false faster.

-- 
This message is automatically generated by JIRA.