[ https://issues.apache.org/jira/browse/LUCENE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853007#action_12853007 ]
Robert Muir commented on LUCENE-2265:
-------------------------------------

{quote}
So? You aren't making some generic automaton library, are you? Get these user regexes/wildcards/etc, convert them to utf-8, build utf-8 automaton, run it against lucene data.
{quote}

This just pushes the complexity into the parsers. And yes, it makes sense to support high-level (char[]) operations with automaton too, such as analysis.

I encourage you to take a look at the existing code. In general, a lot of parsers (see wildcard and regex) are implemented with primitive automata like 'makeAnyChar'. 'makeAnyByte' makes no sense.

So it's generic in the sense that fuzzy, regex, wildcard (all of our users) are defined on Unicode characters. High-level operations such as parsing, intersection, and union belong in UTF-16 or UTF-32 space, not with bytes. Bytes are an implementation detail, and we shouldn't operate on UTF-8 except behind the scenes.

> improve automaton performance by running on byte[]
> --------------------------------------------------
>
>                 Key: LUCENE-2265
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2265
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: Flex Branch
>            Reporter: Robert Muir
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: LUCENE-2265.patch
>
>
> Currently, when enumerating terms, automaton must convert entire terms from
> flex's native utf-8 byte[] to char[] first, then step each char thru the
> state machine.
> we can make this more efficient, by allowing the state machine to run on
> byte[], so it can return true/false faster.
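
To make the 'makeAnyChar' point concrete, here is a minimal Java sketch of what "any one character" looks like in each space. It is not Lucene's automaton code: the class name, method names, and state constants are hypothetical, and the UTF-8 machine is simplified (it does not reject overlong forms or encoded surrogates). In code point space the primitive is a single check; in UTF-8 space the same language needs several states, which is the complexity a parser would inherit if it had to build byte-level automata directly.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not part of Lucene. */
final class AnyCharSketch {

    // States of the byte-level machine. NEEDn = n continuation bytes still expected.
    static final int START = 0, NEED1 = 1, NEED2 = 2, NEED3 = 3, ACCEPT = 4, DEAD = -1;

    /** Code point space: "any one character" is a single transition. */
    static boolean matchesAnyCharCodePoints(String term) {
        return !term.isEmpty() && term.codePointCount(0, term.length()) == 1;
    }

    /** One transition of the UTF-8 machine; b is an unsigned byte value (0..255). */
    static int step(int state, int b) {
        boolean continuation = (b & 0xC0) == 0x80;
        switch (state) {
            case START:
                if (b <= 0x7F) return ACCEPT;              // 1-byte sequence
                if (b >= 0xC2 && b <= 0xDF) return NEED1;  // lead byte of 2-byte sequence
                if (b >= 0xE0 && b <= 0xEF) return NEED2;  // lead byte of 3-byte sequence
                if (b >= 0xF0 && b <= 0xF4) return NEED3;  // lead byte of 4-byte sequence
                return DEAD;
            case NEED3: return continuation ? NEED2 : DEAD;
            case NEED2: return continuation ? NEED1 : DEAD;
            case NEED1: return continuation ? ACCEPT : DEAD;
            default:    return DEAD; // ACCEPT has no outgoing transitions
        }
    }

    /** UTF-8 space: step each byte through the machine without decoding to char[]. */
    static boolean matchesAnyCharUtf8(byte[] term) {
        int state = START;
        for (byte raw : term) {
            state = step(state, raw & 0xFF);
            if (state == DEAD) return false;
        }
        return state == ACCEPT;
    }

    public static void main(String[] args) {
        // "a", e-acute, euro sign, an emoji (one code point, two Java chars), and "ab".
        String[] terms = { "a", "\u00e9", "\u20ac", "\uD83D\uDE00", "ab" };
        for (String t : terms) {
            byte[] utf8 = t.getBytes(StandardCharsets.UTF_8);
            System.out.println(utf8.length + " byte(s): codePoints="
                + matchesAnyCharCodePoints(t) + " utf8=" + matchesAnyCharUtf8(utf8));
        }
    }
}
```

Both methods accept the same terms, which is consistent with keeping parsing, intersection, and union in character space and treating a byte-level machine like this one as something generated behind the scenes.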