[ https://issues.apache.org/jira/browse/LUCENE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-2265:
--------------------------------

    Attachment: LUCENE-2265.patch

OK, I think I made some serious progress here, but I did find a bug in the UTF-32 -> UTF-8 DFA converter. The problem is that it does not handle at least the case where the initial state is an accept state.

I created a test case for this (TestUTF32SpecialCase) and put the Python code back in, as I figure you will probably fix it there first.

I deleted the surrogate-seeking tests. As with other nuances, these won't behave the same if we switch to byte[], since those regexps are no longer defined.

What remains is to switch the slow fuzzy to use code point calculations (to be consistent with the fast one). By the way, it's really silly that we have to Unicode-convert just to get the length in chars for that score calculation... ugh!

> improve automaton performance by running on byte[]
> --------------------------------------------------
>
>                 Key: LUCENE-2265
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2265
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: Flex Branch
>            Reporter: Robert Muir
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: LUCENE-2265.patch, LUCENE-2265.patch, LUCENE-2265.patch,
>                      LUCENE-2265.patch, LUCENE-2265.patch, LUCENE-2265_pare.patch,
>                      LUCENE-2265_utf32.patch
>
>
> Currently, when enumerating terms, the automaton must convert entire terms from
> flex's native UTF-8 byte[] to char[] first, then step each char through the
> state machine.
> We can make this more efficient by allowing the state machine to run on
> byte[], so it can return true/false faster.
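
To make the idea in the issue summary concrete, here is a rough sketch of a state machine that steps over a term's UTF-8 bytes directly, with no byte[] -> char[] conversion. This is not the code in the patch and not Lucene's automaton API: the class `ByteDfaSketch`, its methods, and the table-based layout are all made up for illustration. The example DFA matches the regexp "é*" in its UTF-8 form, (0xC3 0xA9)*, chosen because its initial state is also an accept state, which is the special case described in the comment above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; these are not Lucene classes.
final class ByteDfaSketch {
    private static final int DEAD = -1;   // marks "no transition"
    private final int[][] transitions;    // transitions[state][unsigned byte value]
    private final boolean[] accept;       // accept[state]

    ByteDfaSketch(int numStates) {
        transitions = new int[numStates][256];
        for (int[] row : transitions) {
            Arrays.fill(row, DEAD);
        }
        accept = new boolean[numStates];
    }

    void addTransition(int from, int byteValue, int to) {
        transitions[from][byteValue] = to;
    }

    void setAccept(int state) {
        accept[state] = true;
    }

    // Steps the raw bytes through the machine; state 0 is the initial state.
    boolean run(byte[] term, int offset, int length) {
        int state = 0;
        for (int i = offset; i < offset + length; i++) {
            state = transitions[state][term[i] & 0xFF]; // bytes are signed in Java
            if (state == DEAD) {
                return false; // reject early, without decoding the rest
            }
        }
        return accept[state];
    }

    // DFA for the UTF-8 form of "\u00e9*": (0xC3 0xA9)*.
    // State 0 is both the initial state and an accept state (empty match).
    static ByteDfaSketch eAcuteStar() {
        ByteDfaSketch dfa = new ByteDfaSketch(2);
        dfa.setAccept(0);
        dfa.addTransition(0, 0xC3, 1); // lead byte of U+00E9
        dfa.addTransition(1, 0xA9, 0); // continuation byte of U+00E9
        return dfa;
    }

    public static void main(String[] args) {
        ByteDfaSketch dfa = eAcuteStar();
        for (String s : new String[] {"", "\u00e9", "\u00e9\u00e9", "e"}) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            System.out.println("\"" + s + "\" -> " + dfa.run(utf8, 0, utf8.length));
        }
    }
}
```

A real UTF-32 -> UTF-8 conversion has to generate the byte-level transitions for whole code point ranges automatically; the hand-built table here only shows what the result looks like for one character, and why an accepting initial state has to survive the conversion so that the empty term still matches.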