Why do you hate 1.6 Uwe? Mike
On Mon, Jun 15, 2009 at 12:10 PM, Uwe Schindler (JIRA)<j...@apache.org> wrote: > > [ > https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel > ] > > Uwe Schindler updated LUCENE-1606: > ---------------------------------- > > Fix Version/s: (was: 2.9) > 3.0 > > I move this to 3.0 (and not 3.1), because it can be released together with > 3.0 (contrib modules do not need to wait until 3.1). > > Robert: you could supply a patch with StringBuilder toString() variants and > all those @Override uncommented-in. And it works correct with 1.5 (I am > working with 1.5 here locally - I hate 1.6...). > >> Automaton Query/Filter (scalable regex) >> --------------------------------------- >> >> Key: LUCENE-1606 >> URL: https://issues.apache.org/jira/browse/LUCENE-1606 >> Project: Lucene - Java >> Issue Type: New Feature >> Components: contrib/* >> Reporter: Robert Muir >> Assignee: Uwe Schindler >> Priority: Minor >> Fix For: 3.0 >> >> Attachments: automaton.patch, automatonMultiQuery.patch, >> automatonmultiqueryfuzzy.patch, automatonMultiQuerySmart.patch, >> automatonWithWildCard.patch, automatonWithWildCard2.patch, LUCENE-1606.patch >> >> >> Attached is a patch for an AutomatonQuery/Filter (name can change if its not >> suitable). >> Whereas the out-of-box contrib RegexQuery is nice, I have some very large >> indexes (100M+ unique tokens) where queries are quite slow, 2 minutes, etc. >> Additionally all of the existing RegexQuery implementations in Lucene are >> really slow if there is no constant prefix. This implementation does not >> depend upon constant prefix, and runs the same query in 640ms. >> Some use cases I envision: >> 1. lexicography/etc on large text corpora >> 2. looking for things such as urls where the prefix is not constant >> (http:// or ftp://) >> The Filter uses the BRICS package (http://www.brics.dk/automaton/) to >> convert regular expressions into a DFA. Then, the filter "enumerates" terms >> in a special way, by using the underlying state machine. Here is my short >> description from the comments: >> The algorithm here is pretty basic. Enumerate terms but instead of a >> binary accept/reject do: >> >> 1. Look at the portion that is OK (did not enter a reject state in the >> DFA) >> 2. Generate the next possible String and seek to that. >> the Query simply wraps the filter with ConstantScoreQuery. >> I did not include the automaton.jar inside the patch but it can be >> downloaded from http://www.brics.dk/automaton/ and is BSD-licensed. > > -- > This message is automatically generated by JIRA. > - > You can reply to this email to add a comment to the issue online. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org