Could you be more specific? :) This patch is part of an issue to add an AutomatonQuery class to Lucene, which allows for a fast RegexpQuery and replaces our WildcardQuery impl. It's being developed in two flavors: one for the current trunk version of Lucene, and a slightly altered version for our "flexible indexing" branch, where another large issue is being developed. Eventually that branch will be merged back into trunk.
This might not be an issue where you want to get your feet wet ;) But if you could be more explicit about what you want to know, we might be able to be of more help. That's a pretty broad question.

To take a stab anyway: the short of it is, find an issue you find compelling and jump in! :)

Ghazal Gharooni wrote:
> Hello,
>
> I am new in the community and I've completely been confused. Please
> anybody help me out to know which part of codes you are working with.
> How should I participate in work? Thank you!
>
> On Sat, Dec 5, 2009 at 1:02 PM, Uwe Schindler (JIRA) <j...@apache.org> wrote:
>
> [ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Uwe Schindler updated LUCENE-1606:
> ----------------------------------
>
> Attachment: (was: LUCENE-1606-flex.patch)
>
> > Automaton Query/Filter (scalable regex)
> > ---------------------------------------
> >
> > Key: LUCENE-1606
> > URL: https://issues.apache.org/jira/browse/LUCENE-1606
> > Project: Lucene - Java
> > Issue Type: New Feature
> > Components: Search
> > Reporter: Robert Muir
> > Assignee: Robert Muir
> > Priority: Minor
> > Fix For: 3.1
> >
> > Attachments: automaton.patch, automatonMultiQuery.patch,
> > automatonmultiqueryfuzzy.patch, automatonMultiQuerySmart.patch,
> > automatonWithWildCard.patch, automatonWithWildCard2.patch,
> > BenchWildcard.java, LUCENE-1606-flex.patch,
> > LUCENE-1606-flex.patch, LUCENE-1606-flex.patch,
> > LUCENE-1606-flex.patch, LUCENE-1606-flex.patch,
> > LUCENE-1606-flex.patch, LUCENE-1606.patch, LUCENE-1606.patch,
> > LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
> > LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
> > LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
> > LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606_nodep.patch
> >
> > Attached is a patch for an AutomatonQuery/Filter (name can
> > change if its not suitable).
> >
> > Whereas the out-of-box contrib RegexQuery is nice, I have some
> > very large indexes (100M+ unique tokens) where queries are quite
> > slow, 2 minutes, etc. Additionally all of the existing RegexQuery
> > implementations in Lucene are really slow if there is no constant
> > prefix. This implementation does not depend upon constant prefix,
> > and runs the same query in 640ms.
> >
> > Some use cases I envision:
> > 1. lexicography/etc on large text corpora
> > 2. looking for things such as urls where the prefix is not
> >    constant (http:// or ftp://)
> >
> > The Filter uses the BRICS package
> > (http://www.brics.dk/automaton/) to convert regular expressions
> > into a DFA. Then, the filter "enumerates" terms in a special way,
> > by using the underlying state machine. Here is my short
> > description from the comments:
> >
> > The algorithm here is pretty basic. Enumerate terms but
> > instead of a binary accept/reject do:
> >
> > 1. Look at the portion that is OK (did not enter a reject
> >    state in the DFA)
> > 2. Generate the next possible String and seek to that.
> >
> > the Query simply wraps the filter with ConstantScoreQuery.
> >
> > I did not include the automaton.jar inside the patch but it can
> > be downloaded from http://www.brics.dk/automaton/ and is BSD-licensed.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org

--
- Mark

http://www.lucidimagination.com
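
P.S. For anyone trying to picture the two-step enumeration described in the quoted issue, here is a rough sketch of the idea. It is not the code from the patch and uses neither the Lucene nor the BRICS API: the DFA is a hand-built toy that accepts "ftp://" or "http://" followed by lowercase letters, a TreeSet stands in for the sorted term dictionary, and every name in it (DfaSeekSketch, enumerate, nextPossible) is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class DfaSeekSketch {
    // Toy DFA (hypothetical, hand-built): state -> (char -> next state).
    // A missing transition means the DFA rejects the input.
    static final List<TreeMap<Character, Integer>> DFA = new ArrayList<>();
    static int acceptState;

    static int newState() {
        DFA.add(new TreeMap<>());
        return DFA.size() - 1;
    }

    // Adds a chain of states spelling `literal`, from state `from` to state `to`.
    static void addPath(int from, String literal, int to) {
        int state = from;
        for (int i = 0; i < literal.length(); i++) {
            int next = (i == literal.length() - 1) ? to : newState();
            DFA.get(state).put(literal.charAt(i), next);
            state = next;
        }
    }

    static {
        int start = newState();   // state 0 is the start state
        acceptState = newState(); // single accepting state
        addPath(start, "ftp://", acceptState);
        addPath(start, "http://", acceptState);
        for (char c = 'a'; c <= 'z'; c++) {
            DFA.get(acceptState).put(c, acceptState); // loop on lowercase letters
        }
    }

    // Walks the sorted terms, seeking past ranges the DFA can never accept.
    static List<String> enumerate(NavigableSet<String> terms) {
        List<String> matches = new ArrayList<>();
        String term = terms.isEmpty() ? null : terms.first();
        while (term != null) {
            int[] path = new int[term.length() + 1]; // path[k] = state after k chars
            int ok = 0;                               // step 1: length of the portion that is OK
            while (ok < term.length()) {
                Integer next = DFA.get(path[ok]).get(term.charAt(ok));
                if (next == null) break;              // entered a reject state
                path[++ok] = next;
            }
            if (ok == term.length()) {
                if (path[ok] == acceptState) matches.add(term);
                term = terms.higher(term);            // nothing to skip, take the next term
            } else {
                String target = nextPossible(term, ok, path); // step 2
                term = (target == null) ? null : terms.ceiling(target);
            }
        }
        return matches;
    }

    // Smallest string greater than `term` that the DFA has not already ruled out,
    // or null if no such string exists. Backtracks through the OK prefix.
    static String nextPossible(String term, int ok, int[] path) {
        for (int k = ok; k >= 0; k--) {
            Character higher = DFA.get(path[k]).higherKey(term.charAt(k));
            if (higher != null) return term.substring(0, k) + higher;
        }
        return null;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(Arrays.asList(
            "apple", "ftp://files", "ftp:/broken", "gopher://x",
            "http://a", "http://lucene", "https://x", "zebra"));
        System.out.println(enumerate(terms)); // prints [ftp://files, http://a, http://lucene]
    }
}
```

In this toy run "gopher://x" and "zebra" are never examined: after rejecting "ftp:/broken" the loop seeks straight to "h", and after rejecting "https://x" there is no larger viable string, so it stops. That skipping is why the approach does not need a constant prefix.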