Hello,

Thank you all for your explanations. This is my first experience in an open source community. I downloaded the source code (lucene-3.0.0.zip) and would like to work on part of the code in order to learn new skills from the group and make a positive contribution. To be honest, I really don't know which part I should start with. Please let me know the exact location of the source code you are discussing (folder, file), and then I will join you :)

On Sat, Dec 5, 2009 at 2:10 PM, Robert Muir <rcm...@gmail.com> wrote:

> Hi Ghazal,
>
> I am sorry this one is a bit confusing. I think it is because a lot of
> people are working on it (which is great) and a lot of ideas going back and
> forth, causing lots of files to be uploaded, etc.
>
> Can you tell us more about your interest in working with NFA/DFA in Lucene?
>
> I am very curious to hear any uses cases you might have, or why you are
> interested!
>
> In general, for contributing to lucene this link is helpful:
> http://wiki.apache.org/lucene-java/HowToContribute
>
> It tells you how the patch submission process works, how to get the latest
> code from subversion, etc.
>
>
> On Sat, Dec 5, 2009 at 4:58 PM, Ghazal Gharooni <ghazal.gharo...@gmail.com> wrote:
>
>> Hello,
>>
>> I am new in the community and I've completely been confused. Please
>> anybody help me out to know which part of codes you are working with. How
>> should I participate in work? Thank you!
>>
>> On Sat, Dec 5, 2009 at 1:02 PM, Uwe Schindler (JIRA) <j...@apache.org> wrote:
>>
>>> [
>>> https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>>>
>>> Uwe Schindler updated LUCENE-1606:
>>> ----------------------------------
>>>
>>> Attachment: (was: LUCENE-1606-flex.patch)
>>>
>>> > Automaton Query/Filter (scalable regex)
>>> > ---------------------------------------
>>> >
>>> > Key: LUCENE-1606
>>> > URL: https://issues.apache.org/jira/browse/LUCENE-1606
>>> > Project: Lucene - Java
>>> > Issue Type: New Feature
>>> > Components: Search
>>> > Reporter: Robert Muir
>>> > Assignee: Robert Muir
>>> > Priority: Minor
>>> > Fix For: 3.1
>>> >
>>> > Attachments: automaton.patch, automatonMultiQuery.patch,
>>> automatonmultiqueryfuzzy.patch, automatonMultiQuerySmart.patch,
>>> automatonWithWildCard.patch, automatonWithWildCard2.patch,
>>> BenchWildcard.java, LUCENE-1606-flex.patch, LUCENE-1606-flex.patch,
>>> LUCENE-1606-flex.patch, LUCENE-1606-flex.patch, LUCENE-1606-flex.patch,
>>> LUCENE-1606-flex.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>>> LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>>> LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>>> LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>>> LUCENE-1606_nodep.patch
>>> >
>>> >
>>> > Attached is a patch for an AutomatonQuery/Filter (name can change if
>>> its not suitable).
>>> > Whereas the out-of-box contrib RegexQuery is nice, I have some very
>>> large indexes (100M+ unique tokens) where queries are quite slow, 2 minutes,
>>> etc. Additionally all of the existing RegexQuery implementations in Lucene
>>> are really slow if there is no constant prefix. This implementation does not
>>> depend upon constant prefix, and runs the same query in 640ms.
>>> > Some use cases I envision:
>>> > 1. lexicography/etc on large text corpora
>>> > 2. looking for things such as urls where the prefix is not constant
>>> (http:// or ftp://)
>>> > The Filter uses the BRICS package (http://www.brics.dk/automaton/) to
>>> convert regular expressions into a DFA. Then, the filter "enumerates" terms
>>> in a special way, by using the underlying state machine. Here is my short
>>> description from the comments:
>>> > The algorithm here is pretty basic. Enumerate terms but instead of
>>> a binary accept/reject do:
>>> >
>>> > 1. Look at the portion that is OK (did not enter a reject state in
>>> the DFA)
>>> > 2. Generate the next possible String and seek to that.
>>> > the Query simply wraps the filter with ConstantScoreQuery.
>>> > I did not include the automaton.jar inside the patch but it can be
>>> downloaded from http://www.brics.dk/automaton/ and is BSD-licensed.
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>>
>>
>
> --
> Robert Muir
> rcm...@gmail.com