Hello,
Thank you all for your explanations. Actually, this is my first experience
in an open source community. I downloaded the source code (lucene-3.0.0.zip)
and would like to work on part of the code in order to learn new skills from
the group and make a positive contribution. To be honest, I really don't know
which part I should start with. Please let me know the exact location of the
source code you are discussing (folder, file), and then I will join you :)



On Sat, Dec 5, 2009 at 2:10 PM, Robert Muir <rcm...@gmail.com> wrote:

> Hi Ghazal,
>
> I am sorry this one is a bit confusing. I think it is because a lot of
> people are working on it (which is great) and a lot of ideas are going back
> and forth, causing lots of files to be uploaded, etc.
>
> Can you tell us more about your interest in working with NFA/DFA in Lucene?
>
> I am very curious to hear any use cases you might have, or why you are
> interested!
>
> In general, for contributing to lucene this link is helpful:
> http://wiki.apache.org/lucene-java/HowToContribute
>
> It tells you how the patch submission process works, how to get the latest
> code from subversion, etc.
>
>
> On Sat, Dec 5, 2009 at 4:58 PM, Ghazal Gharooni <ghazal.gharo...@gmail.com> wrote:
>
>> Hello,
>>
>> I am new to the community and I am completely confused. Could anybody
>> help me find out which part of the code you are working on? How should I
>> participate in the work? Thank you!
>>
>>
>>
>>
>>
>> On Sat, Dec 5, 2009 at 1:02 PM, Uwe Schindler (JIRA) <j...@apache.org> wrote:
>>
>>>
>>>     [
>>> https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>>>
>>> Uwe Schindler updated LUCENE-1606:
>>> ----------------------------------
>>>
>>>     Attachment:     (was: LUCENE-1606-flex.patch)
>>>
>>> > Automaton Query/Filter (scalable regex)
>>> > ---------------------------------------
>>> >
>>> >                 Key: LUCENE-1606
>>> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1606
>>> >             Project: Lucene - Java
>>> >          Issue Type: New Feature
>>> >          Components: Search
>>> >            Reporter: Robert Muir
>>> >            Assignee: Robert Muir
>>> >            Priority: Minor
>>> >             Fix For: 3.1
>>> >
>>> >         Attachments: automaton.patch, automatonMultiQuery.patch,
>>> automatonmultiqueryfuzzy.patch, automatonMultiQuerySmart.patch,
>>> automatonWithWildCard.patch, automatonWithWildCard2.patch,
>>> BenchWildcard.java, LUCENE-1606-flex.patch, LUCENE-1606-flex.patch,
>>> LUCENE-1606-flex.patch, LUCENE-1606-flex.patch, LUCENE-1606-flex.patch,
>>> LUCENE-1606-flex.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>>> LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>>> LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>>> LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>>> LUCENE-1606_nodep.patch
>>> >
>>> >
>>> > Attached is a patch for an AutomatonQuery/Filter (name can change if
>>> it's not suitable).
>>> > Whereas the out-of-box contrib RegexQuery is nice, I have some very
>>> large indexes (100M+ unique tokens) where queries are quite slow, 2 minutes,
>>> etc. Additionally all of the existing RegexQuery implementations in Lucene
>>> are really slow if there is no constant prefix. This implementation does not
>>> depend upon constant prefix, and runs the same query in 640ms.
>>> > Some use cases I envision:
>>> >  1. lexicography/etc on large text corpora
>>> >  2. looking for things such as urls where the prefix is not constant
>>> (http:// or ftp://)
>>> > The Filter uses the BRICS package (http://www.brics.dk/automaton/) to
>>> convert regular expressions into a DFA. Then, the filter "enumerates" terms
>>> in a special way, by using the underlying state machine. Here is my short
>>> description from the comments:
>>> >      The algorithm here is pretty basic. Enumerate terms but instead of
>>> a binary accept/reject do:
>>> >
>>> >      1. Look at the portion that is OK (did not enter a reject state in
>>> the DFA)
>>> >      2. Generate the next possible String and seek to that.
>>> > The Query simply wraps the filter with ConstantScoreQuery.
>>> > I did not include the automaton.jar inside the patch but it can be
>>> downloaded from http://www.brics.dk/automaton/ and is BSD-licensed.
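
For newcomers to the thread, the "look at the OK portion, then seek to the next possible String" idea quoted above can be sketched roughly as follows. This is only an illustration under several assumptions and is not code from any of the attached patches: the class `DfaTermSeek` and its methods are hypothetical names, the DFA for the pattern `[ab]x+` is built by hand instead of with the BRICS package, it is assumed to contain no dead states, and a `TreeSet<String>` stands in for the sorted term dictionary (with `ceiling` playing the role of a seek).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustration only: every name here is hypothetical, not taken from the LUCENE-1606 patch.
class DfaTermSeek {
    // Hand-built DFA for the pattern [ab]x+ ; state 0 is the start state.
    // Assumption: no dead states, so any existing transition can still lead to acceptance.
    static final Map<Integer, TreeMap<Character, Integer>> TRANSITIONS = new HashMap<>();
    static final Set<Integer> ACCEPT = new HashSet<>(Arrays.asList(2));
    static {
        for (int s = 0; s < 3; s++) TRANSITIONS.put(s, new TreeMap<>());
        TRANSITIONS.get(0).put('a', 1);
        TRANSITIONS.get(0).put('b', 1);
        TRANSITIONS.get(1).put('x', 2);
        TRANSITIONS.get(2).put('x', 2);
    }

    // Walks the sorted terms, seeking past ranges that cannot match.
    static List<String> matchingTerms(TreeSet<String> terms) {
        List<String> hits = new ArrayList<>();
        String term = terms.isEmpty() ? null : terms.first();
        while (term != null) {
            // path.get(i) is the DFA state reached after consuming i characters.
            List<Integer> path = new ArrayList<>();
            path.add(0);
            int ok = 0; // length of the portion that did not hit a reject
            while (ok < term.length()) {
                Integer next = TRANSITIONS.get(path.get(ok)).get(term.charAt(ok));
                if (next == null) break;
                path.add(next);
                ok++;
            }
            if (ok == term.length()) {
                // The whole term stayed inside the DFA; keep it if it ended in an accept state.
                if (ACCEPT.contains(path.get(ok))) hits.add(term);
                term = terms.higher(term);
            } else {
                // Rejected at position ok: build the next possible string and seek to it.
                String target = nextCandidate(term, ok, path);
                term = (target == null) ? null : terms.ceiling(target);
            }
        }
        return hits;
    }

    // Smallest string greater than term that the DFA could still accept a continuation of,
    // or null if no such string exists. Backtracks over the OK portion from right to left.
    static String nextCandidate(String term, int ok, List<Integer> path) {
        for (int j = ok; j >= 0; j--) {
            Character c = TRANSITIONS.get(path.get(j)).higherKey(term.charAt(j));
            if (c != null) return term.substring(0, j) + c;
        }
        return null;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
            Arrays.asList("aa", "ax", "axx", "ay", "bx", "bxy", "c", "cx"));
        System.out.println(matchingTerms(terms)); // prints [ax, axx, bx]
    }
}
```

In this toy run the term "ay" is rejected at its second character, so the sketch seeks straight to "b", and after "bxy" it stops without ever visiting "c" or "cx". That skipping is the reason a constant prefix is not needed.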
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>>
>>>
>>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>
