Hi Ghazal, I think if you are looking to help with Lucene in general, the
HowToContribute link is the best place to start:
http://wiki.apache.org/lucene-java/HowToContribute

We are not working from the source code in the release zip file, but from
the latest unreleased code in the Subversion repository. That page has
instructions on how to access it.

I agree with Mark that this might be a tricky one to attack as your first
issue. Perhaps you want to tackle something smaller first, to get used to
how the process works?

Also keep in mind that you can contribute in more ways than writing code:
you can always contribute by providing comments or feedback, suggesting
improvements to the documentation or tests, answering questions on the
user list, etc.

Finally, as I mentioned before, if you are interested in this particular
issue for some reason, even telling us a bit more, such as "I am trying to
run regular expressions/wildcard/fuzzy on a large index" or something like
that, would be helpful.

On Sat, Dec 5, 2009 at 6:04 PM, Ghazal Gharooni
<ghazal.gharo...@gmail.com> wrote:

> Hello,
> Thank you all for your description. Actually, this is my first experience
> in an open source community. I downloaded the source code (lucene-3.0.0.zip)
> and would like to work on part of the code in order to learn new skills from
> the group and make a positive contribution. To be honest, I really don't know
> where I should start. Please let me know the exact location of the source
> code you are discussing (folder, file), and then I will join
> you :)
>
>
>
>
> On Sat, Dec 5, 2009 at 2:10 PM, Robert Muir <rcm...@gmail.com> wrote:
>
>> Hi Ghazal,
>>
>> I am sorry this one is a bit confusing. I think it is because a lot of
>> people are working on it (which is great) and a lot of ideas are going back
>> and forth, causing lots of files to be uploaded, etc.
>>
>> Can you tell us more about your interest in working with NFA/DFA in
>> Lucene?
>> I am very curious to hear any use cases you might have, or why you are
>> interested!
>>
>> In general, for contributing to Lucene this link is helpful:
>> http://wiki.apache.org/lucene-java/HowToContribute
>>
>> It tells you how the patch submission process works, how to get the latest
>> code from subversion, etc.
>>
>>
>> On Sat, Dec 5, 2009 at 4:58 PM, Ghazal Gharooni <
>> ghazal.gharo...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I am new to the community and I am completely confused. Could
>>> anybody help me find out which part of the code you are working on? How
>>> should I participate in the work? Thank you!
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Dec 5, 2009 at 1:02 PM, Uwe Schindler (JIRA) <j...@apache.org> wrote:
>>>
>>>>
>>>>     [
>>>> https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>>>>
>>>> Uwe Schindler updated LUCENE-1606:
>>>> ----------------------------------
>>>>
>>>>     Attachment:     (was: LUCENE-1606-flex.patch)
>>>>
>>>> > Automaton Query/Filter (scalable regex)
>>>> > ---------------------------------------
>>>> >
>>>> >                 Key: LUCENE-1606
>>>> >                 URL:
>>>> https://issues.apache.org/jira/browse/LUCENE-1606
>>>> >             Project: Lucene - Java
>>>> >          Issue Type: New Feature
>>>> >          Components: Search
>>>> >            Reporter: Robert Muir
>>>> >            Assignee: Robert Muir
>>>> >            Priority: Minor
>>>> >             Fix For: 3.1
>>>> >
>>>> >         Attachments: automaton.patch, automatonMultiQuery.patch,
>>>> automatonmultiqueryfuzzy.patch, automatonMultiQuerySmart.patch,
>>>> automatonWithWildCard.patch, automatonWithWildCard2.patch,
>>>> BenchWildcard.java, LUCENE-1606-flex.patch, LUCENE-1606-flex.patch,
>>>> LUCENE-1606-flex.patch, LUCENE-1606-flex.patch, LUCENE-1606-flex.patch,
>>>> LUCENE-1606-flex.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>>>> LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>>>> LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>>>> LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>>>> LUCENE-1606_nodep.patch
>>>> >
>>>> >
>>>> > Attached is a patch for an AutomatonQuery/Filter (name can change if
>>>> it's not suitable).
>>>> > Whereas the out-of-box contrib RegexQuery is nice, I have some very
>>>> large indexes (100M+ unique tokens) where queries are quite slow (2 minutes,
>>>> etc.). Additionally, all of the existing RegexQuery implementations in Lucene
>>>> are really slow if there is no constant prefix. This implementation does not
>>>> depend upon a constant prefix, and runs the same query in 640ms.
>>>> > Some use cases I envision:
>>>> >  1. lexicography/etc on large text corpora
>>>> >  2. looking for things such as urls where the prefix is not constant
>>>> (http:// or ftp://)
>>>> > The Filter uses the BRICS package (http://www.brics.dk/automaton/) to
>>>> convert regular expressions into a DFA. Then, the filter "enumerates" terms
>>>> in a special way, by using the underlying state machine. Here is my short
>>>> description from the comments:
>>>> >      The algorithm here is pretty basic. Enumerate terms but instead
>>>> of a binary accept/reject do:
>>>> >
>>>> >      1. Look at the portion that is OK (did not enter a reject state
>>>> in the DFA)
>>>> >      2. Generate the next possible String and seek to that.
>>>> > The Query simply wraps the filter with ConstantScoreQuery.
>>>> > I did not include the automaton.jar inside the patch but it can be
>>>> downloaded from http://www.brics.dk/automaton/ and is BSD-licensed.
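
[Editorial sketch, not part of the quoted message.] The enumeration idea in the issue description above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the patch's code: a plain dict stands in for the BRICS DFA, a sorted Python list with `bisect` stands in for Lucene's term dictionary and its seek operation, and the name `automaton_terms` is hypothetical. The skip rule is also simplified: it only jumps past every term that shares the rejected prefix, rather than computing the smallest string the automaton could still accept.

```python
from bisect import bisect_left


def automaton_terms(sorted_terms, transitions, start, accepting):
    """Return (matches, examined) for a sorted term list and a DFA.

    transitions maps (state, char) -> next state; a missing entry is
    treated as the reject (dead) state. All names here are illustrative.
    """
    matches, examined, i = [], 0, 0
    while i < len(sorted_terms):
        term = sorted_terms[i]
        examined += 1
        state, dead_at = start, None
        # 1. Find the portion of the term that is still OK.
        for pos, ch in enumerate(term):
            state = transitions.get((state, ch))
            if state is None:
                dead_at = pos
                break
        if dead_at is None:
            # The whole term was consumed without entering the reject state.
            if state in accepting:
                matches.append(term)
            i += 1
            continue
        bad = term[dead_at]
        if ord(bad) == 0x10FFFF:
            i += 1  # no larger character exists; just step forward
            continue
        # 2. Build the smallest string that sorts after every term sharing
        #    the rejected prefix, and "seek" to it with a binary search.
        target = term[:dead_at] + chr(ord(bad) + 1)
        i = bisect_left(sorted_terms, target, i + 1)
    return matches, examined


# Hand-built DFA for the regular expression a[bc]*d (state 2 accepts).
TRANSITIONS = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 1, (1, "d"): 2}
TERMS = sorted(["aad", "abd", "abz", "abzz", "acd", "ad", "ada", "b", "ba", "bb"])

matches, examined = automaton_terms(TERMS, TRANSITIONS, 0, {2})
print(matches, "examined", examined, "of", len(TERMS))
```

Note that the loop never relies on a constant prefix: terms are skipped whenever the state machine rejects, wherever in the term that happens.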
>>>>
>>>
>>
>>
>> --
>> Robert Muir
>> rcm...@gmail.com
>>
>
>


-- 
Robert Muir
rcm...@gmail.com
