Re: [jira] Updated: (LUCENE-1606) Automaton Query/Filter (scalable regex)

Simon Willnauer Sat, 05 Dec 2009 14:05:23 -0800

On Sat, Dec 5, 2009 at 10:58 PM, Ghazal Gharooni
<[email protected]> wrote:
> Hello,
>
> I am new in the community and I've completely been confused. Please anybody
> help me out to know which part of codes you are working with. How should I
> participate in work? Thank you!


Hi Ghazal,
what exact information do you need? Are you asking for info on this
particular issue?

simon
>
>
>
>
> On Sat, Dec 5, 2009 at 1:02 PM, Uwe Schindler (JIRA) <[email protected]>
> wrote:
>>
>>     [
>> https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> ]
>>
>> Uwe Schindler updated LUCENE-1606:
>> ----------------------------------
>>
>>    Attachment:     (was: LUCENE-1606-flex.patch)
>>
>> > Automaton Query/Filter (scalable regex)
>> > ---------------------------------------
>> >
>> >                 Key: LUCENE-1606
>> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1606
>> >             Project: Lucene - Java
>> >          Issue Type: New Feature
>> >          Components: Search
>> >            Reporter: Robert Muir
>> >            Assignee: Robert Muir
>> >            Priority: Minor
>> >             Fix For: 3.1
>> >
>> >         Attachments: automaton.patch, automatonMultiQuery.patch,
>> > automatonmultiqueryfuzzy.patch, automatonMultiQuerySmart.patch,
>> > automatonWithWildCard.patch, automatonWithWildCard2.patch,
>> > BenchWildcard.java, LUCENE-1606-flex.patch, LUCENE-1606-flex.patch,
>> > LUCENE-1606-flex.patch, LUCENE-1606-flex.patch, LUCENE-1606-flex.patch,
>> > LUCENE-1606-flex.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>> > LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>> > LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>> > LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>> > LUCENE-1606_nodep.patch
>> >
>> >
>> > Attached is a patch for an AutomatonQuery/Filter (name can change if its
>> > not suitable).
>> > Whereas the out-of-box contrib RegexQuery is nice, I have some very
>> > large indexes (100M+ unique tokens) where queries are quite slow, 2 
>> > minutes,
>> > etc. Additionally all of the existing RegexQuery implementations in Lucene
>> > are really slow if there is no constant prefix. This implementation does 
>> > not
>> > depend upon constant prefix, and runs the same query in 640ms.
>> > Some use cases I envision:
>> >  1. lexicography/etc on large text corpora
>> >  2. looking for things such as urls where the prefix is not constant
>> > (http:// or ftp://)
>> > The Filter uses the BRICS package (http://www.brics.dk/automaton/) to
>> > convert regular expressions into a DFA. Then, the filter "enumerates" terms
>> > in a special way, by using the underlying state machine. Here is my short
>> > description from the comments:
>> >      The algorithm here is pretty basic. Enumerate terms but instead of
>> > a binary accept/reject do:
>> >
>> >      1. Look at the portion that is OK (did not enter a reject state in
>> > the DFA)
>> >      2. Generate the next possible String and seek to that.
>> > the Query simply wraps the filter with ConstantScoreQuery.
>> > I did not include the automaton.jar inside the patch but it can be
>> > downloaded from http://www.brics.dk/automaton/ and is BSD-licensed.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [jira] Updated: (LUCENE-1606) Automaton Query/Filter (scalable regex)

Reply via email to