Re: [RegexQuery] how to check what words were founded in particulary Documents ?

Erick Erickson Fri, 20 Jul 2007 14:45:57 -0700

Erik:

Well, you wrote the book <G>. But I thought something like this
would work


TermDocs td = reader.termDocs();
WildcardTermEnum we = new WildcardTermEnum(reader, new term("field",
"c*t"));
while (we.next()) {
 td.seek(we);
 while (td.next()) {
    report document contains term;
 }
}

Although I admit I haven't tried it, so I could be totally off base. What
am I missing?

Erick

On 7/20/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:


Erick - I think you're mixing things up with WildcardQuery.
RegexQuery does support all regex capabilities (depending on the
underlying regex matcher used).

A couple of techniques you could use to achieve the goal:

        * Use RegexTermEnum, though that'll give you the terms across the
entire index, so maybe in your use case you could index a single
document into a RAMDirectory and RegexTermEnum on it.

        * Try out SpanRegexQuery and use getSpans() to get the exact
matches.

Erik



On Jul 20, 2007, at 4:10 PM, Erick Erickson wrote:

> First, the period (.) isn't part of the syntax, so make sure you look
> more carefully at the Lucene syntax...
>
> Then, you might be able to use WildcardTermEnum to find
> the terms that match and TermDocs to find the documents
> that contain those terms.
>
> There's nothing built into Lucene to do this out of the box, you
> have to roll your own.
>
> Best
> Erick
>
> On 20 Jul 2007 21:27:40 +0200, [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> wrote:
>>
>> Hello.
>>
>> Let assume that I have this code in my application:
>>
>>    (...)
>>    Query query = new RegexQuery(new Term("field", "C.T"));;
>>    // searching...
>>    (...)
>>
>> And now, I would like to know if my application founded "cat" or
>> "cot" or
>> something else. How can I check what was founded by my application ?
>>
>> I would like to write application like this:
>>    INPUT -> regular expression
>>    OUTPUT -> file  ---> word
>>
>> example: INPUT = "C.T"
>>          OUTPUT =
>>                   a.txt --> CAT
>>                   a.txt --> COT
>>                   b.txt --> CAT
>>                   b.txt --> CAT
>>                   b.txt --> COT
>>                   (...)
>>
>> So, how to check what words were founded in particulary Documents
>> after
>> searching? I see that Hits class contains only founded documents and
>> nothing more (I am new in this technology so I can be wrong...)
>>
>>
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> -
>> Dowiedz sie, co naprawde podnieca kobiety. Wiecej wiesz, latwiej je
>> oczarujesz
>>
>> >>>http://link.interia.pl/f1b17
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: [RegexQuery] how to check what words were founded in particulary Documents ?

Reply via email to