Re: [RegexQuery] how to check what words were founded in particulary Documents ?

Erik Hatcher Fri, 20 Jul 2007 18:01:31 -0700

Erick - you're not missing anything, except that the original poster is after RegexQuery, not WildcardQuery. Both work basically the same way, except in their pattern matching capabilities.


        Erik


On Jul 20, 2007, at 5:45 PM, Erick Erickson wrote:

Erik:

Well, you wrote the book <G>. But I thought something like this
would work

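TermDocs td = reader.termDocs();
WildcardTermEnum we = new WildcardTermEnum(reader, new Term("field", "c*t"));
// The enum starts out positioned on the first matching term (if any),
// so handle the current term before calling next().
do {
  Term t = we.term();
  if (t == null) break;  // no matching terms left
  td.seek(t);
  while (td.next()) {
    System.out.println("doc " + td.doc() + " contains " + t.text());
  }
} while (we.next());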

Although I admit I haven't tried it, so I could be totally off base. What
am I missing?

Erick

On 7/20/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:


Erick - I think you're mixing things up with WildcardQuery.
RegexQuery does support all regex capabilities (depending on the
underlying regex matcher used).

A couple of techniques you could use to achieve the goal:

* Use RegexTermEnum, though that'll give you the terms across the
entire index, so maybe in your use case you could index a single
document into a RAMDirectory and RegexTermEnum on it.

* Try out SpanRegexQuery and use getSpans() to get the exact
matches.
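
To make the goal concrete, here is a rough plain-Java sketch of the per-document idea: walk each document's terms and report the ones that match the pattern. It deliberately uses no Lucene classes. The Map standing in for the index, the crude tokenizer, the class and method names, the case-insensitive whole-token matching, and the sample texts are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

public class RegexMatchReport {

    // Returns one "file --> WORD" line per token that fully matches the regex.
    // docs maps a (hypothetical) file name to its raw text; this map is only
    // a stand-in for an index and is not part of Lucene.
    static List<String> report(Map<String, String> docs, String regex) {
        // Assumption: matching is case-insensitive and must cover the whole token.
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            // Crude tokenizer (split on non-word characters), standing in for an analyzer.
            for (String token : doc.getValue().split("\\W+")) {
                if (!token.isEmpty() && pattern.matcher(token).matches()) {
                    lines.add(doc.getKey() + " --> " + token.toUpperCase(Locale.ROOT));
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "the cat sat on a cot");
        docs.put("b.txt", "cat, cat and cot");
        for (String line : report(docs, "C.T")) {
            System.out.println(line);
        }
    }
}
```

With Lucene itself, the inner loop would be replaced by a term enumeration over an index holding just that one document, as described in the first bullet.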

Erik



On Jul 20, 2007, at 4:10 PM, Erick Erickson wrote:

> First, the period (.) isn't part of the syntax, so make sure you look
> more carefully at the Lucene syntax...
>
> Then, you might be able to use WildcardTermEnum to find
> the terms that match and TermDocs to find the documents
> that contain those terms.
>
> There's nothing built into Lucene to do this out of the box, you
> have to roll your own.
>
> Best
> Erick
>

> On 20 Jul 2007 21:27:40 +0200, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>>
>> Hello.
>>
>> Let's assume that I have this code in my application:
>>
>>    (...)
>>    Query query = new RegexQuery(new Term("field", "C.T"));
>>    // searching...
>>    (...)
>>
>> And now, I would like to know whether my application found "cat" or
>> "cot" or something else. How can I check what was found by my
>> application?
>>
>> I would like to write an application like this:
>>    INPUT -> regular expression
>>    OUTPUT -> file  ---> word
>>
>> example: INPUT = "C.T"
>>          OUTPUT =
>>                   a.txt --> CAT
>>                   a.txt --> COT
>>                   b.txt --> CAT
>>                   b.txt --> CAT
>>                   b.txt --> COT
>>                   (...)
>>
>> So, how do I check which words were found in particular documents
>> after searching? I see that the Hits class contains only the found
>> documents and nothing more (I am new to this technology, so I could be
>> wrong...).
>>
>>---------------------------------------------------------------------

>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]