Re: [RegexQuery] how to check which words were found in particular Documents?

Mark Miller Sat, 21 Jul 2007 11:01:42 -0700

StandardAnalyzer lowercases terms. The term you pass to the term enum has an uppercase character. You really need to be careful if you are going to use an analyzer to store and then a term enum to cycle through... the term you pass to the term enum must match the output of the analyzer.
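Mark's point can be seen without Lucene on the classpath in the stdlib-only model below. It is not Lucene code: `analyze()` is a hypothetical, much simplified stand-in for StandardAnalyzer (whitespace split plus lowercasing only), and the sorted set plays the role of the field's term dictionary. Matching is a full-string match, which is my understanding of how JavaUtilRegexCapabilities treats each term. With the poster's text, the uppercase pattern finds nothing, while the lowercase one finds all four terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;
import java.util.regex.Pattern;

/**
 * Stdlib-only model (NOT Lucene): the term dictionary holds whatever the
 * analyzer emitted, so an enum pattern must match the analyzed form.
 */
public class AnalyzedTermDemo {

    // Hypothetical stand-in for StandardAnalyzer: split on whitespace, lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Sorted, de-duplicated "term dictionary" for one field.
    static TreeSet<String> termDictionary(String text) {
        return new TreeSet<>(analyze(text));
    }

    // Every dictionary term that the regex matches in full (no partial matches).
    static List<String> matchingTerms(TreeSet<String> terms, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (p.matcher(t).matches()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = termDictionary("Cat Cot Cit Cet");
        System.out.println(terms);                       // [cat, cet, cit, cot]
        System.out.println(matchingTerms(terms, "C.t")); // [] : no uppercase term was indexed
        System.out.println(matchingTerms(terms, "c.t")); // [cat, cet, cit, cot]
        // Lowercasing the user's pattern also works, but only for simple patterns:
        // it would corrupt escapes such as \W or \S.
        System.out.println(matchingTerms(terms, "C.t".toLowerCase(Locale.ROOT)));
    }
}
```

In the poster's program, that corresponds to building the enum with `new Term("field", "c.t")` instead of `"C.t"`.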


- Mark


[EMAIL PROTECTED] wrote:

To begin with, thanks for your quick response.

I am still trying to write this code, and so far I've written this:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.store.*;
import org.apache.lucene.index.*;
import org.apache.lucene.document.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.regex.*;

public class TmpMain {
    public static void main(String[] args) throws Exception {
        // Index a single document into an in-memory directory.
        RAMDirectory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("field", "Cat Cot Cit Cet", Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.optimize();
        writer.close();

        // Enumerate the terms of "field" that match the pattern.
        IndexReader reader = IndexReader.open(directory);
        RegexTermEnum regexTermEnum = new RegexTermEnum(reader, new Term("field", "C.t"), new JavaUtilRegexCapabilities());

        /*
        // Second version:
        WildcardTermEnum regexTermEnum = new WildcardTermEnum(reader, new Term("field", "C?t"));
        */

        while (regexTermEnum.next()) {
            System.out.println("Next:");
            System.out.println(regexTermEnum.term().text());
        }
        System.out.println("End.");
    }
}


But the output of this code is:
End.

As you can see above, I have tried changing the line:

RegexTermEnum regexTermEnum=new RegexTermEnum(reader, new Term("field","C.t"), new JavaUtilRegexCapabilities());

to

WildcardTermEnum regexTermEnum=new WildcardTermEnum(reader, new Term("field","C?t"));


but the result is the same :/
What am I doing wrong?

Erick - you're not missing anything, except that the original poster is after RegexQuery, not WildcardQuery. Both work basically the same way, except in the pattern matching capabilities.


        Erik

On Jul 20, 2007, at 5:45 PM, Erick Erickson wrote:

Erik:

Well, you wrote the book <G>. But I thought something like this
would work

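TermDocs td = reader.termDocs();
WildcardTermEnum we = new WildcardTermEnum(reader, new Term("field", "c*t"));
// the enum is already positioned on its first match, so read term() before next()
do {
  Term t = we.term();
  if (t == null) break;            // nothing matched at all
  td.seek(t);
  while (td.next()) {
    System.out.println("doc " + td.doc() + " contains " + t.text());
  }
} while (we.next());
we.close();
td.close();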

Although I admit I haven't tried it, so I could be totally off base. What
am I missing?

Erick
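Erick's approach (term enum first, then TermDocs for each matching term) can be modelled with the standard library as below. All names are hypothetical: `MatchingTermEnum` imitates what I understand to be the FilteredTermEnum contract in Lucene 2.x (which WildcardTermEnum and RegexTermEnum extend). Right after construction it is already positioned on the first matching term, `next()` advances, and `term()` returns null once nothing is left. That is why the sketch iterates with do { ... } while (next()); a loop that calls `next()` first, as in the TmpMain program earlier in the thread, skips the first match. The `TreeMap` of term to (document, frequency) is a toy stand-in for TermDocs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class TermThenDocsDemo {

    /** Hypothetical model of a filtered term enum: positioned on the first
     *  match after construction; term() is null when nothing (more) matches. */
    static final class MatchingTermEnum {
        private final Iterator<String> it;
        private final Pattern pattern;
        private String current;

        MatchingTermEnum(Iterable<String> sortedTerms, String regex) {
            this.it = sortedTerms.iterator();
            this.pattern = Pattern.compile(regex);
            next(); // position on the first match right away
        }

        boolean next() {
            while (it.hasNext()) {
                String t = it.next();
                if (pattern.matcher(t).matches()) {
                    current = t;
                    return true;
                }
            }
            current = null;
            return false;
        }

        String term() { return current; }
    }

    // Toy inverted index: lowercased term -> (document name -> frequency).
    static TreeMap<String, TreeMap<String, Integer>> invert(Map<String, String> docs) {
        TreeMap<String, TreeMap<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, String> d : docs.entrySet()) {
            for (String tok : d.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (tok.isEmpty()) continue;
                index.computeIfAbsent(tok, k -> new TreeMap<>()).merge(d.getKey(), 1, Integer::sum);
            }
        }
        return index;
    }

    // Correct loop: read term() first, then advance.
    static List<String> report(TreeMap<String, TreeMap<String, Integer>> index, String regex) {
        List<String> lines = new ArrayList<>();
        MatchingTermEnum te = new MatchingTermEnum(index.keySet(), regex);
        do {
            String term = te.term();
            if (term == null) break; // no term matched at all
            for (Map.Entry<String, Integer> p : index.get(term).entrySet()) {
                lines.add(p.getKey() + " --> " + term + " (x" + p.getValue() + ")");
            }
        } while (te.next());
        return lines;
    }

    // The while (next()) pattern: the first matching term is never seen.
    static List<String> termsWithWhileNext(Iterable<String> sortedTerms, String regex) {
        List<String> seen = new ArrayList<>();
        MatchingTermEnum te = new MatchingTermEnum(sortedTerms, regex);
        while (te.next()) {
            seen.add(te.term());
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("a.txt", "Cat Cot");
        docs.put("b.txt", "cat dog CAT cot");
        TreeMap<String, TreeMap<String, Integer>> index = invert(docs);
        report(index, "c.t").forEach(System.out::println);
        System.out.println(termsWithWhileNext(index.keySet(), "c.t")); // [cot] : "cat" was skipped
    }
}
```

If that reading of the enum contract is right, TmpMain with a lowercase pattern would print cet, cit and cot but not cat.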

On 7/20/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:

Erick - I think you're mixing things up with WildcardQuery.
RegexQuery does support all regex capabilities (depending on the
underlying regex matcher used).

A couple of techniques you could use to achieve the goal:

        * Use RegexTermEnum, though that'll give you the terms across the
          entire index, so maybe in your use case you could index a single
          document into a RAMDirectory and RegexTermEnum on it.

        * Try out SpanRegexQuery and use getSpans() to get the exact
          matches.
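For Erik's second technique, my understanding of the Lucene 2.x contrib API is that SpanRegexQuery must first be rewritten against the reader (`query.rewrite(reader)`), after which the resulting SpanQuery's `getSpans(reader)` reports `doc()`, `start()` and `end()` for every match. Spans carry positions rather than the matched text, so recovering the word means re-tokenizing the stored field (the poster's field uses `Field.Store.YES`). The sketch below does not call Lucene; it is a hypothetical, stdlib-only model of that combined result. Each document is split on whitespace, each token's lowercased form is matched against the regex (mirroring an index built with a lowercasing analyzer), and the original word is reported with a half-open [start, end) token range, in the poster's file --> word format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical span-style matcher (NOT Lucene): one line per matching token. */
public class SpanStyleDemo {

    static List<String> spans(Map<String, String> docs, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> d : docs.entrySet()) {
            String[] tokens = d.getValue().trim().split("\\s+");
            for (int pos = 0; pos < tokens.length; pos++) {
                String surface = tokens[pos];
                // Match the analyzed (lowercased) form, but report the original word.
                if (p.matcher(surface.toLowerCase(Locale.ROOT)).matches()) {
                    // A single-term span covers [pos, pos + 1).
                    out.add(d.getKey() + " --> " + surface + " [" + pos + "," + (pos + 1) + ")");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>(); // keeps insertion order
        docs.put("a.txt", "CAT COT dog");
        docs.put("b.txt", "CAT fish CAT COT");
        spans(docs, "c.t").forEach(System.out::println);
    }
}
```

Unlike the term-dictionary approach, this reports every occurrence, so a word that appears twice in b.txt is listed twice.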

Erik



On Jul 20, 2007, at 4:10 PM, Erick Erickson wrote:

First, the period (.) isn't part of the syntax, so make sure you look
more carefully at the Lucene syntax...
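To make the syntax difference concrete: in WildcardTermEnum/WildcardQuery only * (any run of characters, possibly empty) and ? (exactly one character) are special, so a . is just a literal dot, whereas in a RegexQuery pattern . means any single character. The hypothetical helper below translates a wildcard into an equivalent java.util.regex pattern purely to illustrate that correspondence; it is not part of Lucene.

```java
import java.util.regex.Pattern;

public class WildcardVsRegexDemo {

    // Hypothetical helper: Lucene-style wildcard (only * and ? are special) -> regex.
    static String wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");                              // any run of characters
            } else if (c == '?') {
                sb.append('.');                               // exactly one character
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));  // everything else is literal
            }
        }
        return sb.toString();
    }

    static boolean wildcardMatches(String wildcard, String term) {
        return Pattern.matches(wildcardToRegex(wildcard), term);
    }

    public static void main(String[] args) {
        System.out.println(wildcardMatches("c?t", "cat"));   // true:  ? is one character
        System.out.println(wildcardMatches("c.t", "cat"));   // false: . is a literal dot here
        System.out.println(wildcardMatches("c.t", "c.t"));   // true
        System.out.println(Pattern.matches("c.t", "cat"));   // true:  . is "any char" in a regex
        System.out.println(wildcardMatches("c*t", "coast")); // true
    }
}
```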

Then, you might be able to use WildcardTermEnum to find
the terms that match and TermDocs to find the documents
that contain those terms.

There's nothing built into Lucene to do this out of the box, you
have to roll your own.

Best
Erick

On 20 Jul 2007 21:27:40 +0200, [EMAIL PROTECTED] <[EMAIL PROTECTED]>
wrote:

Hello.

Let's assume that I have this code in my application:

   (...)
   Query query = new RegexQuery(new Term("field", "C.T"));
   // searching...
   (...)

And now, I would like to know whether my application found "cat" or
"cot" or something else. How can I check what my application found?

I would like to write an application like this:
   INPUT -> regular expression
   OUTPUT -> file  ---> word

example: INPUT = "C.T"
         OUTPUT =
                  a.txt --> CAT
                  a.txt --> COT
                  b.txt --> CAT
                  b.txt --> CAT
                  b.txt --> COT
                  (...)

So, how can I check which words were found in particular Documents
after searching? I see that the Hits class contains only the matching
documents and nothing more (I am new to this technology, so I could be wrong...)

---------------------------------------------------------------------

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
