Re: [RegexQuery] how to check what words were founded in particulary Documents ?

mhzmark Sat, 21 Jul 2007 11:16:59 -0700

You are right. After I changed "Cat Cot Cit Cet" to "cat cot cit cet", it works 
very good :)I see that there is quite a lot of knowledge to acquire in order to 
be familiar with this technology... Thank You !


> StandardAnalyzer lowercases terms. The term you pass to the term enum 
> has an upper case character. You really need to be careful if you are 
> going to use an analyzer to store and then a term enum to cycle 
> through...the term you pass the term enum must match the output of the 
> analyzer.
> 
> - Mark
> 
> [EMAIL PROTECTED] wrote:
> > To begin with, thanks for Yours quick response.
> >
> > I am still trying write this code and I've already written this one:
> >
> > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > import org.apache.lucene.store.*;
> > import org.apache.lucene.index.*;
> > import org.apache.lucene.document.*;
> > import org.apache.lucene.search.*;
> > import org.apache.lucene.search.regex.*;
> >
> > public class TmpMain {
> >     public static void main(String[] args) throws Exception {       
> >             RAMDirectory directory = new RAMDirectory(); 
> >             IndexWriter writer = new IndexWriter(directory, new
> StandardAnalyzer(), 
> >                                                       true); 
> >             Document doc = new Document(); 
> >             doc.add(new Field("field", "Cat Cot Cit Cet", Field.Store.YES, 
> >                                     Field.Index.TOKENIZED)); 
> >             writer.addDocument(doc); 
> >             writer.optimize();
> >             writer.close(); 
> >             IndexReader reader=IndexReader.open(directory);
> >             RegexTermEnum regexTermEnum=new RegexTermEnum(reader, new
> Term("field", 
> >                             "C.t"), new JavaUtilRegexCapabilities());
> >     /*
> >     //Second Version:
> >             WildcardTermEnum regexTermEnum=new WildcardTermEnum(reader, new 
> >                                           Term("field", "C?t"));
> >     */
> >             while(regexTermEnum.next()) {
> >                     System.out.println("Next:");
> >                     System.out.println(regexTermEnum.term().text());
> >             }
> >             System.out.println("End.");             
> >     }
> > }
> >
> >
> > But the output of this code is:
> > End.
> >
> > As you can see above I have tried change line:
> >     RegexTermEnum regexTermEnum=new RegexTermEnum(reader, new Term("field",
> 
> >             "C.t"), new JavaUtilRegexCapabilities());
> > on
> >     WildcardTermEnum regexTermEnum=new WildcardTermEnum(reader, new
> Term("field", 
> >             "C?t"));
> >
> > but the result is the same :/
> > What am I doing wrong ?
> >
> >
> >
> >
> >   
> >> Erick - you're not missing anything, except that the original poster  
> >> is after RegexQuery, not WildcardQuery.  Both work basically the same 
> 
> >> way, except in the pattern matching capabilities.
> >>
> >>    Erik
> >>
> >> On Jul 20, 2007, at 5:45 PM, Erick Erickson wrote:
> >>     
> >>> Erik:
> >>>
> >>> Well, you wrote the book <G>. But I thought something like this
> >>> would work
> >>>
> >>> TermDocs td = reader.termDocs();
> >>> WildcardTermEnum we = new WildcardTermEnum(reader, new term("field",
> >>> "c*t"));
> >>> while (we.next()) {
> >>>  td.seek(we);
> >>>  while (td.next()) {
> >>>     report document contains term;
> >>>  }
> >>> }
> >>>
> >>> Although I admit I haven't tried it, so I could be totally off  
> >>> base. What
> >>> am I missing?
> >>>
> >>> Erick
> >>>
> >>> On 7/20/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> >>>       
> >>>> Erick - I think you're mixing things up with WildcardQuery.
> >>>> RegexQuery does support all regex capabilities (depending on the
> >>>> underlying regex matcher used).
> >>>>
> >>>> A couple of techniques you could use to achieve the goal:
> >>>>
> >>>>         * Use RegexTermEnum, though that'll give you the terms  
> >>>> across the
> >>>> entire index, so maybe in your use case you could index a single
> >>>> document into a RAMDirectory and RegexTermEnum on it.
> >>>>
> >>>>         * Try out SpanRegexQuery and use getSpans() to get the exact
> >>>> matches.
> >>>>
> >>>> Erik
> >>>>
> >>>>
> >>>>
> >>>> On Jul 20, 2007, at 4:10 PM, Erick Erickson wrote:
> >>>>
> >>>>         
> >>>>> First, the period (.) isn't part of the syntax, so make sure you  
> >>>>>           
> >>>> look
> >>>>         
> >>>>> more carefully at the Lucene syntax...
> >>>>>
> >>>>> Then, you might be able to use WildcardTermEnum to find
> >>>>> the terms that match and TermDocs to find the documents
> >>>>> that contain those terms.
> >>>>>
> >>>>> There's nothing built into Lucene to do this out of the box, you
> >>>>> have to roll your own.
> >>>>>
> >>>>> Best
> >>>>> Erick
> >>>>>
> >>>>> On 20 Jul 2007 21:27:40 +0200, [EMAIL PROTECTED]  
> >>>>>           
> >>>> <[EMAIL PROTECTED]>
> >>>>         
> >>>>> wrote:
> >>>>>           
> >>>>>> Hello.
> >>>>>>
> >>>>>> Let assume that I have this code in my application:
> >>>>>>
> >>>>>>    (...)
> >>>>>>    Query query = new RegexQuery(new Term("field", "C.T"));;
> >>>>>>    // searching...
> >>>>>>    (...)
> >>>>>>
> >>>>>> And now, I would like to know if my application founded "cat" or
> >>>>>> "cot" or
> >>>>>> something else. How can I check what was founded by my  
> >>>>>>             
> >>>> application ?
> >>>>         
> >>>>>> I would like to write application like this:
> >>>>>>    INPUT -> regular expression
> >>>>>>    OUTPUT -> file  ---> word
> >>>>>>
> >>>>>> example: INPUT = "C.T"
> >>>>>>          OUTPUT =
> >>>>>>                   a.txt --> CAT
> >>>>>>                   a.txt --> COT
> >>>>>>                   b.txt --> CAT
> >>>>>>                   b.txt --> CAT
> >>>>>>                   b.txt --> COT
> >>>>>>                   (...)
> >>>>>>
> >>>>>> So, how to check what words were founded in particulary Documents
> >>>>>> after
> >>>>>> searching? I see that Hits class contains only founded  
> >>>>>>             
> >>>> documents and
> >>>>         
> >>>>>> nothing more (I am new in this technology so I can be wrong...)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>  
> >>>>>>             
> >>>>
> ---------------------------------------------------------------------
> >>>>         
> >>>>>> -
> >>>>>> Dowiedz sie, co naprawde podnieca kobiety. Wiecej wiesz,  
> >>>>>>             
> >>>> latwiej je
> >>>>         
> >>>>>> oczarujesz
> >>>>>>
> >>>>>>             
> >>>>>>>>> http://link.interia.pl/f1b17
> >>>>>>>>>                   
> >>>>>>  
> >>>>>>             
> >>>>
> ---------------------------------------------------------------------
> >>>>         
> >>>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >>>>>> For additional commands, e-mail: [EMAIL PROTECTED]
> >>>>>>
> >>>>>>
> >>>>>>             
> >>>>
> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >>>> For additional commands, e-mail: [EMAIL PROTECTED]
> >>>>
> >>>>
> >>>>         
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >>
> >>
> >>
> >>     
> >
> >
> > ----------------------------------------------------------------------
> > Najbogatsza Polka 
> > Zobacz >>> http://link.interia.pl/f1ae2
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
> >   
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> 


----------------------------------------------------------------------
Wszystko czego potrzebujesz latem: kremy do opalania,
stroje kapielowe, maly romans

>>>http://link.interia.pl/f1b15


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: [RegexQuery] how to check what words were founded in particulary Documents ?

Reply via email to