Re: AW: AW: AW: "fuzzy prefix" search

Otis Gospodnetic Tue, 03 May 2011 14:12:16 -0700

Clemens - that's just an example.  Stick another tokenizer in there, like
WhitespaceTokenizer, for example.
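
To make the effect concrete, here is a minimal sketch of what front edge n-gramming does to each whitespace-separated token. It is plain Java, not Lucene code: the class EdgeNGramSketch and its methods are made up for illustration, the lowercasing step is an added assumption (WhitespaceTokenizer itself does not lowercase), and the gram sizes 1 to 4 are taken from the snippet quoted below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: mimics "whitespace tokenizer + front edge n-grams"
// with the standard library. Not the Lucene implementation.
public class EdgeNGramSketch {

    /** Front edge n-grams of one token, lengths minGram..maxGram (capped at the token length). */
    public static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    /** Splits on whitespace, lowercases (assumption), then edge-n-grams every token. */
    public static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for empty or all-whitespace input
            }
            out.addAll(edgeNGrams(token.toLowerCase(Locale.ROOT), minGram, maxGram));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [m, me, mer, merl, d, de, del, t, ti, tic, tici]
        System.out.println(analyze("Merlot del Ticino", 1, 4));
    }
}
```

With grams like these in the index, a lowercased input such as "mer" lines up with an indexed gram exactly, so no fuzzy matching is needed for the plain prefix case.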


Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
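
The failing test quoted further down ("Mer" against "Merlot") can also be checked by hand. The sketch below computes the Levenshtein distance in plain Java and turns it into a score as 1 - distance / min(length). That scoring formula is an assumption about how the 3.x FuzzyQuery behaves with a prefix length of 0; it is not stated anywhere in this thread, so check the FuzzyQuery javadocs for your version. The class and method names are made up for illustration.

```java
// Illustration only: standard-library Levenshtein distance plus an ASSUMED
// similarity formula. Not the Lucene implementation.
public class FuzzyScoreSketch {

    /** Classic dynamic-programming Levenshtein (edit) distance using two rows. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** ASSUMED scoring: 1 - distance / length of the shorter string. */
    public static float similarity(String query, String term) {
        int shorter = Math.min(query.length(), term.length());
        if (shorter == 0) {
            return query.length() == term.length() ? 1.0f : 0.0f;
        }
        return 1.0f - (float) levenshtein(query, term) / shorter;
    }

    public static void main(String[] args) {
        // "mer" -> "merlot" needs three insertions, so the distance is 3
        System.out.println(levenshtein("mer", "merlot"));
        // under the assumed formula: 1 - 3/3 = 0.0
        System.out.println(similarity("mer", "merlot"));
        // one deletion: distance 1
        System.out.println(levenshtein("merlott", "merlot"));
    }
}
```

If the assumed formula holds, a three-letter prefix of a six-letter term scores 0 and cannot pass any positive minimum similarity, which would be consistent with the observation below that only inputs about as long as the indexed word produce hits.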



----- Original Message ----
> From: Clemens Wyss <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Tue, May 3, 2011 4:31:14 PM
> Subject: AW: AW: AW: "fuzzy prefix" search
> 
> But doesn't the KeywordTokenizer extract single words out of the stream? I
> would like to create n-grams on the stream (field content) as it is...
> 
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:[email protected]]
> > Sent: Tuesday, May 3, 2011 21:31
> > To: [email protected]
> > Subject: Re: AW: AW: "fuzzy prefix" search
> > 
> > Clemens,
> > 
> > Something a la:
> > 
> > public TokenStream tokenStream(String fieldName, Reader r) {
> >   return new EdgeNGramTokenFilter(new KeywordTokenizer(r),
> >       EdgeNGramTokenFilter.Side.FRONT, 1, 4);
> > }
> > 
> > 
> > Check out page 265 of Lucene in Action 2.
> > 
> >  Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene  ecosystem search :: http://search-lucene.com/
> > 
> > 
> > 
> > ----- Original  Message ----
> > > From: Clemens Wyss <[email protected]>
> > > To:  "[email protected]"  <[email protected]>
> >  > Sent: Tue, May 3, 2011 12:57:39 PM
> > > Subject: AW: AW: "fuzzy  prefix" search
> > >
> > > What does a simple Analyzer look like that just "n-grams" the docs/fields?
> > >
> > > class SimpleNGramAnalyzer extends Analyzer
> > > {
> > >   @Override
> > >   public TokenStream tokenStream( String fieldName, Reader reader )
> > >   {
> > >     EdgeNGramTokenFilter...  ???
> > >   }
> > > }
> > >
> > > > -----Original Message-----
> > > > From: Otis Gospodnetic [mailto:[email protected]]
> > > > Sent: Tuesday, May 3, 2011 13:36
> > > > To: [email protected]
> > > > Subject: Re: AW: "fuzzy prefix" search
> > >  >
> > > > Hi,
> > > >
> > > > I didn't read this thread closely, but just in case:
> > > > * Is this something you can handle with synonyms?
> > > > * If this is for English and you are trying to handle typos, there is a
> > > >   list of common English misspellings out there that you could perhaps
> > > >   use for this.
> > > > * Have you considered n-gramming your tokens?  Not sure if this would
> > > >   help, didn't read messages/examples closely enough, but you may want
> > > >   to look at this if you haven't done so yet.
> > > >
> > > > Otis
> > > > ----
> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > > Lucene ecosystem search :: http://search-lucene.com/
> > > >
> > >  >
> > > >
> > > > ----- Original Message ----
> > > > > From: Clemens Wyss <[email protected]>
> > > > > To: "[email protected]" <[email protected]>
> > > > > Sent: Tue, May 3, 2011 5:25:30 AM
> > > > > Subject: AW: "fuzzy prefix" search
> > > > >
> > > > > > PrefixQuery
> > > > > I'd like the combination of prefix and fuzzy ;-) because people could
> > > > > also type "menlo" or "märl" and in any of these cases I'd like to get
> > > > > a hit on Merlot (for suggesting Merlot)
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Ian Lea [mailto:[email protected]]
> > > > > > Sent: Tuesday, May 3, 2011 11:22
> > > > > > To: [email protected]
> > > > > > Subject: Re: "fuzzy prefix" search
> > >  > > >
> > > > > > I'd assumed that FuzzyQuery wouldn't ignore case but I could be wrong.
> > > > > > What would be the edit distance between "mer" and "merlot"?  Would
> > > > > > it be less than 1.5, which I reckon would be the value of
> > > > > > length(term)*0.5 as detailed in the javadocs?  Seems unlikely, but
> > > > > > I don't really know anything about the Levenshtein (edit distance)
> > > > > > algorithm as used by FuzzyQuery.
> > > > > > Wouldn't a PrefixQuery be more appropriate here?
> > > > > >
> > > > >  >
> > > > > >   --
> > > > > >  Ian.
> > > > > >
> > > > > > On Tue, May 3, 2011 at 10:10 AM, Clemens Wyss <[email protected]> wrote:
> > > > > > > Unfortunately lowercasing doesn't help.
> > > > > > > Also, doesn't the FuzzyQuery ignore casing?
> > > > > >  >
> > > > > > >> -----Original Message-----
> > > > > > >> From: Ian Lea [mailto:[email protected]]
> > > > > > >> Sent: Tuesday, May 3, 2011 11:06
> > > > > > >> To: [email protected]
> > > > > > >> Subject: Re: "fuzzy prefix" search
> > > > > >  >>
> > > > > >  >>  Mer != mer.  The latter will be  what is indexed  because
> > > > > > >> StandardAnalyzer calls   LowerCaseFilter.
> > > > > > >>
> > > > >  > >>   --
> > > > > > >> Ian.
> > >  > > > >>
> > > > > >  >>
> > > > > > >> On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss <[email protected]> wrote:
> > > > > > >> > Sorry for coming back to my issue. Can anybody explain why my
> > > > > > >> > "simple" unit test below fails? Any hint/help appreciated.
> > > > > > >> >
> > > > > > >> > Directory directory = new RAMDirectory();
> > > > > > >> > IndexWriter indexWriter = new IndexWriter( directory,
> > > > > > >> >     new StandardAnalyzer( Version.LUCENE_31 ),
> > > > > > >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > > > > > >> > Document document = new Document();
> > > > > > >> > document.add( new Field( "test", "Merlot",
> > > > > > >> >     Field.Store.YES, Field.Index.ANALYZED ) );
> > > > > > >> > indexWriter.addDocument( document );
> > > > > > >> > IndexReader indexReader = indexWriter.getReader();
> > > > > > >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > > > > > >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, 0, 10 );
> > > > > > >> > // or Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f );
> > > > > > >> > TopDocs result = searcher.search( q, 10 );
> > > > > > >> > Assert.assertEquals( 1, result.totalHits );
> > > > >  > >> >
> > > > > > >> > -    Clemens
> > > > > > >> >
> > > > > > >> >> -----Original Message-----
> > > > > > >> >> From: Clemens Wyss [mailto:[email protected]]
> > > > > > >> >> Sent: Monday, May 2, 2011 23:01
> > > > > > >> >> To: [email protected]
> > > > > > >> >> Subject: AW: "fuzzy prefix" search
> > > > > > >> >>
> > > > > > >> >> Is it the combination of FuzzyQuery and Term which makes the
> > > > > > >> >> search go for "word boundaries"?
> > > > > >   >> >>
> > > > > > >> >> > -----Original Message-----
> > > > > > >> >> > From: Clemens Wyss [mailto:[email protected]]
> > > > > > >> >> > Sent: Monday, May 2, 2011 14:13
> > > > > > >> >> > To: [email protected]
> > > > > > >> >> > Subject: AW: "fuzzy prefix" search
> > > > > > >> >> >
> > > > > > >> >> > I tried this too, but unfortunately I only get hits when
> > > > > > >> >> > the search term is at least as long as the word to be looked up.
> > > > > > >> >> >
> > > > > > >> >> > E.g.:
> > > > > > >> >> > ...
> > > > > > >> >> > Directory directory = new RAMDirectory();
> > > > > > >> >> > IndexWriter indexWriter = new IndexWriter( directory,
> > > > > > >> >> >     IndexManager.getIndexingAnalyzer( LOCALE_DE ),
> > > > > > >> >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > > > > > >> >> >
> > > > > > >> >> > Document document = new Document();
> > > > > > >> >> > document.add( new Field( "test", "Merlot",
> > > > > > >> >> >     Field.Store.YES, Field.Index.ANALYZED ) );
> > > > > > >> >> > indexWriter.addDocument( document );
> > > > > > >> >> >
> > > > > > >> >> > IndexReader indexReader = indexWriter.getReader();
> > > > > > >> >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > > > > > >> >> >
> > > > > > >> >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.6f, 1 );
> > > > > > >> >> > TopDocs result = searcher.search( q, 10 );
> > > > > > >> >> > Assert.assertEquals( 1, result.totalHits );
> > > > > > >> >> > ...
> > > > >  >   >> >> >
> > > > > > >> >> > > -----Original Message-----
> > > > > > >> >> > > From: Uwe Schindler [mailto:[email protected]]
> > > > > > >> >> > > Sent: Monday, May 2, 2011 13:50
> > > > > > >> >> > > To: [email protected]
> > > > > > >> >> > > Subject: RE: "fuzzy prefix" search
> > > > > > >> >> > >
> > > > > > >> >> > > Hi,
> > > > > > >> >> > >
> > > > > > >> >> > > You can pass an integer to FuzzyQuery which defines the
> > > > > > >> >> > > number of characters that are seen as prefix. So all
> > > > > > >> >> > > terms must match this prefix and the rest of each term
> > > > > > >> >> > > is matched using fuzzy.
> > > > > > >> >> > >
> > > > > > >> >> > > Uwe
> > > > > > >> >> > >
> > > > > > >> >> > > -----
> > > > > > >> >> > > Uwe Schindler
> > > > > > >> >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > > > > >> >> > > http://www.thetaphi.de
> > > > > > >> >> > > eMail: [email protected]
> > > > > >  >>  >> >  >
> > > > > > >> >> > > > -----Original Message-----
> > > > > > >> >> > > > From: Clemens Wyss [mailto:[email protected]]
> > > > > > >> >> > > > Sent: Monday, May 02, 2011 1:47 PM
> > > > > > >> >> > > > To: [email protected]
> > > > > > >> >> > > > Subject: "fuzzy prefix" search
> > > > > > >> >> > > >
> > > > > > >> >> > > > I'd like to search fuzzily but not on a full term.
> > > > > > >> >> > > > E.g.
> > > > > > >> >> > > > I have a text "Merlot del Ticino"
> > > > > > >> >> > > > I'd like
> > > > > > >> >> > > > "mer", "merr", "melo", ... to match.
> > > > > > >> >> > > >
> > > > > > >> >> > > > If I use FuzzyQuery only "merlot", "merlott" hit. What
> > > > > > >> >> > > > Query-combination should I use?
> > > > > > >> >> > > >
> > > > > > >> >> > > > Thx
> > > > > > >> >> > > > Clemens
> > > >  > > >> >> > >  >
> > > > > > >>   >> > > >
> >  > > > > >> >> > >  >
> > > > > > >> >> > > > ---------------------------------------------------------------------
> > > > > > >> >> > > > To unsubscribe, e-mail: [email protected]
> > > > > > >> >> > > > For additional commands, e-mail: [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
