Clemens - that's just an example. Stick another tokenizer in there, like WhitespaceTokenizer in there, for example.
Otis ---- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ ----- Original Message ---- > From: Clemens Wyss <clemens...@mysign.ch> > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org> > Sent: Tue, May 3, 2011 4:31:14 PM > Subject: AW: AW: AW: "fuzzy prefix" search > > But doesn't the KeyWordTokenizer extract single words out oft he stream? I >would like to create n-grams on the stream (field content) as it is... > > > -----Ursprüngliche Nachricht----- > > Von: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] > > Gesendet: Dienstag, 3. Mai 2011 21:31 > > An: java-user@lucene.apache.org > > Betreff: Re: AW: AW: "fuzzy prefix" search > > > > Clemens, > > > > Something a la: > > > > public TokenStream tokenStream (String fieldName, Reader r) { > > return nw EdgeNGramTokenFilter(new KeywordTokenizer(r), > > EdgeNGramTokenFilter.Side.FRONT, 1, 4); } > > > > > > Check out page 265 of Lucene in Action 2. > > > > Otis > > ---- > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > > Lucene ecosystem search :: http://search-lucene.com/ > > > > > > > > ----- Original Message ---- > > > From: Clemens Wyss <clemens...@mysign.ch> > > > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org> > > > Sent: Tue, May 3, 2011 12:57:39 PM > > > Subject: AW: AW: "fuzzy prefix" search > > > > > > How does an simple Analyzer look that just "n-grams" the docs/fields. > > > > > > class SimpleNGramAnalyzer extends Analyzer > > > { > > > @Override > > > public TokenStream tokenStream ( String fieldName, Reader reader ) > > > { > > > EdgeNGramTokenFilter... ??? > > > } > > > } > > > > > > > -----Ursprüngliche Nachricht----- > > > > Von: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] > > > > Gesendet: Dienstag, 3. Mai 2011 13:36 > > > > An: java-user@lucene.apache.org > > > > Betreff: Re: AW: "fuzzy prefix" search > > > > > > > > Hi, > > > > > > > > I didn't read this thread closely, but just in case: > > > > * Is this something you can handle with synonyms? > > > > * If this is for English and you are trying to handle typos, there is > > > > a >list > > >of > > > > common English misspellings out there that you could use for this > > perhaps. > > > > * Have you considered n-gramming your tokens? Not sure if this would > > help, > > > > didn't read messages/examples closely enough, but you may want to > > look at > > > > this if you haven't done so yet. > > > > > > > > Otis > > > > ---- > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene > > ecosystem > > > > search :: http://search-lucene.com/ > > > > > > > > > > > > > > > > ----- Original Message ---- > > > > > From: Clemens Wyss <clemens...@mysign.ch> > > > > > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org> > > > > > Sent: Tue, May 3, 2011 5:25:30 AM > > > > > Subject: AW: "fuzzy prefix" search > > > > > > > > > > >PrefixQuery > > > > > I'd like the combination of prefix and fuzzy ;-) because people >could > > > > >also type "menlo" or "märl" and in any of these cases I'd like to >get > > > > >a hit on Merlot (for suggesting Merlot) > > > > > > > > > > > -----Ursprüngliche Nachricht----- > > > > > > Von: Ian Lea [mailto:ian....@gmail.com] > > > > > > Gesendet: Dienstag, 3. Mai 2011 11:22 > > > > > > An: java-user@lucene.apache.org > > > > > > Betreff: Re: "fuzzy prefix" search > > > > > > > > > > > > I'd assumed that FuzzyQuery wouldn't ignore case but I could be > > wrong. > > > > > > What would be the edit distance between "mer" and "merlot"? > > Would > > > > > > it be less that 1.5 which I reckon would be the value of > > > > > > length(term)*0.5 as detailed in the javadocs? Seems unlikely, >but > > > > > > I don't really know anything about the Levenshtein (edit distance) > > > > algorithm as used by FuzzyQuery. > > > > > > Wouldn't a PrefixQuery be more appropriate here? > > > > > > > > > > > > > > > > > > -- > > > > > > Ian. > > > > > > > > > > > > On Tue, May 3, 2011 at 10:10 AM, Clemens Wyss > > > > > > <clemens...@mysign.ch> > > > > > > wrote: > > > > > > > Unfortunately lowercasing doesn't help. > > > > > > > Also, doesn't the FuzzyQuery ignore casing? > > > > > > > > > > > > > >> -----Ursprüngliche Nachricht----- > > > > > > >> Von: Ian Lea [mailto:ian....@gmail.com] > > > > > > >> Gesendet: Dienstag, 3. Mai 2011 11:06 > > > > > > >> An: java-user@lucene.apache.org > > > > > > >> Betreff: Re: "fuzzy prefix" search > > > > > > >> > > > > > > >> Mer != mer. The latter will be what is indexed because > > > > > > >> StandardAnalyzer calls LowerCaseFilter. > > > > > > >> > > > > > > >> -- > > > > > > >> Ian. > > > > > > >> > > > > > > >> > > > > > > >> On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss > > > > > > <clemens...@mysign.ch> > > > > > > >> wrote: > > > > > > >> > Sorry for coming back to my issue. Can anybody explain why >my > > > > > > "simple" > > > > > > >> unit test below fails? Any hint/help appreciated. > > > > > > >> > > > > > > > >> > Directory directory = new RAMDirectory(); IndexWriter > > > > > > >> > indexWriter = new IndexWriter( directory, new > > > > > > >> > StandardAnalyzer( > > > > > > Version.LUCENE_31 > > > > > > >> > ), IndexWriter.MaxFieldLength.UNLIMITED ); Document > > document > > > > = > > > > > > new > > > > > > >> > Document(); document.add( new Field( "test", "Merlot", > > > > > > >> > Field.Store.YES, Field.Index.ANALYZED ) ); > > > > > > >> > indexWriter.addDocument( > > > > > > >> > document ); IndexReader indexReader = > > > > > > indexWriter.getReader(); > > > > > > >> > IndexSearcher searcher = new IndexSearcher( indexReader ); > > > > > > >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, >0, > > > > > > >> > 10 ); // or Query q = new FuzzyQuery( new Term( "test", "Mer" > > > > > > >> > ), 0.5f); TopDocs result = searcher.search( q, 10 ); > > > > > > >> > Assert.assertEquals( 1, result.totalHits ); > > > > > > >> > > > > > > > >> > - Clemens > > > > > > >> > > > > > > > >> >> -----Ursprüngliche Nachricht----- > > > > > > >> >> Von: Clemens Wyss [mailto:clemens...@mysign.ch] > > > > > > >> >> Gesendet: Montag, 2. Mai 2011 23:01 > > > > > > >> >> An: java-user@lucene.apache.org > > > > > > >> >> Betreff: AW: "fuzzy prefix" search > > > > > > >> >> > > > > > > >> >> Is it the combination of FuzzyQuery and Term which makes >the > > > > > > >> >> search to go for "word boundaries"? > > > > > > >> >> > > > > > > >> >> > -----Ursprüngliche Nachricht----- > > > > > > >> >> > Von: Clemens Wyss [mailto:clemens...@mysign.ch] > > > > > > >> >> > Gesendet: Montag, 2. Mai 2011 14:13 > > > > > > >> >> > An: java-user@lucene.apache.org > > > > > > >> >> > Betreff: AW: "fuzzy prefix" search > > > > > > >> >> > > > > > > > >> >> > I tried this too, but unfortunately I only get hits when > > > > > > >> >> > the search term is a least as long as the word to be >looked > > up. > > > > > > >> >> > > > > > > > >> >> > E.g.: > > > > > > >> >> > ... > > > > > > >> >> > Directory directory = new RAMDirectory(); IndexWriter > > > > > > >> >> > indexWriter = new IndexWriter( directory, >> >> > > > > > > > IndexManager.getIndexingAnalyzer( > > > > > > >> >> LOCALE_DE ), > > > > > > >> >> > IndexWriter.MaxFieldLength.UNLIMITED ); > > > > > > >> >> > > > > > > > >> >> > Document document = new Document(); document.add( > > new > > > > > > Field( > > > > > > >> >> > "test", "Merlot", > > > > > > >> >> > Field.Store.YES, Field.Index.ANALYZED ) ); > > > > > > >> >> indexWriter.addDocument( > > > > > > >> >> > document ); > > > > > > >> >> > > > > > > > >> >> > IndexReader indexReader = indexWriter.getReader(); > > > > > > >> >> > IndexSearcher > > > > > > >> >> > searcher = new IndexSearcher( indexReader ); >> >> > > > > > > > >> >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), >0.6f, > > > > > > >> >> > 1 ); TopDocs result = searcher.search( q, 10 ); > > > > > > >> >> > Assert.assertEquals( > > > > > > >> >> > 1, > > > > > > >> >> result.totalHits ); ... > > > > > > >> >> > > > > > > > >> >> > > -----Ursprüngliche Nachricht----- > > > > > > >> >> > > Von: Uwe Schindler [mailto:u...@thetaphi.de] > > > > > > >> >> > > Gesendet: Montag, 2. Mai 2011 13:50 > > > > > > >> >> > > An: java-user@lucene.apache.org > > > > > > >> >> > > Betreff: RE: "fuzzy prefix" search > > > > > > >> >> > > > > > > > > >> >> > > Hi, > > > > > > >> >> > > > > > > > > >> >> > > You can pass an integer to FuzzyQuery which defines the > > > > > > >> >> > > number of characters that are seen as prefix. So all > > > > > > >> >> > > terms must match > > > > > > >> >> > > this prefix and the rest of each term is matched using > > >fuzzy. > > > > > > >> >> > > > > > > > > >> >> > > Uwe > > > > > > >> >> > > > > > > > > >> >> > > ----- > > > > > > >> >> > > Uwe Schindler > > > > > > >> >> > > H.-H.-Meier-Allee 63, D-28213 Bremen > > > > > > >> http://www.thetaphi.de > > > > > > >> >> > > eMail: u...@thetaphi.de > > > > > > >> >> > > > > > > > > >> >> > > > -----Original Message----- > > > > > > >> >> > > > From: Clemens Wyss [mailto:clemens...@mysign.ch] > > > > > > >> >> > > > Sent: Monday, May 02, 2011 1:47 PM >> > > > To: > > > > > > >> java-user@lucene.apache.org > > > > > > >> >> > > > Subject: "fuzzy prefix" search >> >> > > > > > > > > > >> >> > > > I'd like to search fuzzily but not on a full term. > > > > > > >> >> > > > E.g. > > > > > > >> >> > > > I have a text "Merlot del Ticino" > > > > > > >> >> > > > I'd like > > > > > > >> >> > > > "mer", "merr", "melo", ... to match. > > > > > > >> >> > > > > > > > > > >> >> > > > If I use FuzzyQuery only "merlot, "merlott" hit. >What > > > > > > >> >> > > > Query-combination should I use? > > > > > > >> >> > > > > > > > > > >> >> > > > Thx > > > > > > >> >> > > > Clemens > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > > >> >> > > > >-------------------------------------------------------- > > > > > > >> >> > > > ---- > > > > > > >> >> > > > --- > > > > > > >> >> > > > --- > > > > > > >> >> > > > -- > > > > > > >> >> > > > - To unsubscribe, e-mail: > > > > > > >> >> > > > java-user-unsubscr...@lucene.apache.org > > > > > > >> >> > > > For additional commands, e-mail: > > > > > > >> >> > > > java-user-h...@lucene.apache.org >> >> > > > > > > > > >> >> > > > > > > > > >> >> > > > > > > > > >> >> > > > > > > > > >> >> > > >---------------------------------------------------------- > > > > > > >> >> > > ---- > > > > > > >> >> > > --- > > > > > > >> >> > > --- > > > > > > >> >> > > - To unsubscribe, e-mail: > > > > > > >> >> > > java-user-unsubscr...@lucene.apache.org > > > > > > >> >> > > For additional commands, e-mail: > > > > > > >> >> > > java-user-h...@lucene.apache.org > > > > > > >> >> > > > > > > > >> >> > > > > > > > >> >> > > > > > > > >> >> >-------------------------------------------------------------- > > > > > > >> >> -- > > > > > > >> >> > --- > > > > > > >> >> > -- To unsubscribe, e-mail: > > > > > > >> >> > java-user-unsubscr...@lucene.apache.org > > > > > > >> >> > For additional commands, e-mail: > > > > > > >> >> > java-user-h...@lucene.apache.org > > > > > > >> >> > > > > > > >> >> > > > > > > >> >> > > > > > > >> >> >-------------------------------------------------------------- > > > > > > >> >> ---- > > > > > > >> >> --- To unsubscribe, e-mail: > > > > > > >> >> java-user-unsubscr...@lucene.apache.org > > > > > > >> >> For additional commands, e-mail: > > > > > > java-user-h...@lucene.apache.org >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > >--------------------------------------------------------------- > > > > > > >> > ---- > > > > > > >> > -- To unsubscribe, e-mail: > > > > > > java-user-unsubscr...@lucene.apache.org > > > > > > >> > For additional commands, e-mail: > > > > > > java-user-h...@lucene.apache.org >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > >> >----------------------------------------------------------------- > > > > > > >> ---- > > > > > > >> To unsubscribe, e-mail: java-user- > > unsubscr...@lucene.apache.org > > > > > > >> For additional commands, e-mail: > > > > > > java-user-h...@lucene.apache.org > > > > > > > > > > > > > > > > > > > > > > >------------------------------------------------------------------ > > > > > > > --- > > > > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > > > > > > For additional commands, e-mail: java-user- > > h...@lucene.apache.org > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >-------------------------------------------------------------------- > > > > > > - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org