We do have EdgeNGramTokenizer if that is what you are after. See how Solr uses it here: http://search-lucene.com/c/Solr:/src/java/org/apache/solr/analysis/EdgeNGramTokenizerFactory.java||EdgeNGramTokenizer
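As a rough illustration of what front edge n-gramming means for a whole, untokenized phrase such as "Merlot del Ticino", here is a minimal sketch in plain Java. It is not the Lucene tokenizer itself: the class EdgeNGramSketch and the method frontEdgeNGrams are hypothetical names, and the gram sizes 3 to 8 are arbitrary values picked for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it does not use or mimic any Lucene API.
final class EdgeNGramSketch {

    /** Returns the leading substrings of text with lengths from minGram up to maxGram. */
    static List<String> frontEdgeNGrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        // Never ask for a gram longer than the text itself.
        int longest = Math.min(maxGram, text.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(text.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // The phrase is treated as one unit, so grams may contain spaces.
        for (String gram : frontEdgeNGrams("Merlot del Ticino", 3, 8)) {
            System.out.println("[" + gram + "]");
        }
    }
}
```

Because the phrase is never split on whitespace, the grams run across word boundaries ("Merlot ", "Merlot d"), which matches the list of n-grams asked for in the quoted message below.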
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: Clemens Wyss <clemens...@mysign.ch>
> To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> Sent: Wed, May 4, 2011 2:07:40 AM
> Subject: AW: AW: AW: AW: "fuzzy prefix" search
>
> I know this is just an example.
> But even the WhitespaceAnalyzer takes the words apart, which I don't want.
> I would like the phrases as they are (maximum 3 words, e.g. "Merlot del
> Ticino", ...) to be n-grammed. Hence I want to have the n-grams:
> Mer
> Merl
> Merlo
> Merlot
> Merlot
> Merlot d
> ...
>
> Regards
> Clemens
>
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > Sent: Tuesday, May 3, 2011 23:12
> > To: java-user@lucene.apache.org
> > Subject: Re: AW: AW: AW: "fuzzy prefix" search
> >
> > Clemens - that's just an example. Stick another tokenizer in there,
> > like WhitespaceTokenizer, for example.
> >
> > Otis
> >
> > ----- Original Message ----
> > > From: Clemens Wyss <clemens...@mysign.ch>
> > > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> > > Sent: Tue, May 3, 2011 4:31:14 PM
> > > Subject: AW: AW: AW: "fuzzy prefix" search
> > >
> > > But doesn't the KeywordTokenizer extract single words out of the
> > > stream? I would like to create n-grams on the stream (field content)
> > > as it is...
> > >
> > > > -----Original Message-----
> > > > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > > > Sent: Tuesday, May 3, 2011 21:31
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: AW: AW: "fuzzy prefix" search
> > > >
> > > > Clemens,
> > > >
> > > > Something a la:
> > > >
> > > > public TokenStream tokenStream(String fieldName, Reader r) {
> > > >   return new EdgeNGramTokenFilter(new KeywordTokenizer(r),
> > > >       EdgeNGramTokenFilter.Side.FRONT, 1, 4);
> > > > }
> > > >
> > > > Check out page 265 of Lucene in Action 2.
> > > >
> > > > Otis
> > > >
> > > > ----- Original Message ----
> > > > > From: Clemens Wyss <clemens...@mysign.ch>
> > > > > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> > > > > Sent: Tue, May 3, 2011 12:57:39 PM
> > > > > Subject: AW: AW: "fuzzy prefix" search
> > > > >
> > > > > What does a simple Analyzer look like that just "n-grams" the
> > > > > docs/fields?
> > > > >
> > > > > class SimpleNGramAnalyzer extends Analyzer {
> > > > >   @Override
> > > > >   public TokenStream tokenStream( String fieldName, Reader reader )
> > > > >   {
> > > > >     EdgeNGramTokenFilter... ???
> > > > >   }
> > > > > }
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > > > > > Sent: Tuesday, May 3, 2011 13:36
> > > > > > To: java-user@lucene.apache.org
> > > > > > Subject: Re: AW: "fuzzy prefix" search
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I didn't read this thread closely, but just in case:
> > > > > > * Is this something you can handle with synonyms?
> > > > > > * If this is for English and you are trying to handle typos,
> > > > > >   there is a list of common English misspellings out there that
> > > > > >   you could perhaps use for this.
> > > > > > * Have you considered n-gramming your tokens? Not sure if this
> > > > > >   would help, didn't read messages/examples closely enough, but
> > > > > >   you may want to look at this if you haven't done so yet.
> > > > > >
> > > > > > Otis
> > > > > >
> > > > > > ----- Original Message ----
> > > > > > > From: Clemens Wyss <clemens...@mysign.ch>
> > > > > > > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> > > > > > > Sent: Tue, May 3, 2011 5:25:30 AM
> > > > > > > Subject: AW: "fuzzy prefix" search
> > > > > > >
> > > > > > > >PrefixQuery
> > > > > > > I'd like the combination of prefix and fuzzy ;-) because people
> > > > > > > could also type "menlo" or "märl", and in any of these cases I'd
> > > > > > > like to get a hit on Merlot (for suggesting Merlot).
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Ian Lea [mailto:ian....@gmail.com]
> > > > > > > > Sent: Tuesday, May 3, 2011 11:22
> > > > > > > > To: java-user@lucene.apache.org
> > > > > > > > Subject: Re: "fuzzy prefix" search
> > > > > > > >
> > > > > > > > I'd assumed that FuzzyQuery wouldn't ignore case but I could be
> > > > > > > > wrong.
> > > > > > > > What would be the edit distance between "mer" and "merlot"?
> > > > > > > > Would it be less than 1.5, which I reckon would be the value of
> > > > > > > > length(term)*0.5 as detailed in the javadocs? Seems unlikely,
> > > > > > > > but I don't really know anything about the Levenshtein (edit
> > > > > > > > distance) algorithm as used by FuzzyQuery.
> > > > > > > > Wouldn't a PrefixQuery be more appropriate here?
> > > > > > > >
> > > > > > > > --
> > > > > > > > Ian.
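Ian's question about the edit distance between "mer" and "merlot" can be checked by hand with a textbook Levenshtein implementation. The sketch below is plain Java with no Lucene dependency; the class and method names are made up for illustration, and it deliberately does not model how FuzzyQuery turns the raw distance into a similarity score, since the thread does not show that.

```java
// Hypothetical helper for illustration only; not FuzzyQuery's own implementation.
final class EditDistanceSketch {

    /** Classic Levenshtein distance: insertions, deletions and substitutions each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty string into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("mer", "merlot"));     // 3
        System.out.println(levenshtein("Mer", "merlot"));     // 4 (case differs as well)
        System.out.println(levenshtein("merlott", "merlot")); // 1
    }
}
```

The distance for "mer" equals the full length of the three-character query, which is consistent with Ian's hunch that such a short prefix is unlikely to clear a minimum similarity of 0.5, while a near-complete term like "merlott" is only one edit away.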
> > > > > > > > On Tue, May 3, 2011 at 10:10 AM, Clemens Wyss <clemens...@mysign.ch> wrote:
> > > > > > > > > Unfortunately lowercasing doesn't help.
> > > > > > > > > Also, doesn't the FuzzyQuery ignore casing?
> > > > > > > > >
> > > > > > > > >> -----Original Message-----
> > > > > > > > >> From: Ian Lea [mailto:ian....@gmail.com]
> > > > > > > > >> Sent: Tuesday, May 3, 2011 11:06
> > > > > > > > >> To: java-user@lucene.apache.org
> > > > > > > > >> Subject: Re: "fuzzy prefix" search
> > > > > > > > >>
> > > > > > > > >> Mer != mer. The latter will be what is indexed because
> > > > > > > > >> StandardAnalyzer calls LowerCaseFilter.
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> Ian.
> > > > > > > > >>
> > > > > > > > >> On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss <clemens...@mysign.ch> wrote:
> > > > > > > > >> > Sorry for coming back to my issue. Can anybody explain why my
> > > > > > > > >> > "simple" unit test below fails? Any hint/help appreciated.
> > > > > > > > >> >
> > > > > > > > >> > Directory directory = new RAMDirectory();
> > > > > > > > >> > IndexWriter indexWriter = new IndexWriter( directory,
> > > > > > > > >> >     new StandardAnalyzer( Version.LUCENE_31 ),
> > > > > > > > >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > > > > > > > >> > Document document = new Document();
> > > > > > > > >> > document.add( new Field( "test", "Merlot",
> > > > > > > > >> >     Field.Store.YES, Field.Index.ANALYZED ) );
> > > > > > > > >> > indexWriter.addDocument( document );
> > > > > > > > >> > IndexReader indexReader = indexWriter.getReader();
> > > > > > > > >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > > > > > > > >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, 0, 10 );
> > > > > > > > >> > // or Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f);
> > > > > > > > >> > TopDocs result = searcher.search( q, 10 );
> > > > > > > > >> > Assert.assertEquals( 1, result.totalHits );
> > > > > > > > >> >
> > > > > > > > >> > - Clemens
> > > > > > > > >> >
> > > > > > > > >> >> -----Original Message-----
> > > > > > > > >> >> From: Clemens Wyss [mailto:clemens...@mysign.ch]
> > > > > > > > >> >> Sent: Monday, May 2, 2011 23:01
> > > > > > > > >> >> To: java-user@lucene.apache.org
> > > > > > > > >> >> Subject: AW: "fuzzy prefix" search
> > > > > > > > >> >>
> > > > > > > > >> >> Is it the combination of FuzzyQuery and Term which makes the
> > > > > > > > >> >> search go for "word boundaries"?
> > > > > > > > >> >>
> > > > > > > > >> >> > -----Original Message-----
> > > > > > > > >> >> > From: Clemens Wyss [mailto:clemens...@mysign.ch]
> > > > > > > > >> >> > Sent: Monday, May 2, 2011 14:13
> > > > > > > > >> >> > To: java-user@lucene.apache.org
> > > > > > > > >> >> > Subject: AW: "fuzzy prefix" search
> > > > > > > > >> >> >
> > > > > > > > >> >> > I tried this too, but unfortunately I only get hits when the
> > > > > > > > >> >> > search term is at least as long as the word to be looked up.
> > > > > > > > >> >> >
> > > > > > > > >> >> > E.g.:
> > > > > > > > >> >> > ...
> > > > > > > > >> >> > Directory directory = new RAMDirectory();
> > > > > > > > >> >> > IndexWriter indexWriter = new IndexWriter( directory,
> > > > > > > > >> >> >     IndexManager.getIndexingAnalyzer( LOCALE_DE ),
> > > > > > > > >> >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > > > > > > > >> >> >
> > > > > > > > >> >> > Document document = new Document();
> > > > > > > > >> >> > document.add( new Field( "test", "Merlot",
> > > > > > > > >> >> >     Field.Store.YES, Field.Index.ANALYZED ) );
> > > > > > > > >> >> > indexWriter.addDocument( document );
> > > > > > > > >> >> >
> > > > > > > > >> >> > IndexReader indexReader = indexWriter.getReader();
> > > > > > > > >> >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > > > > > > > >> >> >
> > > > > > > > >> >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.6f, 1 );
> > > > > > > > >> >> > TopDocs result = searcher.search( q, 10 );
> > > > > > > > >> >> > Assert.assertEquals( 1, result.totalHits );
> > > > > > > > >> >> > ...
> > > > > > > > >> >> >
> > > > > > > > >> >> > > -----Original Message-----
> > > > > > > > >> >> > > From: Uwe Schindler [mailto:u...@thetaphi.de]
> > > > > > > > >> >> > > Sent: Monday, May 2, 2011 13:50
> > > > > > > > >> >> > > To: java-user@lucene.apache.org
> > > > > > > > >> >> > > Subject: RE: "fuzzy prefix" search
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > Hi,
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > You can pass an integer to FuzzyQuery which defines the
> > > > > > > > >> >> > > number of characters that are seen as prefix. So all terms
> > > > > > > > >> >> > > must match this prefix and the rest of each term is matched
> > > > > > > > >> >> > > using fuzzy.
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > Uwe
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > -----
> > > > > > > > >> >> > > Uwe Schindler
> > > > > > > > >> >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > > > > > > >> >> > > http://www.thetaphi.de
> > > > > > > > >> >> > > eMail: u...@thetaphi.de
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > > -----Original Message-----
> > > > > > > > >> >> > > > From: Clemens Wyss [mailto:clemens...@mysign.ch]
> > > > > > > > >> >> > > > Sent: Monday, May 02, 2011 1:47 PM
> > > > > > > > >> >> > > > To: java-user@lucene.apache.org
> > > > > > > > >> >> > > > Subject: "fuzzy prefix" search
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > > I'd like to search fuzzily but not on a full term.
> > > > > > > > >> >> > > > E.g.
> > > > > > > > >> >> > > > I have a text "Merlot del Ticino"
> > > > > > > > >> >> > > > I'd like
> > > > > > > > >> >> > > > "mer", "merr", "melo", ... to match.
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > > If I use FuzzyQuery, only "merlot", "merlott" hit. What
> > > > > > > > >> >> > > > Query-combination should I use?
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > > Thx
> > > > > > > > >> >> > > > Clemens
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > > ---------------------------------------------------------------------
> > > > > > > > >> >> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > > > > > >> >> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org