That can be a starting point (just play around a bit with the tokenizers and filters):

using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

public class ModifiedStandardAnalyzer : Analyzer
{
    public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
    {
        // Same tokenizer that StandardAnalyzer uses
        StandardTokenizer tokenStream = new StandardTokenizer(reader, true);
        TokenStream result = new StandardFilter(tokenStream);
        result = new LowerCaseFilter(result);
        // Fold accented characters to their ASCII equivalents
        result = new ASCIIFoldingFilter(result);
        return result;
    }
}

DIGY

-----Original Message-----
From: Gustavo Poll [mailto:gkp...@gmail.com]
Sent: Tuesday, September 06, 2011 10:06 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: [Lucene.Net] How to index/search a file name

Thanks again...

OK, it is not.

StandardAnalyzer:
[name.surn...@gmail.com] [123.456] [3,5] [at&t] [güsıöç] [güsiöç] [aß?de?] [??????] [ssß]

UnaccentedWordAnalyzer:
[name] [surname] [gmail] [com] [123] [456] [3] [5] [at] [t] [gusioc] [gusioc] [aß?de?] [??????] [ssss]

StandardAnalyzer would be perfect for my application if it were accent insensitive...

Can anyone please tell me the easiest way to code such an analyzer (an accent-insensitive StandardAnalyzer)?

I hear it is not a good idea to make a class that inherits from StandardAnalyzer, because StandardAnalyzer should be a final class. Does that make sense?

I'd appreciate any help.

Gustavo Poll

2011/9/6 Digy <digyd...@gmail.com>

> A function is worth a thousand words :)
>
> void Test()
> {
>     Analyzer[] analyzers = new Analyzer[] { new StandardAnalyzer(), new Lucene.Net.Analysis.Ext.UnaccentedWordAnalyzer() };
>     string input = "name.surn...@gmail.com 123.456 3,5 AT&T ğüşıöç%ĞÜŞİÖÇ$ΑΒΓΔΕΖ#АБВГДЕ SSß";
>
>     foreach (Analyzer analyzer in analyzers)
>     {
>         TokenStream ts = analyzer.TokenStream("", new StringReader(input));
>         Lucene.Net.Analysis.Token t = ts.Next();
>         while (t != null)
>         {
>             Console.Write("[" + t.TermText() + "] ");
>             t = ts.Next();
>         }
>         Console.WriteLine(); Console.WriteLine();
>     }
> }
>
> DIGY
>
> -----Original Message-----
> From: Gustavo Poll [mailto:gkp...@gmail.com]
> Sent: Tuesday, September 06, 2011 9:00 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: [Lucene.Net] How to index/search a file name
>
> thanks DIGY, I have interest in that too... Let me see if i understood:
>
> UnaccentedWordAnalyzer is like Standard Analyzer, but accent insensitive?
>
> Thanks!
>
> Gustavo Poll
>
> 2011/9/6 digy digy <digyd...@gmail.com>
>
> > > That may help
> > >
> > > UnaccentedWordAnalyzer @
> > >
> > > https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/src/contrib/Core/Analysis/Ext/Analysis.Ext.cs
> > >
> > > DIGY
> > >
> > > On Tue, Sep 6, 2011 at 12:31 PM, Floyd Wu <floyd...@gmail.com> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I have a question that annoying me many times. my situation is that I need
> > > > to index file name and need to be searchable using partial file name.
> > > >
> > > > example--> 2009&2010Q2_ABCD_Report.xls (the file name)
> > > >
> > > > When I shot queries
> > > >
> > > > filename:ABCD no match return.
> > > >
> > > > filename:2010Q2_ABCD match
> > > >
> > > > filename:Report* match
> > > >
> > > > I'm using StandardAnalyzer and Lucene.Net version is 2.9.3.
> > > > Current filename field is set to tokenized/indexed/store
> > > >
> > > > What I want is when user type any part of file name that lucene.Net can match.
> > > > (string like 2009 or 2010Q2 or ABCD or Report or xls or Report.xls)
> > > >
> > > > Please help on this or kindly direct me a way to solve it.
> > > >
> > > > Floyd
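
P.S. To see what ModifiedStandardAnalyzer (the class at the top of this message) actually emits, the test loop quoted above can be pointed at it. This is only a sketch, assuming Lucene.Net 2.9.x, where TokenStream.Next() and Token.TermText() are still available, and assuming ModifiedStandardAnalyzer is compiled into the same project. The AnalyzerCheck class name and the shortened input string are made up for illustration.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;

public static class AnalyzerCheck
{
    public static void Main()
    {
        // Assumption: ModifiedStandardAnalyzer from this message is in the same project.
        Analyzer analyzer = new ModifiedStandardAnalyzer();

        // Shortened version of the test input used earlier in the thread.
        string input = "AT&T 123.456 3,5 ğüşıöç SSß";

        TokenStream ts = analyzer.TokenStream("", new StringReader(input));

        // Same loop as in the quoted Test() function: print each token in brackets.
        Token t = ts.Next();
        while (t != null)
        {
            Console.Write("[" + t.TermText() + "] ");
            t = ts.Next();
        }
        Console.WriteLine();
    }
}
```

If the filter chain behaves as intended, the tokens should be split the way StandardAnalyzer splits them, with the accented characters folded to plain ASCII. Comparing the output against the two result lines posted above is a quick way to confirm that.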