Just to give some feedback, in case someone is interested: the ModifiedStandardAnalyzer class seems to work exactly like StandardAnalyzer, but accent insensitive. A small difference occurred with the last character (ß, which comes out as "ss"), but it does not belong to the Portuguese alphabet, so I think there is no problem in ignoring it in my case.
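For readers wondering what "accent insensitive" means at the character level, here is a minimal sketch that does not depend on Lucene.Net. It is not how ASCIIFoldingFilter works internally; it assumes plain .NET Unicode normalization instead, and the names FoldingDemo and FoldAccents are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class FoldingDemo
{
    // Hypothetical helper: lowercases the text, decomposes it (NFD) and
    // drops the combining marks. Characters without a canonical
    // decomposition, such as ß or ı, pass through unchanged here.
    public static string FoldAccents(string text)
    {
        string decomposed = text.ToLowerInvariant().Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except non-spacing marks (tilde, cedilla, ...).
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        // Portuguese words taken from the test expression in this thread.
        Console.WriteLine(FoldAccents("João Avião Calção")); // joao aviao calcao
    }
}
```

Unlike the analyzer discussed in this thread, this sketch leaves ß as it is, so it only illustrates the idea and is not a drop-in replacement for a folding filter.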
Thanks Digy!

Test results (tokenizing the expression "name.surn...@gmail.com 123.456 3,5 AT&T João Avião Calção ğüşıöç%ĞÜŞİÖÇ$ΑΒΓΔΕΖ#АБВГДЕ SSß"):

StandardAnalyzer:
[name.surn...@gmail.com] [123.456] [3,5] [at&t] [joão] [avião] [calção] [güsıöç] [güsiöç] [aß?de?] [??????] [ssß]

ModifiedStandardAnalyzer (accent insensitive):
[name.surn...@gmail.com] [123.456] [3,5] [at&t] [joao] [aviao] [calcao] [gusioc] [gusioc] [aß?de?] [??????] [ssss]

Thanks,
Gustavo Poll

2011/9/6 Gustavo Poll <gkp...@gmail.com>

> thanks, I'll do it...
>
> 2011/9/6 Digy <digyd...@gmail.com>
>
>> That can be a starting point (just play a little bit with the tokenizers
>> and filters):
>>
>> using Lucene.Net.Analysis;
>> using Lucene.Net.Analysis.Standard;
>>
>> public class ModifiedStandardAnalyzer : Analyzer
>> {
>>     public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
>>     {
>>         // Tokenize with the standard tokenizer.
>>         StandardTokenizer tokenStream = new StandardTokenizer(reader, true);
>>         TokenStream result = new StandardFilter(tokenStream);
>>         result = new LowerCaseFilter(result);
>>         // Fold accented characters to ASCII equivalents (e.g. ã -> a, ç -> c).
>>         result = new ASCIIFoldingFilter(result);
>>         return result;
>>     }
>> }
>>
>> DIGY
>>
>> -----Original Message-----
>> From: Gustavo Poll [mailto:gkp...@gmail.com]
>> Sent: Tuesday, September 06, 2011 10:06 PM
>> To: lucene-net-user@lucene.apache.org
>> Subject: Re: [Lucene.Net] How to index/search a file name
>>
>> thanks again... Ok, it is not..
>>
>> standard analyzer:
>>
>> [name.surn...@gmail.com] [123.456] [3,5] [at&t] [güsıöç] [güsiöç] [aß?de?] [??????] [ssß]
>>
>> UnaccentedWordAnalyzer:
>>
>> [name] [surname] [gmail] [com] [123] [456] [3] [5] [at] [t] [gusioc] [gusioc] [aß?de?] [??????] [ssss]
>>
>> StandardAnalyzer would be perfect for my application if it were accent
>> insensitive... Can anyone please tell me the easiest way to code such an analyzer?
>> (an accent-insensitive StandardAnalyzer)
>>
>> I hear it is not a good idea to write a class that inherits from StandardAnalyzer,
>> because StandardAnalyzer should be a final class. Is that right?
>>
>> I'd appreciate any help...
>>
>> Gustavo Poll
>>
>> 2011/9/6 Digy <digyd...@gmail.com>
>>
>> > A function is worth a thousand words :)
>> >
>> > // Needs: using System; using System.IO;
>> > //        using Lucene.Net.Analysis; using Lucene.Net.Analysis.Standard;
>> > void Test()
>> > {
>> >     Analyzer[] analyzers = new Analyzer[] {
>> >         new StandardAnalyzer(),
>> >         new Lucene.Net.Analysis.Ext.UnaccentedWordAnalyzer() };
>> >
>> >     string input = "name.surn...@gmail.com 123.456 3,5 AT&T ğüşıöç%ĞÜŞİÖÇ$ΑΒΓΔΕΖ#АБВГДЕ SSß";
>> >
>> >     // Print the tokens each analyzer produces, one line per analyzer.
>> >     foreach (Analyzer analyzer in analyzers)
>> >     {
>> >         TokenStream ts = analyzer.TokenStream("", new StringReader(input));
>> >         Lucene.Net.Analysis.Token t = ts.Next();
>> >         while (t != null)
>> >         {
>> >             Console.Write("[" + t.TermText() + "] ");
>> >             t = ts.Next();
>> >         }
>> >         Console.WriteLine(); Console.WriteLine();
>> >     }
>> > }
>> >
>> > DIGY
>> >
>> > -----Original Message-----
>> > From: Gustavo Poll [mailto:gkp...@gmail.com]
>> > Sent: Tuesday, September 06, 2011 9:00 PM
>> > To: lucene-net-user@lucene.apache.org
>> > Subject: Re: [Lucene.Net] How to index/search a file name
>> >
>> > thanks DIGY, I am interested in that too... Let me see if I understood:
>> >
>> > Is UnaccentedWordAnalyzer like StandardAnalyzer, but accent insensitive?
>> >
>> > Thanks!
>> > Gustavo Poll
>> >
>> > 2011/9/6 digy digy <digyd...@gmail.com>
>> >
>> > > That may help
>> > >
>> > > UnaccentedWordAnalyzer @
>> > > https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/src/contrib/Core/Analysis/Ext/Analysis.Ext.cs
>> > >
>> > > DIGY
>> > >
>> > > On Tue, Sep 6, 2011 at 12:31 PM, Floyd Wu <floyd...@gmail.com> wrote:
>> > >
>> > > > Hi everyone,
>> > > >
>> > > > I have a problem that has annoyed me many times. I need to index
>> > > > file names and make them searchable by partial file name.
>> > > >
>> > > > Example: 2009&2010Q2_ABCD_Report.xls (the file name)
>> > > >
>> > > > When I run these queries:
>> > > >
>> > > > filename:ABCD          no match returned
>> > > > filename:2010Q2_ABCD   match
>> > > > filename:Report*       match
>> > > >
>> > > > I'm using StandardAnalyzer, and the Lucene.Net version is 2.9.3. Currently
>> > > > the filename field is set to tokenized/indexed/stored.
>> > > >
>> > > > What I want is for Lucene.Net to match when the user types any part of
>> > > > the file name (a string like 2009 or 2010Q2 or ABCD or Report or xls or
>> > > > Report.xls).
>> > > >
>> > > > Please help with this, or kindly point me to a way to solve it.
>> > > >
>> > > > Floyd
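Coming back to the original question in this thread (partial file-name matching): the query results quoted there suggest that StandardAnalyzer kept 2010Q2_ABCD together as a single token, which would explain why filename:ABCD finds nothing. Below is a minimal sketch, independent of Lucene.Net, of the tokens one would want an analyzer to emit instead. It simply splits on every character that is not a letter or digit and lowercases the pieces. FileNameTokens and Split are hypothetical names, and this is not a Lucene.Net analyzer, only an illustration of the target token stream.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FileNameTokens
{
    // Hypothetical helper: splits a file name into lowercase runs of
    // letters and digits; every other character acts as a separator.
    public static List<string> Split(string fileName)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in fileName)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                // Separator reached: emit the token collected so far.
                tokens.Add(current.ToString());
                current.Length = 0;
            }
        }
        if (current.Length > 0)
            tokens.Add(current.ToString());
        return tokens;
    }

    public static void Main()
    {
        foreach (string token in Split("2009&2010Q2_ABCD_Report.xls"))
            Console.Write("[" + token + "] ");
        Console.WriteLine();
        // Prints: [2009] [2010q2] [abcd] [report] [xls]
    }
}
```

With tokens like these in the index, a query for ABCD or xls has something to match. A query such as Report.xls would have to go through the same splitting at search time.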