<<<However, is there any special case that you have?>>> Yes, the character set we use is, as I remember, MARC-8. Which I don't think is the ISOLatin...., but since I didn't know about that filter when we had our problem, I didn't even look. Oh well, smarter/braver/lazier next time <G>...
Which is why I love this list: I find things like this and look smarter the next time something similar comes up <G>.

Thanks
Erick

On 7/30/07, Chris Lu <[EMAIL PROTECTED]> wrote:
>
> Hi, Erick,
>
> I added ISOLatin1AccentFilter to FrenchAnalyzer following Samir's tip,
> and it works great! And I think it's the right way to go. Problems
> like "You have to store the data raw for display purposes if you want
> the accents to show though" will go away since Analyzer already has
> the original text and analyzed token mechanism built-in. And it's
> pretty easy to do!
>
> However, is there any special case that you have? Not really knowing
> French, I only tested one word, "fenêtre", and it's analyzed into
> "fenetre".
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>
> On 7/30/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
> > Gosh, I sure hope not, because that would mean that we rolled our
> > own for no good reason. We wound up just collapsing
> > the input stream by substituting plain old 'e' for all the accented
> > variants before indexing and before searching. Be *really* careful
> > what character set you're using.
> >
> > Actually, we would have still had to roll our own because the
> > character mapping was...er...wonky <G>....
> >
> > You have to store the data raw for display purposes if you want the
> > accents to show though...
> >
> > Best
> > Erick
> >
> > On 7/30/07, Chris Lu <[EMAIL PROTECTED]> wrote:
> > >
> > > Hi,
> > >
> > > I am not a French speaker, but here are some questions regarding
> > > French analyzer:
> > >
> > > Is there any analyzer that can do this? Analyze accented letters to
> > > non accented corresponding letters (é,è,ê,ë -> e), so that
> > >
> > > search "fenêtre" (=window) finds all docs with "fenêtre" or "fenetre"
> > > and
> > > search "fenetre" finds the same result, all docs with "fenêtre" or
> > > "fenetre"
> > >
> > > Current analyzers, Snowball-French and FrenchAnalyzer don't have this
> > > feature.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
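
For anyone who, like Erick, ends up rolling their own: the folding discussed in this thread (é,è,ê,ë -> e) can be sketched with nothing but the JDK. This is not the code of ISOLatin1AccentFilter and not what either poster ran; it is a minimal illustration that assumes the input is already a correctly decoded Unicode String (so data in MARC-8 or another encoding would have to be converted first). The class name AccentFolder and the method fold are hypothetical names made up for this example.

```java
import java.text.Normalizer;

public class AccentFolder {

    // Hypothetical helper: fold accented letters to their base letters.
    // Step 1: NFD normalization splits each accented letter into a base
    //         letter followed by one or more combining marks.
    // Step 2: the regex removes those combining marks (category Mn).
    // Letters with no canonical decomposition (for example the ligature
    // "oe" or a slashed "o") pass through unchanged.
    public static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{Mn}+", "");
    }

    public static void main(String[] args) {
        // "\u00EA" is e with circumflex, written as an escape so the
        // source file does not depend on the compiler's default encoding.
        System.out.println(fold("fen\u00EAtre"));
    }
}
```

As noted earlier in the thread, the same folding has to be applied both before indexing and before searching, and the raw text still has to be stored separately if the accents should show up on display.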