[ https://issues.apache.org/jira/browse/LUCENE-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534804 ]
[EMAIL PROTECTED] edited comment on LUCENE-1029 at 10/15/07 4:47 AM:
---------------------------------------------------------------

I think Uwe nailed this one. Stripping accents in general is just not "legal", but many times it is desirable, and this filter does that for you. It goes without saying that if you strip the accent you change the meaning; likewise, when you stem a word you create illegal words.

P.S. Changing this filter is not really a great option, as it would break existing indexes that use it. I think the better idea would be to create a new stripper with the alternate functionality you are thinking of: rather than stripping accents, it would replace accented characters with letters that approximate the original sound/meaning.

> Illegal character replacements in ISOLatin1AccentFilter
> -------------------------------------------------------
>
>                 Key: LUCENE-1029
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1029
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.2
>            Reporter: Marko Asplund
>
> The ISOLatin1AccentFilter class is responsible for replacing "accented
> characters in the ISO Latin 1 character set by their unaccented equivalent".
> Some of the replacements performed for Scandinavian characters (used e.g. in
> the Finnish, Swedish and Danish languages) are illegal. The Scandinavian
> characters differ from the accented characters used in Latin-based
> languages such as French in that these characters (ä, ö, å) represent
> entirely independent sounds in the language and therefore cannot be
> represented with any other sound without a change of meaning.
> It is therefore illegal to replace these characters with any other character.
> This means, for example, that you can't change the Finnish word sää (weather)
> to saa (will have), because these are two entirely different words with
> different meanings. The same applies to the Scandinavian languages as well.
> There's no connection between the sounds represented by ä and a, ö and o, or
> å and a.
> In addition to the three characters mentioned above, Danish and Norwegian use
> other special characters such as ø and æ. It should be checked whether the
> replacement is legal for these characters.
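The collision described in the report, and the alternative stripper suggested in the comment, can be sketched in plain Java (9+). This is a hypothetical illustration, not Lucene code: `stripAccents` uses `java.text.Normalizer` (NFD decomposition, then deleting combining marks) rather than ISOLatin1AccentFilter's own replacement table, and the `transliterate` mapping (ä to ae, ö to oe, å to aa, ø to oe, æ to ae) is an assumed convention picked for illustration. Whether such a mapping is acceptable for Finnish, Swedish, Danish or Norwegian search is exactly what this thread is debating.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical illustration for LUCENE-1029; plain Java, not Lucene code.
final class AccentFoldingDemo {

    // Assumed "sound-preserving" mapping, chosen only to illustrate the
    // proposed alternative stripper; it is not an official Lucene table.
    private static final Map<Character, String> TRANSLITERATIONS = Map.of(
            '\u00e4', "ae", '\u00c4', "AE",  // a/A with diaeresis
            '\u00f6', "oe", '\u00d6', "OE",  // o/O with diaeresis
            '\u00e5', "aa", '\u00c5', "AA",  // a/A with ring above
            '\u00f8', "oe", '\u00d8', "OE",  // o/O with stroke
            '\u00e6', "ae", '\u00c6', "AE"); // ae/AE ligature

    // Plain accent stripping: NFD splits a-diaeresis into "a" + U+0308 and
    // a-ring into "a" + U+030A; deleting all combining marks keeps the base
    // letter. Letters with no canonical decomposition (o-stroke, ae ligature)
    // are left unchanged by this approach.
    public static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Alternative: replace each listed letter with a multi-letter
    // approximation so that distinct words stay distinct terms.
    public static String transliterate(String s) {
        // Compose first so "a" + U+0308 is looked up as one precomposed letter.
        String composed = Normalizer.normalize(s, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        for (char c : composed.toCharArray()) {
            String replacement = TRANSLITERATIONS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Finnish words for "weather" and "will have" from the report.
        String[] words = {"s\u00e4\u00e4", "saa"};
        for (String w : words) {
            System.out.println(w + " -> stripped: " + stripAccents(w)
                    + ", transliterated: " + transliterate(w));
        }
    }
}
```

Both words strip to `saa` and would therefore match the same index term, while the transliterated forms (`saeae` and `saa`) stay distinct. Note that the NFD approach leaves ø and æ untouched, whereas the issue text indicates ISOLatin1AccentFilter also replaces those.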