Illegal character replacements in ISOLatin1AccentFilter
-------------------------------------------------------

                 Key: LUCENE-1029
                 URL: https://issues.apache.org/jira/browse/LUCENE-1029
             Project: Lucene - Java
          Issue Type: Bug
          Components: Analysis
    Affects Versions: 2.2
            Reporter: Marko Asplund


The ISOLatin1AccentFilter class is responsible for replacing "accented 
characters in the ISO Latin 1 character set by their unaccented equivalent".

Some of the replacements performed for Scandinavian characters (used e.g. in 
Finnish, Swedish and Danish) are illegal. Unlike the accented characters used 
in Latin-based languages such as French, the characters ä, ö and å represent 
entirely independent sounds in the language and therefore cannot be replaced 
by another letter without changing the meaning of the word. It is therefore 
illegal to replace these characters with any other character.

This means, for example, that you can't change the Finnish word sää (weather) 
to saa (will have), because these are two entirely different words with 
different meanings. The same applies to the Scandinavian languages.

There's no connection between the sounds represented by ä and a, ö and o, or 
å and a.
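
The effect can be sketched in a few lines of plain Java. This is not the 
ISOLatin1AccentFilter code itself: the class name AccentFoldDemo and its 
fold() method are hypothetical, and the three replacements in the switch are 
assumed to match what the filter does for these characters. It only shows 
that two distinct Finnish words become indistinguishable after folding.

```java
final class AccentFoldDemo {

    // Hypothetical stand-in for the filter's per-character replacement.
    // Assumption: ä -> a, ö -> o, å -> a, everything else unchanged.
    static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u00E4': out.append('a'); break; // ä
                case '\u00F6': out.append('o'); break; // ö
                case '\u00E5': out.append('a'); break; // å
                default:       out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String weather = "s\u00E4\u00E4"; // sää
        String willHave = "saa";
        // Both words map to the same folded form, so an index built on the
        // folded form cannot tell them apart.
        System.out.println(fold(weather).equals(fold(willHave))); // prints "true"
    }
}
```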

In addition to the three characters mentioned above, Danish and Norwegian use 
other special characters such as ø and æ. It should be checked whether the 
replacement is legal for these characters as well.
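
One way to run such a check is to fold a word list and report every group of 
distinct words that ends up with the same folded form. The sketch below 
assumes the replacements ø -> o and æ -> ae in addition to the three above. 
The class FoldCollisionCheck, its methods and the sample Danish words are 
illustrative only and are not taken from Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FoldCollisionCheck {

    // Hypothetical replacement table; the mappings are assumptions.
    static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u00E4': out.append('a');  break; // ä
                case '\u00F6': out.append('o');  break; // ö
                case '\u00E5': out.append('a');  break; // å
                case '\u00F8': out.append('o');  break; // ø
                case '\u00E6': out.append("ae"); break; // æ
                default:       out.append(c);
            }
        }
        return out.toString();
    }

    // Groups words by folded form and keeps only groups with two or more
    // members, i.e. the words that the folding makes indistinguishable.
    static Map<String, List<String>> collisions(List<String> words) {
        Map<String, List<String>> byFolded = new LinkedHashMap<>();
        for (String w : words) {
            byFolded.computeIfAbsent(fold(w), k -> new ArrayList<>()).add(w);
        }
        byFolded.values().removeIf(group -> group.size() < 2);
        return byFolded;
    }

    public static void main(String[] args) {
        // Danish: før (before) / for (for), hår (hair) / har (has).
        List<String> words = Arrays.asList("f\u00F8r", "for", "h\u00E5r", "har", "hus");
        System.out.println(collisions(words).keySet()); // prints "[for, har]"
    }
}
```

Any non-empty result means the replacement merges words that speakers of the 
language treat as different.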



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

