[ https://issues.apache.org/jira/browse/LUCENE-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534797 ]

Uwe Schindler commented on LUCENE-1029:
---------------------------------------

This is true for other European languages, too. In German there is also a 
difference between "ä" and "a" (they sound different). A correct replacement in 
German would be to replace "ä" by "ae" (two chars).
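To make the difference concrete, here is a minimal Java sketch contrasting plain accent stripping ("ä" -> "a") with the German transliteration ("ä" -> "ae"). It is not Lucene code: the class and method names are hypothetical, and the stripping half uses java.text.Normalizer (NFD decomposition plus removal of combining marks) as an assumed stand-in, which is not necessarily how ISOLatin1AccentFilter is implemented.

```java
import java.text.Normalizer;

// Hypothetical helper class for illustration; not part of Lucene.
final class UmlautFolding {

    // Strategy 1: drop the diacritic (a-umlaut -> "a").
    // NFD splits a precomposed letter into base letter + combining mark
    // (e.g. U+00FC -> "u" + U+0308); the regex then deletes the marks.
    static String stripAccents(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Strategy 2: German transliteration, where each umlaut becomes two chars.
    static String expandGermanUmlauts(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            switch (c) {
                case '\u00e4': sb.append("ae"); break; // a-umlaut
                case '\u00f6': sb.append("oe"); break; // o-umlaut
                case '\u00fc': sb.append("ue"); break; // u-umlaut
                case '\u00c4': sb.append("Ae"); break;
                case '\u00d6': sb.append("Oe"); break;
                case '\u00dc': sb.append("Ue"); break;
                case '\u00df': sb.append("ss"); break; // sharp s
                default:       sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("M\u00fcller"));        // Muller
        System.out.println(expandGermanUmlauts("M\u00fcller")); // Mueller
    }
}
```

Stripping produces "Muller", which is what a user without an umlaut key types; the German expansion produces "Mueller", so the two strategies match different user input.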
But I think it is not a problem. The real use of this filter is to let 
people from other countries, who do not have these keys on their keyboard, search 
a Lucene index. Many Americans, for example, always search for the German last name 
"Müller" by typing "Muller", because they cannot enter the umlaut. In 
Scandinavian languages it will be the same: people would enter "o" instead of "ø". 
The accent filter exists to enable exactly this. If you create an index for just one 
Scandinavian country, simply leave this filter out.
And in principle it is no problem to find documents that do not match the 
entered keywords exactly. 
The filter is similar to the Soundex filter: after a Soundex transformation 
the word looks different and no longer carries its original meaning :)
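For readers unfamiliar with Soundex, the following is a minimal sketch of the classic American Soundex rules in plain Java. The class name is hypothetical and this is not a Lucene filter; it only handles ASCII letters A-Z. It shows the kind of deliberately lossy transformation the comparison refers to: different spellings collapse onto one code.

```java
import java.util.Locale;

// Hypothetical sketch of classic American Soundex; not a Lucene class.
final class SoundexSketch {

    // Digit for each letter A..Z; '0' marks letters that are not coded
    // (vowels plus H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String word) {
        String upper = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder(4);
        char last = 0; // code of the previous relevant letter
        for (int i = 0; i < upper.length() && out.length() < 4; i++) {
            char c = upper.charAt(i);
            if (c < 'A' || c > 'Z') continue; // skip non-ASCII letters, digits, punctuation
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                last = code;
            } else if (c == 'H' || c == 'W') {
                // H and W are dropped and do NOT separate equal codes
            } else if (code == '0') {
                last = '0'; // vowels (and Y) are dropped but DO separate equal codes
            } else if (code != last) {
                out.append(code);
                last = code;
            }
        }
        if (out.length() == 0) return "";
        while (out.length() < 4) out.append('0'); // pad to letter + 3 digits
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Muller"));  // M460
        System.out.println(soundex("Mueller")); // M460
    }
}
```

Both spellings map to "M460", so a Soundex-based search treats them as the same name even though the code itself no longer resembles either word.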

> Illegal character replacements in ISOLatin1AccentFilter
> -------------------------------------------------------
>
>                 Key: LUCENE-1029
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1029
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.2
>            Reporter: Marko Asplund
>
> The ISOLatin1AccentFilter class is responsible for replacing "accented 
> characters in the ISO Latin 1 character set by their unaccented equivalent".
> Some of the replacements performed for Scandinavian characters (used e.g. in 
> the Finnish, Swedish and Danish languages etc.) are illegal. The Scandinavian 
> characters differ from the accented characters used e.g. in Latin-based 
> languages such as French in that these characters (ä, ö, å) represent 
> entirely independent sounds in the language and therefore cannot be 
> represented by any other sound without a change of meaning. It is therefore 
> illegal to replace these characters with any other character.
> This means, for example, that you can't change the Finnish word sää (weather) 
> to saa (will have), because these are two entirely different words with 
> different meanings. The same applies to the Scandinavian languages as well.
> There's no connection between the sounds represented by ä and a, ö and o, or å 
> and a. 
> In addition to the three characters mentioned above, Danish and Norwegian use 
> other special characters such as ø and æ. It should be checked whether the 
> replacement is legal for these characters.
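The collision described in the issue can be checked mechanically. The sketch below is a hypothetical demo class, not Lucene code, and it assumes NFD-based accent stripping rather than ISOLatin1AccentFilter's own replacement table. It groups terms by their folded form; any group with more than one member is a set of distinct words the index can no longer tell apart. Note that under this assumed strategy "ø" and "æ" pass through unchanged, because Unicode defines no canonical decomposition for them; whether a table-driven filter maps them is exactly the open question raised above.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; not part of Lucene.
final class FoldingCollisions {

    // Assumed folding strategy: NFD decomposition, then drop combining marks.
    // Letters without a canonical decomposition (o-slash U+00F8, ae U+00E6)
    // are returned unchanged.
    static String fold(String term) {
        return Normalizer.normalize(term, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Groups terms by folded form (TreeMap keeps the output order stable).
    static Map<String, List<String>> groupByFoldedForm(List<String> terms) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String t : terms) {
            groups.computeIfAbsent(fold(t), k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Finnish: first term is "weather" (a-umlaut twice), second is "will have".
        List<String> terms = Arrays.asList("s\u00e4\u00e4", "saa");
        for (Map.Entry<String, List<String>> e : groupByFoldedForm(terms).entrySet()) {
            System.out.println(e.getKey() + " <- " + e.getValue());
        }
    }
}
```

Running it prints a single group keyed "saa" containing both words, which is the loss of meaning the reporter describes.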

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

