[ 
https://issues.apache.org/jira/browse/LUCENE-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534987
 ] 

Hoss Man commented on LUCENE-1029:
----------------------------------

The functionality of ISOLatin1AccentFilter shouldn't change in a way that 
wouldn't be backward compatible.  If people feel the documentation is misleading 
and doesn't accurately reflect what the Filter does, then by all means please 
submit a documentation patch.

First and foremost, the purpose of this filter is to replace accented characters 
with non-accented characters. The equivalence described in the javadocs is 
one of visual character equivalence, not of semantic word equivalence -- that 
would be a lot more complicated.  If anyone would like to submit a patch 
containing a new filter that is capable of doing that, I'm sure the community 
would certainly welcome it.
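
To make "visual character equivalence" concrete, here is a rough sketch. It 
is not the filter's actual code: the class name AccentFoldSketch and the 
method fold are made up for illustration, and it leans on the JDK's 
java.text.Normalizer instead of Lucene's own replacement logic. It just shows 
how folding by appearance makes the two Finnish words from the report collide.

```java
import java.text.Normalizer;

// Hypothetical illustration only; not part of Lucene.
final class AccentFoldSketch {

    // Decompose each character into base letter + combining marks (NFD),
    // then drop the combining marks. This is purely visual folding: it
    // knows nothing about which language the word belongs to.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        String weather = "s\u00e4\u00e4"; // "sää" (weather)
        String willHave = "saa";          // "saa" (will have)

        // Both words fold to the same string, so they index identically.
        System.out.println(fold(weather));
        System.out.println(fold(weather).equals(fold(willHave)));
    }
}
```

Characters with no canonical decomposition, such as ø and æ, pass through 
this sketch unchanged, so it is not a substitute for the filter's behaviour. 
A filter that respected semantic word equivalence would need per-language 
rules on top of something like this.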

> Illegal character replacements in ISOLatin1AccentFilter
> -------------------------------------------------------
>
>                 Key: LUCENE-1029
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1029
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.2
>            Reporter: Marko Asplund
>
> The ISOLatin1AccentFilter class is responsible for replacing "accented 
> characters in the ISO Latin 1 character set by their unaccented equivalent".
> Some of the replacements performed for Scandinavian characters (used e.g. in 
> the Finnish, Swedish, Danish languages etc.) are illegal. The Scandinavian 
> characters are different from the accented characters used e.g. in 
> Latin-based languages such as French in that these characters (ä, ö, å) represent 
> entirely independent sounds in the language and therefore cannot be 
> represented with any other sound without change of meaning. It is therefore 
> illegal to replace these characters with any other character.
> This means for example that you can't change the Finnish word sää (weather) 
> to saa (will have) because these are two entirely different words with 
> different meanings. The same applies to Scandinavian languages as well.
> There's no connection between the sounds represented by ä and a; ö and o or å 
> and a. 
> In addition to the three characters mentioned above, Danish and Norwegian use 
> other special characters such as ø and æ. It should be checked whether the 
> replacement is legal for these characters.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

