I wouldn't pretend to know the truth on this matter, but if you do, you
might update the Wikipedia article http://en.wikipedia.org/wiki/Diacritic,
as it does not agree with your comments.
Marko Asplund (JIRA) wrote:
[ https://issues.apache.org/jira/browse/LUCENE-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marko Asplund updated LUCENE-1029:
----------------------------------
Attachment: ISOLatin1AccentFilter-javadoc.patch
I think the class javadoc is very misleading, so I'm attaching a documentation
patch.
For one thing, the Scandinavian characters do not contain diacritical marks or
accents. The dots in ä and ö, as well as the ring in å, are considered part of
the letter, not diacritics. The class name implies that it does something with
accents, so I would not have expected the class to replace the Scandinavian
characters.
The javadoc also says it replaces characters with their "equivalent" ASCII
characters. There are no ASCII equivalents for the Scandinavian characters.
Illegal character replacements in ISOLatin1AccentFilter
-------------------------------------------------------
Key: LUCENE-1029
URL: https://issues.apache.org/jira/browse/LUCENE-1029
Project: Lucene - Java
Issue Type: Bug
Components: Analysis
Affects Versions: 2.2
Reporter: Marko Asplund
Attachments: ISOLatin1AccentFilter-javadoc.patch
The ISOLatin1AccentFilter class is responsible for replacing "accented characters in
the ISO Latin 1 character set by their unaccented equivalent".
Some of the replacements performed for Scandinavian characters (used e.g. in
Finnish, Swedish and Danish) are illegal. The Scandinavian characters differ
from the accented characters used in Latin-based languages such as French:
these characters (ä, ö, å) represent entirely independent sounds in the
language and therefore cannot be replaced by any other sound without a change
of meaning. It is therefore illegal to replace these characters with any
other character.
This means, for example, that you can't change the Finnish word sää (weather)
to saa (will have), because these are two entirely different words with
different meanings. The same applies to the Scandinavian languages.
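The collision described above can be reproduced with a few lines of Java. This is only a sketch: it uses the JDK's java.text.Normalizer as a stand-in for naive accent stripping, not the ISOLatin1AccentFilter code itself, and the class and method names (AccentFoldingDemo, stripMarks) are made up for illustration.

```java
import java.text.Normalizer;

// Hypothetical demo class; it does not use or reproduce any Lucene code.
class AccentFoldingDemo {

    // Decompose to NFD, then drop every combining mark (Unicode category M).
    // This is a generic "strip the accents" approach, assumed here only to
    // illustrate the effect the issue complains about.
    static String stripMarks(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        String weather = "s\u00e4\u00e4"; // "sää"
        String willHave = "saa";

        // After stripping, the two distinct Finnish words become the same token.
        System.out.println(stripMarks(weather).equals(willHave)); // prints "true"
    }
}
```

With this kind of folding, an index can no longer tell the two words apart, which is the change of meaning the issue describes.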
There's no connection between the sounds represented by ä and a, ö and o, or
å and a.
In addition to the three characters mentioned above, Danish and Norwegian use
other special characters such as ø and æ. It should be checked whether the
replacement is legal for these characters.
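One possible way to act on this, sketched under assumptions: fold accents as usual but leave a configurable set of letters untouched. The class name NordicAwareFolding, the method fold and the contents of the PROTECTED set are all hypothetical; the code relies only on the JDK's java.text.Normalizer and is not the behaviour of ISOLatin1AccentFilter or of the attached patch. Whether ø and æ belong in the set is exactly the open question raised above, so they are deliberately left out here.

```java
import java.text.Normalizer;

// Hypothetical sketch; not part of Lucene.
class NordicAwareFolding {

    // Letters treated as independent letters rather than accented variants:
    // ä, ö, å and their upper-case forms. This set is an assumption.
    private static final String PROTECTED =
            "\u00e4\u00f6\u00e5\u00c4\u00d6\u00c5";

    static String fold(String s) {
        // Compose first so that each protected letter is a single code point.
        String composed = Normalizer.normalize(s, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        for (int i = 0; i < composed.length(); ) {
            int cp = composed.codePointAt(i);
            i += Character.charCount(cp);
            if (PROTECTED.indexOf(cp) >= 0) {
                out.appendCodePoint(cp); // keep the letter as it is
            } else {
                // Decompose this one code point and drop its combining marks.
                String d = Normalizer.normalize(
                        new String(Character.toChars(cp)), Normalizer.Form.NFD);
                out.append(d.replaceAll("\\p{M}+", ""));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "café sää": the French accent is folded, the Finnish letters are kept.
        System.out.println(fold("caf\u00e9 s\u00e4\u00e4"));
    }
}
```

A fixed set like this is language-blind: the same character can be a plain accented vowel in one language and an independent letter in another, so a real filter would probably need the set to be chosen per language.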