I feel like a fool continuing this debate, being the least intelligent guy in the room, but here goes:

My point was that Wikipedia (the link I gave, and other definitions I saw) seems to refer to the little markings around a letter as diacriticals whether or not they make the letter a completely different letter (see the part mentioning Scandinavian languages, and possibly Webster's dictionary as well). Marko disputed this in his last comment, and I don't know that he is wrong. Everything I have seen seems to indicate this, though.

I also dispute this sentence in the proposed new javadoc patch:

*It will also be impossible to search for the word in its original form.*

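If you use the same analyzer at index time and at query time, there should be no such problem: the query term is folded in exactly the same way as the indexed term, so the original form still matches.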
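To illustrate what I mean, here is a minimal sketch. It is not Lucene code: the class FoldingIndexSketch and its fold() method are hypothetical stand-ins for an accent-stripping analyzer, built on java.text.Normalizer, and the "index" is just a map. The only point is that the same folding is applied when adding a document and when searching.

```java
import java.text.Normalizer;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy index, for illustration only (not the Lucene API).
class FoldingIndexSketch {
    // Folded term -> ids of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in for an accent-stripping analyzer: lowercase, decompose
    // each character (NFD), then drop the combining marks.
    static String fold(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Index time: every token goes through fold() before it is stored.
    void add(int docId, String text) {
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(fold(token), k -> new HashSet<>()).add(docId);
        }
    }

    // Query time: the query term goes through the same fold().
    Set<Integer> search(String queryTerm) {
        Set<Integer> hits = postings.get(fold(queryTerm));
        return hits == null ? Collections.<Integer>emptySet() : hits;
    }

    public static void main(String[] args) {
        FoldingIndexSketch index = new FoldingIndexSketch();
        index.add(1, "sm\u00f6rg\u00e5sbord");                      // indexed with its diacritics
        System.out.println(index.search("sm\u00f6rg\u00e5sbord")); // query in the original form
        System.out.println(index.search("smorgasbord"));           // query in the folded form
    }
}
```

Both queries find document 1, because both are reduced to the same folded term before the lookup. What you lose is the ability to tell the two forms apart, not the ability to search for the original form.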


Doug Cutting wrote:
Mark Miller wrote:
I wouldn't pretend to know the truth on this matter, but you might update the wikipedia article http://en.wikipedia.org/wiki/Diacritic if you do, as it does not agree with your comments.

Wikipedia says, "Swedish uses characters identical to a-diaeresis (ä) and o-diaeresis (ö)". This is a little ambiguous. Identical how? I think they mean "visually identical to". The distinction is whether Swedish treats 'ä' as a variant of 'a' or as a completely separate letter. The latter is the case.

http://en.wikipedia.org/wiki/Umlaut_(diacritic) states:

  Swedish [...] treat[s] them as independent letters.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


