Kaixo! I discovered that soundslike handles ASCII only, and converts any non-ASCII character to some ASCII value. For most of the existing *_phonet.dat files this doesn't matter, but in some cases it does.
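As a rough illustration of the effect (this is not Aspell's actual code), here is a minimal Python sketch. It assumes the conversion behaves like stripping diacritics, which is only an assumption about what "some ASCII value" means in practice; fold_to_ascii is a hypothetical helper name.

```python
import unicodedata


def fold_to_ascii(word: str) -> str:
    """Hypothetical helper: drop combining marks so accented letters lose their accents."""
    # NFD splits "ç" into "c" + combining cedilla, "é" into "e" + combining acute, etc.
    decomposed = unicodedata.normalize("NFD", word)
    # Keep only the base characters; combining marks have a non-zero combining class.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Pairs of words that are pronounced differently but differ only by an accent.
pairs = [("ca", "ça"), ("livre", "livré")]

for plain, accented in pairs:
    collide = fold_to_ascii(plain) == fold_to_ascii(accented)
    print(f"{plain!r} vs {accented!r}: same key after folding = {collide}")
```

Once both words are reduced to the same ASCII string, no later phonetic rule can tell them apart.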
French and Walloon are an example of that. For example, "c" and "ç" are very different: "ca" sounds like "KA", but "ça" sounds like "SA". However, the current phonet code handles "c" and "ç" just the same; as a result, "ça" is viewed as sounding "KA" too.

Another example is "e" vs "ê, é, è". At the end of a word, "e" (without accent) is always mute, e.g. "livre" => "LIVR", but not if it is accented, e.g. "livré" => "LIVRE".

As a result, it is impossible to define some useful soundslike rules if they involve non-ASCII chars in the language. (I also think that it makes it impossible to define soundslike rules for languages in which non-ASCII letters are even more prominent, or even exclusively used, like Czech, Esperanto, Russian, ...)

The idea of matching fully accented chars with "ASCII only" versions is however a good one, but the match could involve several chars (e.g. "ö" -> "oe" in German, and not "ö" -> "o"). The possibility to define an "asciification" table could help find better suggestions when spell checking an unaccented, ASCII-only text. That is particularly true for those languages that, for lack of proper computer support, had been written in ASCII for a long time, like Esperanto and Romanian.

Thanks

-- 
Ki ça vos våye bén,
Pablo Saratxaga
http://chanae.walon.org/pablo/
PGP Key available, key ID: 0xD9B85466
[you can write me in Walloon, Spanish, French, English, Catalan or Esperanto]
[min povas skribi en valona, esperanta, angla aux latinidaj lingvoj]
_______________________________________________ Aspell-user mailing list [email protected] http://lists.gnu.org/mailman/listinfo/aspell-user
