I noticed some rather strange (inconsistent?) behaviour of AbiWord 0.7.13 with respect to searching and accented characters (characters in the Latin Extended-A range). When doing a case-insensitive search for a base ASCII Latin character, some, but not all, accented characters are matched. For instance:

  "o" matches U+014F (o with breve)
  "g" matches U+011F (g with breve)
  "i" matches U+0131 (small dotless i) and U+0130 (capital dotted I)

This is a Good Thing, I should say. But:

  "o" does not match U+0151 (o with double acute)
  "a" does not match U+00E4 (a with umlaut) or U+1E00 (A with ring below)

and so on. This is a Bad Thing, because it's inconsistent.

Incidentally, this seems to be related to two other features affecting those accented characters that do get matched:

(a) When highlighting a passage and then opening the search dialogue, the highlighted passage is displayed there. Non-Latin-1 characters are either left out or converted to ASCII best-match equivalents. Characters such as U+014F, U+011F, U+0131, or U+0130 are displayed as ASCII (o, g, i, i respectively). They are the ones that will get matched. U+0151 gets displayed as '"o' (but searching for neither 'o' nor '"o' will match it). The Euro symbol gets displayed as "EUR" (but searching for "EUR" will not match it). Note that these "best matches" are not the ones Windows defines (I'm on Win98). They are apparently of AbiWord's own making.

(b) Characters such as U+014F, U+011F, U+0131, or U+0130 are also converted to plain ASCII when saving a document in .abw or .html format. Needless to say, this is a Very Bad Thing. (I just filed a bug in Bugzilla about that one.)

I just thought I'd ask in case there's any hidden logic behind this behaviour, before feeding it into Bugzilla.

Best,
Lukas

-----------------------------------------------------
Lukas Pietsch
University of Freiburg
English Department
Phone (p.)
(#49) (761) 696 37 23
mailto:[EMAIL PROTECTED]
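
For comparison, here is a minimal sketch of what consistent accent-insensitive matching could look like. It is not AbiWord's actual code; the names fold, matches and EXTRA_FOLDS are hypothetical. It assumes matching is done by canonically decomposing each character (Unicode NFD), dropping the combining marks, and case-folding the result. That treats all the accented letters listed above alike, except small dotless i (U+0131), which has no decomposition and therefore needs an explicit table entry.

```python
import unicodedata

# Hypothetical table for letters that have no canonical decomposition.
# U+0131 (small dotless i) is one such letter.
EXTRA_FOLDS = {"\u0131": "i"}


def fold(text: str) -> str:
    """Reduce text to a case-folded, accent-free form for comparison."""
    # NFD splits e.g. U+0151 into "o" + U+030B (combining double acute).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (general category Mn).
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Apply the explicit table, then fold case.
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in stripped).casefold()


def matches(needle: str, haystack: str) -> bool:
    """Case- and accent-insensitive substring test."""
    return fold(needle) in fold(haystack)


if __name__ == "__main__":
    for cp in (0x014F, 0x011F, 0x0131, 0x0130, 0x0151, 0x00E4, 0x1E00):
        print(f"U+{cp:04X} folds to {fold(chr(cp))!r}")
```

Under this scheme a search for "o" would match both U+014F and U+0151, and "a" would match U+00E4 and U+1E00. The Euro sign has no decomposition either, so it is left unchanged and a search for "EUR" would still not match it unless a table entry were added.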
