On 6:37:15 pm 11/13/06 "Ethan Bradford" <[EMAIL PROTECTED]> wrote:
> Kevin, one small fact on Indic graphology that may help: consonants
> have an intrinsic vowel (an "a"), so "ka" is one Unicode character.
> There are then combining vowels, so to write "ko", you use "ka" plus
> the combining "o". To get pairs of consonants, you need to suppress
> the inherent vowel, which is what the halant does. Thus, "kra" is
> "ka + drop-the-a + ra".

That is indeed correct. I should have included it, but I am so used to
this that I assumed everyone knew what I was talking about. Thus, the
"syllable" that I think we need to operate on is a consonant (or a
conjunct formed by joining consonants with a halant), along with any
vowel modifiers.

Also, while most Indian languages have a set of allowed conjuncts, it is
possible to create new conjuncts arbitrarily by combining consonants
with a halant. Such new conjuncts are often needed to spell words
borrowed from English and other languages. Thus, the set of possible
conjuncts is not bounded.

> Gora, how do Hindi keyboards support the entry of halant? If
> entering the halant is just another keystroke (so the codes are
> entered as they are stored in Unicode), then why wouldn't the
> transposition of a halant with another keystroke be just as likely as
> any other transposition? "Teh" makes no sense whatever in English,
> but I type it often enough. Or are there separate keystrokes for the
> half-width (i.e. vowel-less) consonants, which automatically add a
> halant? If that's the case, then we want to treat "ka+halant"
> special, not "kra".

That depends on the keyboard input method, of which there are a variety
to choose from. There are two main classes of input methods:
(a) phonetic (actually pseudo-phonetic, as they use some kind of
transliteration scheme from English), and (b) non-phonetic mappings
that aim for typing efficiency, where the keyboard layout has nothing
to do with the sound of the character. With the first type, e.g. the
ITRANS scheme, a single keystroke can produce a half-consonant: the
English letter 'k' produces "ka + halant". The non-phonetic mappings
have a separate key for the halant.

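To make the "syllable" unit described above concrete, here is a minimal
Python sketch that splits Devanagari text into such units: a consonant
or halant-joined conjunct, followed by any vowel sign and modifiers. The
code-point ranges come from the Unicode Devanagari block; the names
SYLLABLE and syllables() are only for illustration and are not part of
aspell. The pattern ignores rarer signs (ZWJ/ZWNJ, Vedic marks, etc.),
so please read it as a rough approximation, not a full definition.

```python
import re

# Code points from the Unicode Devanagari block (U+0900..U+097F).
CONSONANT = "[\u0915-\u0939\u0958-\u095F]\u093C?"  # consonant, optional nukta
HALANT = "\u094D"                                  # virama: drops the inherent "a"
VOWEL_SIGN = "[\u093E-\u094C]"                     # dependent (combining) vowels
MODIFIER = "[\u0901-\u0903]"                       # chandrabindu, anusvara, visarga
INDEPENDENT_VOWEL = "[\u0904-\u0914]"

# Assumed, simplified syllable shape:
#   consonant (halant consonant)* [halant | vowel sign] modifiers*
#   or an independent vowel with modifiers;
#   any other character is returned as a unit of its own.
SYLLABLE = re.compile(
    f"(?:{CONSONANT}(?:{HALANT}{CONSONANT})*(?:{HALANT}|{VOWEL_SIGN})?"
    f"|{INDEPENDENT_VOWEL}){MODIFIER}*"
    "|.",
    re.DOTALL,
)


def syllables(text):
    """Split text into approximate syllable units."""
    return [m.group(0) for m in SYLLABLE.finditer(text)]


if __name__ == "__main__":
    kra = "\u0915\u094D\u0930"  # ka + halant + ra
    ko = "\u0915\u094B"         # ka + combining "o"
    for word in (kra, ko):
        units = syllables(word)
        print(len(word), "code points ->", len(units), "syllable(s)")
```

With this pattern, both "kra" (three code points) and "ko" (two code
points) come out as a single unit, which is the granularity I have in
mind.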
I believe that there are at least two levels at which spelling errors
are made: at the mental level, while composing sentences in your mind
(such as "occasion" misspelled as "ocassion"), and at the typing level,
where the wrong character is typed ("teh" for "the", as you note).

While typing errors certainly do need to be accounted for, I would
argue that where an Indian language conjunct is involved, a typing
error that transposes a halant is less likely, because the glyph
changes and so gives more visual feedback (at least if one hunts and
pecks, like me). E.g., "the" does not look too different from "teh",
but the two Devanagari strings in my earlier example, which differ in
where the halant falls, do.

For errors at the mental level, I believe that it is well established
that such confusion occurs between similar-sounding words. Therefore,
my guess would be that, in general, it is possible to confuse a
conjunct with another conjunct, or with the two (or more) consonants
making up the conjunct, but it is unlikely that a conjunct would be
confused with a single consonant, as they would sound quite different,
and hence not be remembered as similar.

I was driven to think about this because we have been trying out aspell
with new rules for Hindi, and the results have been counterintuitive. I
also think it is possible that syllables instead of characters need to
be used when the scores are refined with try_split(), try_ngram(), etc.,
but I would need to understand the working of these functions better
before I can make any kind of definitive statement.

Admittedly, all this is anecdotal at this point, and we need to do some
quantitative measurements with a decently sized test kernel of
misspelled words, like Kevin has done for English. I am also enlisting
Hindi linguists, and linguists from other languages, to help design an
Indian language spellchecker. I will make the design work available on
a public Wiki.

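As a rough illustration of the character-versus-syllable question, the
Python sketch below computes a plain Levenshtein distance over two
granularities. The word pair is a hypothetical one that I made up for
this purpose (it is not the pair from my earlier mail), the syllable
lists are split by hand, and edit_distance() is a textbook
implementation, not what aspell does internally.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two sequences (no transpositions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


KA, TA, RA, HALANT = "\u0915", "\u0924", "\u0930", "\u094D"

# Hypothetical pair differing only in the position of the halant.
word_a = KA + HALANT + TA + RA  # "kta" + "ra"
word_b = KA + TA + HALANT + RA  # "ka" + "tra"

# The same words, split by hand into syllable units.
syllables_a = [KA + HALANT + TA, RA]
syllables_b = [KA, TA + HALANT + RA]

if __name__ == "__main__":
    print("code points:", edit_distance(word_a, word_b), "of", len(word_a))
    print("syllables:  ", edit_distance(syllables_a, syllables_b),
          "of", len(syllables_a))
```

At the code-point level the two words differ in two of four units,
much like "the" and "teh"; at the syllable level they share no unit at
all. Whether that is the right unit for scoring is exactly what the
measurements would need to show.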
Regards,
Gora

_______________________________________________
Aspell-devel mailing list
Aspell-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/aspell-devel