2012/7/4 Anish Patil <[email protected]>:
>>>There are 63 languages included. Chinese and Japanese (zh and ja) are
>>>intentionally left out as they were too big / not so interesting. Other than
>>>that, English is particularly large, as expected, and the rest vary in size,
>>>from a few thousand to tens of millions of unique words.
>
> For some of the indian languages wiki pedia words contain spelling mistakes,
> hope that will not affect your work.
> Marathi Word list contains words like "अॅक्सेसदिनांक",अॅरिझोना which are
> incorrect.

This is true for Wikipedia in all languages.

A spelling may be incorrect with regard to the standard set by a
government or a language academy, but it may be common in real life, so it
is still useful for statistics. It may also point to technical issues with
fonts or keyboards that make people write incorrectly: for example, the
right letter may not appear on the common keyboard layout, or a
transliteration input method may have bugs.

In the particular case of Marathi, I know somebody who is working on
improving the spelling in Wikipedia. I'll gladly connect you, if you're
interested.

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
_______________________________________________
HarfBuzz mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
