Asalamu alaikum wa rahmatullaah

On Saturday 12 November 2005 00:18, Mohammed Sameer wrote:
> > So, as you can see, creating an arabic spellchecker is only a matter of
> > populating 2 files and plugging them into OOo, or using Myspell/Aspell
> > standalone.
>
> That's another problem, Creating the data files.

I read somewhere about someone who made a very clever move: he collected user-entered Arabic search strings from Google. There are ways to collect such words automatically instead of typing them by hand, and there are data structures (sets) that avoid duplicates. With appropriate filtering techniques (I can share some), someone could automatically generate an interesting word list from existing search engines and Arabic web pages. I would rather spend a week or two developing such tools than type large amounts of data by hand.

I don't think it is that difficult. The programmer prepares an exclusion list (words not to include, like min, ila, 'an, hum, etc.) and a minimal basic word list. A search starts with one word from the basic list, possibly chosen at random. Each result (a URL returned by the server) is an Arabic web page. The page and its links are parsed. If a word is in neither the exclusion list nor the basic list, it is added to the list. Usually, to avoid adding misspelled words, an authentic source is used (e.g., newspapers, academic papers, online books, etc.).

Actually, if you don't want to write your own tool, the "wget" utility is a good way to grab related web pages (online books, articles, directories, etc.) using the -r flag and passing a set of URLs (probably from a file).

This is just a simple idea that I have never tried. I hope it's helpful.

Salam,
Abdalla Alothman
_______________________________________________
Developer mailing list
[email protected]
http://lists.arabeyes.org/mailman/listinfo/developer
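
The filtering step described in this message (keep a word only if it is in neither the exclusion list nor the basic list, and let a set take care of duplicates) could be sketched roughly as follows in Python. This is only an illustration under several assumptions, not a tested tool: the names `extract_arabic_words`, `grow_word_list`, and `min_length` are made up for the example, a "word" is assumed to be a run of letters in the basic Arabic Unicode range, short vowel marks are simply stripped, and the page texts are assumed to have been downloaded already (for instance with wget).

```python
import re

# Assumption: short vowel marks (harakat, U+064B to U+0652) are removed so
# that vocalized and unvocalized spellings count as the same word.
HARAKAT = re.compile(r"[\u064B-\u0652]")

# Assumption: a "word" is a run of basic Arabic letters (U+0621 to U+064A).
# Everything else (Latin text, digits, HTML tags) acts as a separator.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def extract_arabic_words(text):
    """Return every run of Arabic letters found in text, in order."""
    return ARABIC_WORD.findall(HARAKAT.sub("", text))


def grow_word_list(pages, basic_words, exclusion_words, min_length=2):
    """Return the set of new words found in the given page texts.

    A word is kept only if it is not in the exclusion list, not already
    in the basic list, and at least min_length letters long. Collecting
    into a set drops duplicates automatically.
    """
    known = set(basic_words)
    excluded = set(exclusion_words)
    found = set()
    for text in pages:
        for word in extract_arabic_words(text):
            if len(word) < min_length:
                continue
            if word in excluded or word in known:
                continue
            found.add(word)
    return found
```

Calling `grow_word_list(pages, basic_words, exclusion_words)` with a list of page texts returns only the words that are new, which can then be merged into the basic list before the next round of searching. Nothing here checks spelling, so the point about using authentic sources still applies.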

