On Sun, 15 Dec 2002, Nadav Har'El wrote:

> On Sun, Dec 15, 2002, Oded Arbel wrote about "Re: Announcement: free Hebrew
> Spell-Checker":
> > Can you please expand on the reasons you chose not to incorporate the
> > generated word lists as a language package to some existing spell checker
> > (such as myspell or aspell) and thus making it immediately useful for "end
> > users"?
>
> During our tests, we got both ispell and aspell to work with our word lists.
> However, this approach proved both impractical and limited because:
>
> 1. Aspell does not (or at least we didn't figure out how to) support prefixes,
>    so instead of a 125,000 word word list (in this release) we had to multiply
>    this by the number of prefixes (he, shin, etc. - about 20 prefixes in all)
>    and the resulting over-million-word list took ages to load into aspell
>    (hspell is much faster, even when written in Perl!).

However, you could suggest a *patch* to aspell which would replace the
word-checking routine for Hebrew...

BTW, any plans for creating a CPAN module Lingua::HE::Spell (or the like)? If
so, I suggest an option of tying a hash to check words (in addition to a
separate function); that way, programs based on simple hashed word lists will
still work with minimal change.

> 2. In hspell I could add our home-brew code which (for example) recognizes
>    acronyms and correct gimatria, while adding it into aspell will require
>    a lot of cooperation with the aspell project.
>
> You must understand that out of the time Dan and I spent on this project,
> only about 1% (!) went into writing the "hspell" program (a Perl script,
> actually). 99% of the work went into writing the inflection programs for
> nouns and verbs, and building the dictionaries, and this was the actual
> important work, work that nobody has done before us (in Hebrew and free).

Did you look at the work of Erel Segal (http://www.cs.technion.ac.il/~erelsgl)
and his morphological analyzer?

[snip]

> BTW, if you look at our TODO, you'll see that one of the plans for some
> future release is to write a C library for interfacing with the word lists;
> This C library, once written (by us or by someone else), could be used
> from aspell, pspell, OpenOffice, kword, or whatever.

Do you intend to LGPL or GPL the C library? Also, don't forget a Perl module!

	Alon

-- 
This message was sent by Alon Altman ([EMAIL PROTECTED]) ICQ:1366540
The RIGHT way to contact me is by e-mail. I am otherwise nonexistent :)
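
To make the two ideas in this thread a bit more concrete (checking prefixed
words against the base list instead of loading an expanded million-word list,
and exposing the checker through a hash-like interface), here is a minimal
sketch in Python, where a read-only mapping plays roughly the role a tied hash
would play in Perl. Everything named here is an assumption made for
illustration: the class `WordList`, the `PREFIXES` tuple, and the
Latin-transliterated sample words are hypothetical, and this is neither
hspell's code nor the API of any existing CPAN module. It also strips at most
one prefix and accepts any prefix on any word, which a real Hebrew checker
would presumably have to restrict.

```python
from collections.abc import Mapping

# Hypothetical single-letter prefixes, transliterated for readability.
# The real set (about 20 prefixes, per the thread) is not reproduced here.
PREFIXES = ("h", "w", "b", "k", "l", "m", "s")


class WordList(Mapping):
    """Read-only, dict-like view of a base word list.

    A word counts as present if it is in the base list itself, or if
    removing one known prefix leaves a word that is in the base list.
    Only the base list is stored, so memory use does not grow with the
    number of prefixes.
    """

    def __init__(self, base_words, prefixes=PREFIXES):
        self._words = frozenset(base_words)
        self._prefixes = tuple(prefixes)

    def _lookup(self, word):
        """Return the matching base word, or None if nothing matches."""
        if word in self._words:
            return word
        for prefix in self._prefixes:
            rest = word[len(prefix):]
            if word.startswith(prefix) and rest in self._words:
                return rest
        return None

    def __getitem__(self, word):
        base = self._lookup(word)
        if base is None:
            raise KeyError(word)
        return base

    def __contains__(self, word):
        return self._lookup(word) is not None

    def __iter__(self):
        # Iterates over base words only, not over prefixed variants.
        return iter(self._words)

    def __len__(self):
        return len(self._words)


if __name__ == "__main__":
    words = WordList({"bayit", "sefer"})
    for candidate in ("bayit", "hbayit", "xbayit"):
        status = "ok" if candidate in words else "unknown"
        print(candidate, status)
```

Because `WordList` answers `word in words` and `words[word]` like an ordinary
dictionary, a program written against a plain hashed word list would only need
to change the line that builds the dictionary.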
