On Sun, Dec 15, 2002, Oded Arbel wrote about "Re: Announcement: free Hebrew
Spell-Checker":
> Can you please expand on the reasons you chose not to incorporate the
> generated word lists as a language package to some existing spell checker
> (such as myspell or aspell) and thus making it immediately useful for "end
> users" ?
During our tests, we got both ispell and aspell to work with our word lists.
However, this approach proved both impractical and limited because:
1. Aspell does not support prefixes (or at least we couldn't figure out how
to make it do so), so instead of a 125,000-word list (in this release) we
had to multiply this by the number of prefixes (he, shin, etc. - about 20
prefixes in all), and the resulting list of over a million words took ages
to load into aspell (hspell is much faster, even though it is written in Perl!).
2. In hspell I could add our home-brew code which (for example) recognizes
acronyms and correct gimatria, while adding it to aspell would require
a lot of cooperation with the aspell project.
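To illustrate the trade-off in point 1, here is a minimal Python sketch (not
the hspell code, which is a Perl script). The word set, the prefix list, and
the function names are all made up for illustration, and the sketch assumes
every prefix may attach to every word, which is a simplification.

```python
# Hypothetical toy data; the real list has about 125,000 words and about
# 20 prefixes. Every name here is illustrative, not taken from hspell.
BASE_WORDS = {"בית", "ספר", "ילד"}
PREFIXES = ["", "ה", "ש", "ב", "ל", "ו", "שה", "וה"]


def expand(words, prefixes):
    """Expansion approach: store every prefix+word combination up front."""
    return {p + w for p in prefixes for w in words}


def check(word, words, prefixes):
    """Prefix-stripping approach: try each prefix at lookup time instead."""
    return any(
        word.startswith(p) and word[len(p):] in words for p in prefixes
    )


expanded = expand(BASE_WORDS, PREFIXES)
# The expanded list grows by a factor equal to the number of prefixes,
# while the stripping approach keeps only the base words in memory.
print(len(BASE_WORDS), len(expanded))
print(check("שהבית", BASE_WORDS, PREFIXES))
```

Both approaches accept the same words; the difference is whether the cost is
paid in list size at load time or in a few extra lookups per checked word.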
You must understand that out of the time Dan and I spent on this project,
only about 1% (!) went into writing the "hspell" program (a Perl script,
actually). 99% of the work went into writing the inflection programs for
nouns and verbs, and building the dictionaries, and this was the actual
important work, work that nobody has done before us (in Hebrew and free).
Nothing prevents someone else (or even us in some upcoming release) from
ditching "hspell" in favor of aspell, or from incorporating the spell-checking
code inside "end-user" programs like OpenOffice or KWord. In fact, I even
approached the OpenOffice people offering our help in integrating hspell
into OpenOffice.
I would also be happy if the ispell/aspell/whatever people decided to
incorporate the word lists generated by the Hspell project as a "language
package" of their spell checker.
If you look at hspell.pl you'll see that the code in it is fairly trivial,
simply looking up words in word tables generated in advance (partly by hand,
partly automatically - this is where the real "brains" of this project went),
with only relatively-simple "tricks" to recognize prefixes, gimatria, and so
on. This is intentional, because it makes our spell-checker very easy
(I believe) to integrate inside other programs (GPL of course!) which need
a Hebrew spell-checker.
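As an example of how simple such a "trick" can be, here is a rough Python
sketch of a gimatria check. It is not the algorithm used in hspell.pl; it only
assumes the conventional rules (letters in descending value, 15 and 16 written
as 9+6 and 9+7), handles values below 1000 only, and ignores final-form
letters. The names to_gimatria and is_gimatria are hypothetical.

```python
# Standard numeric values of the Hebrew letters (final forms omitted).
LETTER_VALUES = {
    "א": 1, "ב": 2, "ג": 3, "ד": 4, "ה": 5, "ו": 6, "ז": 7, "ח": 8, "ט": 9,
    "י": 10, "כ": 20, "ל": 30, "מ": 40, "נ": 50, "ס": 60, "ע": 70, "פ": 80,
    "צ": 90, "ק": 100, "ר": 200, "ש": 300, "ת": 400,
}
# ASCII quotes plus Hebrew gershayim (U+05F4) and geresh (U+05F3).
MARKS = ('"', "'", "\u05F4", "\u05F3")


def to_gimatria(n):
    """Build the conventional letter sequence for an integer 1..999."""
    letters = []
    for letter, value in sorted(LETTER_VALUES.items(), key=lambda kv: -kv[1]):
        if n in (15, 16):
            # By convention 15 and 16 are written 9+6 and 9+7, not 10+5/10+6.
            letters.append("ט" + ("ו" if n == 15 else "ז"))
            break
        while n >= value:
            letters.append(letter)
            n -= value
    return "".join(letters)


def is_gimatria(word):
    """True if word, minus quote marks, is a conventionally written number."""
    bare = "".join(ch for ch in word if ch not in MARKS)
    if not bare or any(ch not in LETTER_VALUES for ch in bare):
        return False
    total = sum(LETTER_VALUES[ch] for ch in bare)
    return total < 1000 and to_gimatria(total) == bare


print(is_gimatria('תשס"ג'))  # the year 5763 written without its thousands
print(is_gimatria("יה"))     # 15 written as 10+5, which convention avoids
```

The check works by recomputing the conventional spelling of the word's
numeric value and comparing it with the input, so misordered letters are
rejected without any extra rules.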
BTW, if you look at our TODO, you'll see that one of the plans for some
future release is to write a C library for interfacing with the word lists.
This C library, once written (by us or by someone else), could be used
from aspell, pspell, OpenOffice, KWord, or whatever.
--
Nadav Har'El | Sunday, Dec 15 2002, 10 Tevet 5763
[EMAIL PROTECTED] |-----------------------------------------
Phone: +972-53-245868, ICQ 13349191 |I planted some bird seed. A bird came up.
http://nadav.harel.org.il |Now I don't know what to feed it...