On Mon, Feb 24, 2003, Eli Marmor wrote about "Re: Important News":
> hspell will have to wait...
> (in any case, hspell is still in the process of being ported to C, and
> as long as it isn't ready, I can't integrate it).

Note that a version of the hspell front-end in C can probably be ready
in less than 2 weeks, if I shift my focus to it (or if someone else does
it). The major algorithms are ready (with a 50-fold decrease in start-up
time and a 6-fold decrease in memory use). The most complicated thing
I'm sitting on now is how to specify which words can sensibly take which
prefixes (this feature was missing in the Perl version, and I don't want
the C version to lack it as well; if push comes to shove, I can write a
C version without this feature).

Anyway, our current focus is on expanding the dictionary. This is very
important because if we assume that words follow a power-law
distribution (and I think this assumption is close to being true), then
doubling the number of words in the dictionary *squares* the
false-negative rate (the chance of not recognizing a correct word). So
if release 0.1 had 120,000 inflections, and in a typical document (based
on an experiment I did) 10% of the distinct correct words in the
document were not recognized, then doubling the number of words to
240,000 (we're nearing that mark) should hopefully reduce the false
negatives to 1% (0.1 squared).

Please tell me when you are getting serious about integrating hspell
with OpenOffice, and I *will* shift my focus to finishing the C
interface. Anyway, I guess that you'll have some OpenOffice-specific
work to do to support a Hebrew spellchecker (any spellchecker), so you
don't have to wait for my side of the work to be finished before you
start yours.

-- 
Nadav Har'El                        |   Monday, Feb 24 2003, 22 Adar I 5763
[EMAIL PROTECTED]                   |-----------------------------------------
Phone: +972-53-245868, ICQ 13349191 |Ways to Relieve Stress #10: Make up a
http://nadav.harel.org.il           |language and ask people for directions.
