On Mon, Mar 10, 2003, Gal Shalif - Sun Israel - Software Engineer wrote about "Re:
Hebrew SpellChecker and OpenOffice/StarOffice SpellChecker engine":
> >AFAIK hspell is a totally original approach.
Gal, if you have Hspell-specific questions you can also email Dan and me
directly. But *PLEASE* don't send me HTML mail :)
Anyway, Jonathan is right, hspell has nothing to do with the "MySpell
spellchecker engine". In fact, I first heard of MySpell only an hour ago.
Can you point me to some documentation of what MySpell
is supposed to do (not MySpell binaries or sources)?
> Well, it would make much larger impact on OpenOffice/StarOffice/Mozilla if
> it could be implemented as an extension of the MySpell spellchecker engine.
Let me tell you a bit about what Hspell is.
Hspell has two parts:
1. A system of programs and data files (dictionaries) which creates a
large list of Hebrew words (currently, 250,000). These words include
valid inflections (e.g., kelev => kalbi, kalbecha, klavim, etc.)
but not particles (letters mosh"e vecale"v, specifying 'the', 'from',
'in', etc.).
This database-building phase is run only once, during compilation,
and creates a 1.8MB file which is compressed (using my own compression
algorithm) to an 82K data file, which is the dictionary that gets
installed.
Note that most of the work of the Hspell project went into this part,
of building the dictionaries and the inflection programs.
2. The front-end, which reads the aforementioned data file and goes on
to spell-check the given file.
Part 2 is currently written in Perl, but I already have an initial prototype
written in C with greatly improved performance (both time and space-wise).
I assume the "MySpell" thing replaces #2. What can it do? Does it use
a straight word list for the spell-checking? If so, this is not enough for
Hebrew, because you need to allow adding the particles (and even worse, not
every combination of particles is legal, and not on every word). Also, good
Hebrew spell-checking needs to allow rashei tevot, gimatria, and
academy-specified letter doubling (e.g., vilon has one vav, havilon has two) -
and I doubt MySpell could do this out of the box either.
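To make the particle problem concrete, here is a minimal sketch of the
difference between a straight word-list lookup and a particle-aware one.
Everything in it is an assumption made for illustration: Latin
transliterations stand in for Hebrew letters, the word list and prefix table
are toy data, and the function names are hypothetical. It is not Hspell's
actual code, and it ignores rashei tevot, gimatria, and the vav doubling
mentioned above.

```python
# Toy data only: Latin transliterations stand in for Hebrew letters, and the
# word list and prefix rules below are invented for illustration.
WORDS = {"kelev", "klavim", "kalbi", "vilon"}

# Legal particle sequences, listed explicitly because order matters:
# "veha" (and-the) is allowed here, "have" (the-and) is not.
LEGAL_PREFIXES = {"", "ha", "ve", "be", "me", "veha", "meha", "vebe", "veme"}

# Words that reject some particles. Assumed rule for this sketch: a word
# that already carries a possessive suffix does not take the article "ha".
NO_DEFINITE = {"kalbi"}


def plain_lookup(token):
    """What a straight word-list checker does: exact match only."""
    return token in WORDS


def particle_lookup(token):
    """Accept token if it splits into a legal prefix plus a listed word."""
    for cut in range(len(token)):
        prefix, word = token[:cut], token[cut:]
        if prefix not in LEGAL_PREFIXES or word not in WORDS:
            continue
        if word in NO_DEFINITE and prefix.endswith("ha"):
            continue  # legal prefix in general, but not on this word
        return True
    return False
```

With this toy data, plain_lookup("hakelev") fails even though "kelev" is
listed, while particle_lookup accepts "hakelev" and "vehakelev" but rejects
"havekelev" (illegal particle order) and "hakalbi" (particle not allowed on
that word).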
Or does MySpell call an external program (or shared library) to check each
word? If it does that, #2 is still needed, and I don't see why MySpell is
such a great "gluck", or why it wouldn't be trivial for Hspell to support it.
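For the external-program case, the glue could be as thin as a filter that
takes one word per line and answers with a verdict per word. The sketch
below is only that, a sketch: the "ok"/"bad" reply format, the function
names, and the toy word list are all invented here, and nothing in it is
MySpell's or Hspell's real interface.

```python
import io

# Toy stand-in for a real checker; the word list is invented.
KNOWN = {"kelev", "vilon"}


def is_valid(word):
    """Hypothetical per-word check; a real back-end would go here."""
    return word in KNOWN


def check_stream(lines, out, check=is_valid):
    """Read one word per line; write 'ok WORD' or 'bad WORD' for each."""
    for line in lines:
        word = line.strip()
        if not word:
            continue  # skip blank lines
        verdict = "ok" if check(word) else "bad"
        out.write(verdict + " " + word + "\n")


# Small in-memory demonstration of the exchange.
replies = io.StringIO()
check_stream(io.StringIO("kelev\nkelv\n"), replies)
print(replies.getvalue(), end="")
```

A host program would run the same loop over a pipe, for example by passing
sys.stdin and sys.stdout instead of the in-memory buffers.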
So if you tell me what MySpell does, maybe I can tell you what it would take
to change Hspell and/or MySpell for both of them to cooperate.
> >I suggest that you download it and look at the source. It's not that large
> >or complex.
> I wish that I would have the time :-(
Yeah, time is a problem for all of us...
Which is why I cannot spend days on integrating Hspell into every specific
piece of software in existence (LyX, Emacs, OpenOffice, Mozilla), and I need
help - at least in telling me what kind of interfaces these programs require.
--
Nadav Har'El | Monday, Mar 10 2003, 7 Adar II 5763
[EMAIL PROTECTED] |-----------------------------------------
Phone: +972-53-245868, ICQ 13349191 |How much wood would a woodchuck chuck if
http://nadav.harel.org.il |a woodchuck would chuck wood?