Nadav Har'El wrote:

>  1. Aspell does not (or at least we didn't figure out how to) support prefixes,
>     so instead of a 125,000 word word list (in this release) we had to multiply
>     this by the number of prefixes (he, shin, etc. - about 20 prefixes in all)
>     and the resulting over-million-word list took ages to load into aspell
>     (hspell is much faster, even when written in Perl!).
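
To put rough numbers on the prefix problem Nadav describes, here is a
minimal Python sketch of what "multiplying the word list by the prefixes"
amounts to. Everything in it is an assumption made for illustration: the
names PREFIXES, BASE_WORDS and expand() are hypothetical, the entries are
transliterated stand-ins rather than real hspell or aspell data, and it
naively attaches every prefix to every word, which real Hebrew does not
allow. It is not how hspell or aspell actually build their lists.

```python
from itertools import product

# Hypothetical, transliterated stand-ins; not real hspell or aspell data.
# The empty string keeps the bare (unprefixed) word in the output.
PREFIXES = ["", "h", "sh", "v", "b", "k", "l", "m"]
BASE_WORDS = ["bayit", "sefer", "yeled"]


def expand(base_words, prefixes):
    """Return every prefix+word combination as a flat list."""
    return [prefix + word for prefix, word in product(prefixes, base_words)]


expanded = expand(BASE_WORDS, PREFIXES)
print(len(expanded))  # 3 words * 8 prefix slots = 24 entries

# Same arithmetic with the figures quoted above (about 20 prefixes).
estimated_entries = 125_000 * 20
print(estimated_entries)  # 2500000
```

The 2.5 million figure is only an upper bound under the naive assumption
that every prefix applies to every word, but it shows why the flat list
ends up in the "over a million" range, and why a speller that handles
prefixes itself can keep working from the 125,000-word base list.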

If anybody is curious, I learned the following information from Melingo
and/or Prof. Choueka (I don't remember who exactly, but I think both
agree with it):

<off-topic, academic stuff>

If "Tzurot" (forms) is the term for all the variations of Hebrew
words, including all the combinations of prefixes AND SUFFIXES (which
Nadav and Dan didn't count), then there are about 70 million Tzurot
in Hebrew.

Many Tzurot have never been used. Some of them may become popular in
the future, so it is not a good idea to just harvest a zillion Hebrew
texts and corpora and build a dictionary of everything found in them.
Such a dictionary would suffer from two problems:

1. Many Tzurot would still be missing from it.
2. If it is really big, it will include many mistakes.

</off-topic, academic stuff>

-- 
Eli Marmor
[EMAIL PROTECTED]
CTO, Founder
Netmask (El-Mar) Internet Technologies Ltd.
__________________________________________________________
Tel.:   +972-9-766-1020          8 Yad-Harutzim St.
Fax.:   +972-9-766-1314          P.O.B. 7004
Mobile: +972-50-23-7338          Kfar-Saba 44641, Israel
