Hi Marcin, Janis, Eleanora,
I did some debugging in the hunspell code, and found that the size of
the Hebrew dictionaries was the cause of the delay, similar to Janis's
problem in Latvian. The files are read line by line, and he_IL.dic has
329,326 entries, which is far more than the other dictionaries I tried.
The main bottleneck was not in reading the files from the disk, but in
building the hash tables in hashmgr.cxx in add_word(). When I shortened
he_IL.dic to the size of the Hungarian dictionary, it took the same
amount of time to load Hebrew and Hungarian. Same with Hebrew and
English US.
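
A rough way to separate the two phases when measuring, sketched in
Python rather than taken from hashmgr.cxx. The function name
time_phases, the "word/FLAGS" line shape and the synthetic word list
are assumptions for illustration only; a plain dict stands in for
Hunspell's own hash table, so the numbers say nothing about add_word()
itself.

```python
import time

def time_phases(lines):
    """Time parsing .dic-style lines separately from building a hash table."""
    t0 = time.perf_counter()
    # Phase 1: split each "word/FLAGS" line into (word, flags).
    parsed = []
    for line in lines:
        word, _, flags = line.rstrip("\n").partition("/")
        parsed.append((word, flags))
    t1 = time.perf_counter()
    # Phase 2: insert every entry into the table (a dict is the stand-in).
    table = {}
    for word, flags in parsed:
        table.setdefault(word, []).append(flags)
    t2 = time.perf_counter()
    return table, t1 - t0, t2 - t1

# Synthetic stand-in with the same number of entries as reported above.
sample = [f"word{i}/AB\n" for i in range(329_326)]
table, parse_s, build_s = time_phases(sample)
print(f"entries={len(table)} parse={parse_s:.3f}s build={build_s:.3f}s")
```

Any iterable of lines works, so an open file object could be passed
in place of the synthetic list to include the disk read in phase 1.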
To Hunspell developers out there: is there any way to make the building
of the hash tables more efficient?
Alan
Marcin Miłkowski wrote:
Hi Alan,
I don't think that's the reason. The Polish dictionary file is about
4 MB and it loads fast (although its affix file is only about 200K).
Check it yourself. However, it's not UTF-8 - it's ISO-8859-2. Maybe
UTF-8 makes it slower?
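
One thing worth keeping in mind when comparing file sizes across
encodings: UTF-8 stores each Hebrew letter in two bytes, while an 8-bit
encoding such as ISO-8859-8 or ISO-8859-2 uses one byte per letter. The
Python sketch below only illustrates that size difference; the sample
words and the helper name encoded_sizes are made up for the example,
and it does not show whether UTF-8 actually slows down loading in
Hunspell.

```python
# Sample words written as escapes to avoid bidirectional display issues.
hebrew = "\u05e9\u05dc\u05d5\u05dd"   # "shalom", 4 letters
polish = "\u017c\u00f3\u0142w"        # "zolw" with Polish diacritics, 4 letters

def encoded_sizes(word, legacy_codec):
    """Return (bytes in the 8-bit codec, bytes in UTF-8) for one word."""
    return len(word.encode(legacy_codec)), len(word.encode("utf-8"))

he_sizes = encoded_sizes(hebrew, "iso-8859-8")
pl_sizes = encoded_sizes(polish, "iso-8859-2")
print(he_sizes, pl_sizes)  # (4, 8) (4, 7)
```

So the same number of words can take up to twice as many bytes in a
UTF-8 Hebrew dictionary as in a single-byte one, which makes entry
counts a fairer comparison than megabytes.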
Regards,
Marcin
Alan Yaniger wrote:
Hi Daniel,
Thanks for your reply. I downloaded Hunspell and checked a very small
text with Hebrew dictionaries. There was a considerable delay until
hunspell exited. When I checked the same Hebrew text or a similarly
small English text using English dictionaries, hunspell exited
immediately.
Could the size of the dictionaries be the reason for the delay? Here
are the sizes of the Hebrew and English dictionaries:
  386,182 he_IL.aff
3,103,184 he_IL.dic
  696,131 en_US.dic
    3,045 en_US.aff
Alan
Daniel Naber wrote:
On Friday 27 April 2007 10:18, Alan Yaniger wrote:
Is the problem in the way the dictionaries were
created?
I suggest you download hunspell and use it to check a very small
text. This way you can see if the problem is in OOo or in the
spellchecker component (hunspell). You could also compare with
myspell. As hunspell has more features than myspell, it might be
slower.
Regards
Daniel
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]