Alan,

The size of the second Hungarian dictionary (wc output) is:

   lines    words    characters
   22068   124931   622546 hu_HU.aff
  873355   873348 26481165 hu_HU.dic
  895423   998279 27103711 total

The .dic file contains 873,378 words, 8 times larger than the Hebrew one.
The .aff file is roughly twice as big as the Hebrew one.

I assume you used the first Hungarian dictionary, the one with the
small word count, for your test.

I use the second one all the time, and it loads in less than a second
for me, so I do not understand the effect you describe.

-eleonora


> Hi Marcin, Janis, Eleonora,
>
> I did some debugging in the hunspell code, and found that the size of
> the Hebrew dictionaries was the cause of the delay, similar to Janis's
> problem in Latvian. The files are read line by line, and he_IL.dic has
> 329,326 entries, which is far more than the other dictionaries I tried.
> The main bottleneck was not in reading the files from the disk, but in
> building the hash tables in hashmgr.cxx in add_word(). When I shortened
> he_IL.dic to the size of the Hungarian dictionary, it took the same
> amount of time to load Hebrew and Hungarian. Same with Hebrew and
> English US.
>
> To Hunspell developers out there: is there any way to make the building
> of the hash tables more efficient?
>
> Alan


