Hi Kevin,

Thanks for your input. There is a count of the number of entries on the top line of the Hebrew dictionary, so that's not a problem.

On the machine I'm working on now, the OOo installation doesn't have "check all languages" marked.

There's plenty of memory, as the following output of "free" shows:

            total       used       free     shared    buffers     cached
Mem:       8109956     981068    7128888          0      88780     710764
-/+ buffers/cache:     181524    7928432
Swap:      5815488          0    5815488

The installed dictionaries are: English US, Hebrew. If I type in English, there is no noticeable delay, and misspelled words are marked in red. If I then start typing in Hebrew, there is a 5-second delay during which OOo seems "stuck" while building the hash table.

Thanks,
Alan

Kevin B. Hendricks wrote:

Hi Alan,

If you did place the count as the top line (to create a properly sized hash table), then perhaps the only potential speedup is to change hunspell to mmap a file containing the previously created hash table, similar to what ispell does.
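To illustrate why the count on the top line matters, here is a minimal sketch, assuming a simplified .dic layout (count on the first line, then one entry per line with optional "/flags"). The helper name load_words is hypothetical and this is not Hunspell's actual hashmgr code; it only shows a table being sized once from the header so that it is not repeatedly regrown during loading.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical helper, not part of Hunspell: loads a .dic-style stream whose
// first line is the entry count, and pre-sizes the table from that count.
std::unordered_set<std::string> load_words(std::istream& in) {
    std::unordered_set<std::string> table;
    std::string line;

    // First line: the (approximate) number of entries that follow.
    if (std::getline(in, line)) {
        std::istringstream header(line);
        std::size_t count = 0;
        if (header >> count) {
            table.reserve(count);  // allocate buckets once, avoiding rehashes
        }
    }

    // Remaining lines: one entry each, optionally followed by "/flags".
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        table.insert(line.substr(0, line.find('/')));  // keep only the word
    }
    return table;
}
```

If the header count is missing or far too small, a table like this still works but has to grow while loading, which is the extra cost the count is meant to avoid.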

The only real problem is that all binary formats like that have endian issues across architectures that make things quite difficult. That is why I decided with myspell to go with building the hash table on the fly, so to speak. There are no binary compatibility issues that way.
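As a rough illustration of the endian issue, the sketch below stores a 32-bit value in an explicitly fixed byte order instead of dumping it from memory as-is. The helper names put_u32_le and get_u32_le are hypothetical, and neither myspell nor ispell is claimed to work this way; it only shows one common way to make a binary file readable on any architecture.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: append v to out in little-endian byte order,
// regardless of the byte order of the machine running this code.
void put_u32_le(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFFu));
    }
}

// Hypothetical helper: read back a value written by put_u32_le.
std::uint32_t get_u32_le(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

The trade-off is that every value has to be decoded when it is read, so a table stored this way cannot simply be mmapped and used in place on every architecture.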

Another source of delay when starting up the spell-checker is when the user has checked the "check word in all languages" option but doesn't realize that they have a large number of dictionaries that have to be loaded when the first misspelt word is checked.

Obviously, for creating hash tables from large .dic files, available memory is an issue. How much memory do you have available for your machine?

Kevin


On May 1, 2007, at 1:08 PM, Alan Yaniger wrote:

Eleonora,

Yes, I used a different dictionary than yours. The hu_HU.dic I used has 96,461 lines. Apparently the Hungarian dictionary available through DicOO isn't the latest.

Perhaps your hardware is faster than mine. On my slower(?) hardware, I see a significant difference between building the hash table for large dictionaries and for smaller ones. Many users have complained about OOo "getting stuck" while the dictionaries load. So I think that it would be useful if Hunspell developers could improve performance here.

Alan

ge wrote:

Alan,

The size of the 2-nd Hungarian dictionary is:

  lines    words    characters
  22068   124931   622546 hu_HU.aff
 873355   873348 26481165 hu_HU.dic
 895423   998279 27103711 total

dic contains 873378 words, it is 8 times larger than Hebrew.
aff is roughly twice as big as Hebrew.

I assume, you used the 1-st Hungarian one, with the small word count for your test.

I use the 2-nd all the time, and it loads in less than 1 second for me.
Therefore I do not understand the effect you describe.

-eleonora



Hi Marcin, Janis, Eleonora,

I did some debugging in the hunspell code, and found that the size of
the Hebrew dictionaries was the cause of the delay, similar to Janis's
problem in Latvian. The files are read line by line, and he_IL.dic has
329,326 entries, which is far more than the other dictionaries I tried.
The main bottleneck was not in reading the files from the disk, but in
building the hash tables in hashmgr.cxx in add_word(). When I shortened
he_IL.dic to the size of the Hungarian dictionary, it took the same
amount of time to load Hebrew and Hungarian. Same with Hebrew and
English US.

To Hunspell developers out there: is there any way to make the building
of the hash tables more efficient?

Alan



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: dev- [EMAIL PROTECTED]