[chromium-dev] Re: Spellchecker and memory-mapped dicts

Chris Evans Thu, 22 Oct 2009 14:30:11 -0700

On Thu, Oct 22, 2009 at 2:22 PM, Brett Wilson <[email protected]> wrote:


>
> On Thu, Oct 22, 2009 at 1:39 PM, Evan Stade <[email protected]> wrote:
> >
> > Hi all,
> >
> > At its last meeting the jank task force discussed improving
> > responsiveness of the spellchecker but we didn't come to a solid
> > conclusion so I thought I'd bring it up here to see if anyone else has
> > opinions. The main concern is that we don't block the IO thread on
> > file access. To this end, I recently moved initialization of the
> > spellchecker from the IO thread to the file thread. However, instead
> > of reading in the spellchecker dictionary in one solid chunk, we
> > memory-map it. Then later we check individual words on the IO thread,
> > which will be slow since the dictionary starts off effectively
> > completely paged out. The proposal is that we read in the dictionary
> > at spellchecker initialization instead of memory mapping it.
> >
> > Memory mapping pros:
> > - possibly uses less overall memory, depending on the structure of the
> > dictionary and the usage pattern of the user.
> > - <strike>loading the dictionary doesn't block for a long
> > time</strike> this one no longer occurs either way due to my recent
> > refactoring
> >
> > Reading it all at once pros:
> > - costly disk accesses are kept to the file thread (excepting future
> > memory paging)
> > - overall disk access time is probably lower (since we can read in the
> > dict in one chunk)
> >
> > For reference, the English dictionary is about 500K, and most
> > dictionaries are under 2 megs; some (such as Hungarian) are much
> > larger, but no dictionary is over 10 megs.
> >
> > Opinions?
>
> I've thought about this some (I wrote the memory map thing there now).
>
> History of the spellchecker:
> v1 : Per-process Hunspell storage (lots of memory duplicated in each
> renderer, expensive to load).
> v2 : Browser-process Hunspell storage (lots of memory, expensive to
> load, only occurs once)
> v3 : Browser-process memmap (less memory, cheap to load, only occurs once).
>
> I would like to consider moving hunspell back to the renderer so we
> can avoid sync IPCs and blocking the I/O thread on spellchecking.
>

That would also be a stability win. Currently, any hunspell crashes due to
bust dictionaries take down the entire browser.

Cheers
Chris

> Spellchecking isn't fast (especially suggestions) even when everything
> is in memory, so it always sucks to have it block the I/O thread. Now
> that it can be memmapped, each renderer can memmap its own image of
> the data.
>
> This doesn't help on Mac where we want to use the system spellchecker.
> There would also be some amount of duplication since there are certain
> tables that are initialized once at the beginning (I don't think it's
> that big, though).
>
> I would suggest first making the current histograms in the
> spellchecker.cc file UMA (currently they're debug-only local ones) so
> we can see how much blocking we're getting from Hunspell in the field.
>
> Brett
>
> >
>
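
For readers who want to see the two loading strategies from the thread side by side, here is a minimal sketch in Python rather than Chromium's C++. It is not the Chromium implementation: the newline-delimited word list, the file name suffix, and all function names (load_eager, load_mapped, contains_word, make_demo_dictionary) are assumptions made for illustration, and real Hunspell dictionaries use a different format. The point is only where the disk I/O happens: load_eager pays for it up front, while load_mapped defers it to the first access of each page.

```python
import mmap
import os
import tempfile


def load_eager(path):
    """Read the whole file in one chunk; all disk I/O happens here."""
    with open(path, "rb") as f:
        return f.read()


def load_mapped(path):
    """Map the file; pages are faulted in lazily on first access."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def contains_word(data, word):
    """Look up a word in a newline-delimited list (hypothetical format)."""
    needle = b"\n" + word.encode("utf-8") + b"\n"
    # Both bytes and mmap objects provide find(), so one lookup serves both.
    return data.find(needle) != -1


def make_demo_dictionary():
    """Write a tiny stand-in dictionary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".dic")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\napple\nbanana\ncherry\n")
    return path


if __name__ == "__main__":
    path = make_demo_dictionary()
    eager = load_eager(path)    # would run on the file thread
    mapped = load_mapped(path)  # cheap now, may page-fault during lookups
    print(contains_word(eager, "banana"), contains_word(mapped, "banana"))
    mapped.close()
    os.remove(path)
```

With the eager buffer, a lookup on another thread touches only memory that has already been read (barring later paging by the OS). With the mapping, the first lookups can block on disk wherever they run, which is the concern raised about the IO thread.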
