[chromium-dev] Re: Spellchecker and memory-mapped dicts

Steve Vandebogart Thu, 22 Oct 2009 15:11:19 -0700

That is the intention of the interface, yes, but all Linux implementations
I've seen actually go and read whatever you say you will need.  Of course,
there are a few exceptions, like actually being out of memory.
--
Steve


On Thu, Oct 22, 2009 at 3:06 PM, Scott Hess <[email protected]> wrote:

> WILLNEED says "Hey, OS, I think I'm going to look at these pages soon,
> get yourself ready", but the OS could implement it as a nop, and can
> do it async.  If memory is under pressure the system can do less; if
> memory is clear it can do more.  Actually reading the data, by contrast,
> blocks and really does bring it into memory.
>
> -scott
>
>
> On Thu, Oct 22, 2009 at 3:01 PM, Steve Vandebogart <[email protected]> wrote:
> > Probably a bit off topic at this point, but your response confuses me
> > - MADV_WILLNEED and POSIX_FADV_WILLNEED will bring the pages into RAM,
> > just like faulting in mmap()'ed pages by hand, or read()ing it into
> > memory.  In my experience, read() and fadvise() are faster than
> > mmap()+faulting everything in, or madvise().  Of course, read()ing it
> > in means it has to be swapped out and can't just be dropped.
> > If you want to suck the entire file in at some point, probably the best
> > way is to fadvise() it in, then mmap() it and use it from there.
> > --
> > Steve
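Steve's recipe (fadvise() the file in, then mmap() it) can be sketched in Python, whose os.posix_fadvise and mmap module wrap the same system calls. This is a minimal sketch assuming Python 3 on Linux, where os.posix_fadvise is available; the function name map_after_fadvise is hypothetical and not taken from Chromium or hunspell.

```python
import mmap
import os


def map_after_fadvise(path):
    """Hint the kernel to start reading the whole file, then mmap() it read-only."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # Offset 0 with length 0 means "from the start to the end of the file".
        # POSIX_FADV_WILLNEED is only a hint: the kernel may read the data
        # asynchronously, partially, or not at all.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        # mmap raises ValueError for an empty file. The mapping stays valid
        # after the descriptor is closed in the finally block.
        return mmap.mmap(fd, size, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)
```

Lookups then simply index the returned mapping; any pages the hint already pulled into the page cache no longer need a disk read when first touched.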
> > On Thu, Oct 22, 2009 at 2:52 PM, Scott Hess <[email protected]> wrote:
> >>
> >> Faulting it in by hand is only helpful if we're right!  If we're
> >> wrong, it could evict other stuff from memory to support a feature
> >> which a user may not use until the memory is faulted back out anyhow.
> >>
> >> [From the rest of the thread, though, it sounds like maybe we should
> >> just fix hunspell to be more efficient for our usage.]
> >>
> >> -scott
> >>
> >>
> >> On Thu, Oct 22, 2009 at 2:42 PM, Steve Vandebogart <[email protected]> wrote:
> >> > It's been a while since I looked at this, but the email I was able to
> >> > dig up suggests that madvise is no faster than faulting in the
> >> > mmap()ed region by hand.  However, using posix_fadvise should give the
> >> > same speeds as read()ing it into memory.  IIRC though, posix_fadvise
> >> > will only read so much in a single request and will let readahead
> >> > handle the rest after that.
> >> > --
> >> > Steve
> >> >
> >> > On Thu, Oct 22, 2009 at 2:27 PM, Scott Hess <[email protected]> wrote:
> >> >>
> >> >> On Linux, what about mmap() and then madvise() with MADV_WILLNEED?
> >> >> (Or posix_fadvise() with POSIX_FADV_WILLNEED on the file descriptor.)
> >> >>
> >> >> -scott
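Scott's alternative, mmap() followed by madvise(MADV_WILLNEED) over the mapping, looks roughly like this in Python. A minimal sketch, assuming Python 3.8+ (where mmap objects gained a madvise() method) on a platform that defines MADV_WILLNEED; the function name map_with_madvise is hypothetical.

```python
import mmap
import os


def map_with_madvise(path):
    """mmap() a file read-only, then hint MADV_WILLNEED over the whole mapping."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # Raises ValueError if the file is empty.
        mapping = mmap.mmap(fd, size, access=mmap.ACCESS_READ)
    finally:
        # The mapping does not need the descriptor to stay open.
        os.close(fd)
    # MADV_WILLNEED (and mmap.madvise) only exist on Python 3.8+ and on
    # platforms that support the advice, so skip the hint otherwise.
    if hasattr(mmap, "MADV_WILLNEED"):
        # With no start/length arguments the advice covers the whole mapping.
        mapping.madvise(mmap.MADV_WILLNEED)
    return mapping
```

As Steve notes in his reply, in his measurements this was no faster than faulting the region in by hand, so the hint mainly buys asynchrony rather than throughput.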
> >> >>
> >> >>
> >> >> On Thu, Oct 22, 2009 at 2:06 PM, Steve Vandebogart <[email protected]> wrote:
> >> >> > If you plan to read the entire file, mmap()ing it and then faulting
> >> >> > it in will be slower than read()ing it, at least in some Linux
> >> >> > versions.  I never pinned down exactly why, but I think the kernel
> >> >> > read-ahead mechanism works slightly differently.
> >> >> > --
> >> >> > Steve
> >> >> >
> >> >> > On Thu, Oct 22, 2009 at 2:02 PM, Chris Evans <[email protected]> wrote:
> >> >> >>
> >> >> >> There's also option 3)
> >> >> >> Pre-fault the mmap()ed region in the file thread upon dictionary
> >> >> >> initialization.
> >> >> >> On Linux at least, that may give you better behaviour than
> >> >> >> malloc() + read() in the event of memory pressure.
> >> >> >> Cheers
> >> >> >> Chris
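Chris's option 3, pre-faulting the mapping at initialization, amounts to touching every page once so the disk reads happen up front instead of during later word lookups. A minimal sketch, assuming Python 3 on a POSIX system; map_and_prefault is a hypothetical name, and in Chromium the call would be posted to the file thread rather than run inline as it is here.

```python
import mmap
import os


def map_and_prefault(path):
    """mmap() a file and touch one byte per page so every page gets faulted in."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        mapping = mmap.mmap(fd, size, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)
    pages = 0
    for offset in range(0, size, mmap.PAGESIZE):
        # Reading a byte forces the page to be mapped in, blocking on
        # disk I/O if it is not already in the page cache.
        _ = mapping[offset]
        pages += 1
    return mapping, pages
```

Unlike malloc() + read(), the touched pages stay clean and file-backed, so under memory pressure the kernel can drop them and re-read them later instead of writing them to swap, which is the difference Chris and Steve point to.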
> >> >> >>
> >> >> >> On Thu, Oct 22, 2009 at 1:39 PM, Evan Stade <[email protected]> wrote:
> >> >> >>>
> >> >> >>> Hi all,
> >> >> >>>
> >> >> >>> At its last meeting the jank task force discussed improving the
> >> >> >>> responsiveness of the spellchecker, but we didn't come to a solid
> >> >> >>> conclusion, so I thought I'd bring it up here to see if anyone else
> >> >> >>> has opinions. The main concern is that we don't block the IO thread
> >> >> >>> on file access. To this end, I recently moved initialization of the
> >> >> >>> spellchecker from the IO thread to the file thread. However,
> >> >> >>> instead of reading in the spellchecker dictionary in one solid
> >> >> >>> chunk, we memory-map it. Then later we check individual words on
> >> >> >>> the IO thread, which will be slow since the dictionary starts off
> >> >> >>> effectively completely paged out. The proposal is that we read in
> >> >> >>> the dictionary at spellchecker initialization instead of memory
> >> >> >>> mapping it.
> >> >> >>>
> >> >> >>> Memory mapping pros:
> >> >> >>> - possibly uses less overall memory, depending on the structure of
> >> >> >>> the dictionary and the usage pattern of the user.
> >> >> >>> - loading the dictionary doesn't block for a long time (moot now:
> >> >> >>> this no longer happens either way due to my recent refactoring)
> >> >> >>>
> >> >> >>> Reading it all at once pros:
> >> >> >>> - costly disk accesses are kept to the file thread (excepting
> >> >> >>> future memory paging)
> >> >> >>> - overall disk access time is probably lower (since we can read in
> >> >> >>> the dict in one chunk)
> >> >> >>>
> >> >> >>> For reference, the English dictionary is about 500 KB and most
> >> >> >>> dictionaries are under 2 MB; some (such as Hungarian) are much
> >> >> >>> larger, but no dictionary is over 10 MB.
> >> >> >>>
> >> >> >>> Opinions?
> >> >> >>>
> >> >> >>> -- Evan Stade
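Evan's proposal, reading the whole dictionary into memory in one chunk on the file thread so that later lookups on the IO thread never touch the disk, can be sketched as follows. This assumes Python 3; the single-worker executor standing in for Chromium's file thread, and the names read_whole_dictionary and load_dictionary_async, are hypothetical and not Chromium APIs.

```python
import concurrent.futures


def read_whole_dictionary(path):
    """Read the entire dictionary file into a private in-memory copy."""
    with open(path, "rb") as f:
        # read() with no size argument reads sequentially until EOF.
        return f.read()


# Stand-in for Chromium's file thread: one worker that owns blocking I/O.
# This executor is an assumption of the sketch, not a Chromium API.
file_thread = concurrent.futures.ThreadPoolExecutor(max_workers=1)


def load_dictionary_async(path):
    """Schedule the read on the 'file thread' and return a Future for the bytes."""
    return file_thread.submit(read_whole_dictionary, path)
```

The cost is a private copy of the full file (about 500 KB for English, up to 10 MB per the figures above), and, as Steve points out, under memory pressure that copy has to be swapped out rather than simply dropped.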

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: [email protected] 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---
