[chromium-dev] Re: Spellchecker and memory-mapped dicts

Scott Hess Thu, 22 Oct 2009 15:07:14 -0700

WILLNEED says "Hey, OS, I think I'm going to look at these pages soon,
get yourself ready", but the OS could implement it as a nop, and can
do it async.  If memory is under pressure the system can do less; if
memory is clear it can do more.  Actually reading the data, by contrast,
blocks until the data really is in memory.


-scott
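A minimal C sketch of the mmap() + madvise(MADV_WILLNEED) hint discussed in this thread, assuming Linux with glibc; `map_with_willneed()` is a hypothetical helper, not Chromium code. It shows why WILLNEED is only advisory: the call asks the kernel to start paging the mapping in, but the kernel is free to do less and need not wait for the I/O to finish.

```c
#define _DEFAULT_SOURCE /* glibc: expose madvise() and MADV_WILLNEED */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and hint that it will be
 * needed soon. Returns the mapping (size in *len), or NULL on error. */
static void *map_with_willneed(const char *path, size_t *len) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return NULL;

  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size == 0) { /* mmap() rejects length 0 */
    close(fd);
    return NULL;
  }

  void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  if (p == MAP_FAILED) return NULL;

  /* Only a hint: the kernel may start readahead asynchronously, do less
   * under memory pressure, or ignore it entirely. A failure here still
   * leaves the mapping usable, so the return value is ignored. */
  (void)madvise(p, (size_t)st.st_size, MADV_WILLNEED);

  *len = (size_t)st.st_size;
  return p;
}
```

Unlike a read() loop, nothing here guarantees the pages are resident when the function returns; the first lookups may still fault and block.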


On Thu, Oct 22, 2009 at 3:01 PM, Steve Vandebogart <vand...@chromium.org> wrote:
> Probably a bit off topic at this point, but your response confuses me
> - MADV_WILLNEED and POSIX_FADV_WILLNEED will bring the pages into RAM, just
> like faulting in mmap()'ed pages by hand, or read()ing it into memory.  In
> my experience, read() and fadvise() are faster than mmap()+faulting
> everything in, or madvise().  Of course, read()ing it in means the data
> lives in anonymous memory, so under pressure it has to be swapped out and
> can't just be dropped.
> If you want to suck the entire file in at some point, probably the best way
> is to fadvise() it in, then mmap() it and use it from there.
> --
> Steve
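Steve's suggested order (fadvise() the file in first, then mmap() it and use the mapping) might look like the following minimal C sketch, assuming Linux with glibc; `fadvise_then_map()` is a hypothetical name. Note that posix_fadvise() returns an error number directly instead of setting `errno`.

```c
#define _POSIX_C_SOURCE 200809L /* for posix_fadvise() */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to start pulling the whole file
 * into the page cache, then map it so later page faults can be served
 * from that cache. Returns the mapping (size in *len), or NULL. */
static const char *fadvise_then_map(const char *path, size_t *len) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return NULL;

  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size == 0) {
    close(fd);
    return NULL;
  }

  /* offset 0, len 0 means "from offset 0 to the end of the file".
   * It is only a hint, so a nonzero error number is ignored. */
  (void)posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);

  void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd); /* the mapping keeps its own reference to the file */
  if (p == MAP_FAILED) return NULL;

  *len = (size_t)st.st_size;
  return p;
}
```

As Steve recalls elsewhere in the thread, the kernel may read only part of the file for a single fadvise request and leave the rest to ordinary readahead, so this is best treated as a warm-up rather than a guarantee.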
> On Thu, Oct 22, 2009 at 2:52 PM, Scott Hess <sh...@chromium.org> wrote:
>>
>> Faulting it in by hand is only helpful if we're right!  If we're
>> wrong, it could evict other stuff from memory to support a feature
>> which a user may not use until the memory is faulted back out anyhow.
>>
>> [From the rest of the thread, though, it sounds like maybe we should
>> just fix hunspell to be more efficient for our usage.]
>>
>> -scott
>>
>>
>> On Thu, Oct 22, 2009 at 2:42 PM, Steve Vandebogart <vand...@chromium.org> wrote:
>> > It's been a while since I looked at this, but the email I was able to dig
>> > up suggests that madvise is no faster than faulting in the mmap()ed region
>> > by hand.  However, using posix_fadvise should give the same speeds as
>> > read()ing it into memory.  IIRC, though, posix_fadvise will only read so
>> > much in a single request and will let readahead handle the rest after that.
>> > --
>> > Steve
>> >
>> > On Thu, Oct 22, 2009 at 2:27 PM, Scott Hess <sh...@chromium.org> wrote:
>> >>
>> >> On Linux, what about mmap() and then madvise() with MADV_WILLNEED?  [Or
>> >> posix_fadvise() with POSIX_FADV_WILLNEED on the file descriptor.]
>> >>
>> >> -scott
>> >>
>> >>
>> >> On Thu, Oct 22, 2009 at 2:06 PM, Steve Vandebogart <vand...@chromium.org> wrote:
>> >> > If you plan to read the entire file, mmap()ing it and then faulting it
>> >> > in will be slower than read()ing it, at least in some Linux versions.
>> >> > I never pinned down exactly why, but I think the kernel read-ahead
>> >> > mechanism works slightly differently.
>> >> > --
>> >> > Steve
>> >> >
>> >> > On Thu, Oct 22, 2009 at 2:02 PM, Chris Evans <cev...@chromium.org> wrote:
>> >> >>
>> >> >> There's also option 3)
>> >> >> Pre-fault the mmap()ed region on the file thread upon dictionary
>> >> >> initialization.
>> >> >> On Linux at least, that may give you better behaviour than malloc() +
>> >> >> read() in the event of memory pressure.
>> >> >> Cheers
>> >> >> Chris
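Chris's option 3 could be sketched in C as below, assuming the region was already obtained from mmap() and that the function runs on the file thread during dictionary initialization; `prefault_region()` is a hypothetical helper. Touching one byte per page forces each non-resident page to be faulted in now rather than later, on the IO thread.

```c
#define _POSIX_C_SOURCE 200809L /* for sysconf() */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read one byte from every page of [addr, addr+len)
 * so that each page is faulted in. The volatile pointer makes the loads
 * observable, so the compiler cannot optimize them away. Returns the
 * number of pages touched. */
static size_t prefault_region(const void *addr, size_t len) {
  long page = sysconf(_SC_PAGESIZE);
  if (page <= 0) page = 4096; /* assumption: fall back to 4 KiB pages */

  const volatile unsigned char *p = addr;
  size_t pages = 0;
  for (size_t off = 0; off < len; off += (size_t)page) {
    (void)p[off]; /* volatile load; faults the page in if not resident */
    pages++;
  }
  return pages;
}
```

As Scott points out, this only pays off if the dictionary is actually used; otherwise the pre-faulted pages may evict other data for nothing.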
>> >> >>
>> >> >> On Thu, Oct 22, 2009 at 1:39 PM, Evan Stade <est...@chromium.org> wrote:
>> >> >>>
>> >> >>> Hi all,
>> >> >>>
>> >> >>> At its last meeting the jank task force discussed improving
>> >> >>> responsiveness of the spellchecker, but we didn't come to a solid
>> >> >>> conclusion, so I thought I'd bring it up here to see if anyone else
>> >> >>> has opinions. The main concern is that we not block the IO thread on
>> >> >>> file access. To this end, I recently moved initialization of the
>> >> >>> spellchecker from the IO thread to the file thread. However, instead
>> >> >>> of reading the spellchecker dictionary in as one solid chunk, we
>> >> >>> memory-map it. Later we check individual words on the IO thread,
>> >> >>> which will be slow since the dictionary starts off effectively
>> >> >>> completely paged out. The proposal is that we read in the dictionary
>> >> >>> at spellchecker initialization instead of memory-mapping it.
>> >> >>>
>> >> >>> Memory mapping pros:
>> >> >>> - possibly uses less overall memory, depending on the structure of the
>> >> >>> dictionary and the user's usage pattern.
>> >> >>> - <strike>loading the dictionary doesn't block for a long
>> >> >>> time</strike> (no longer an advantage: since my recent refactoring,
>> >> >>> loading doesn't block the IO thread either way)
>> >> >>>
>> >> >>> Reading it all at once pros:
>> >> >>> - costly disk accesses are kept to the file thread (excepting future
>> >> >>> memory paging)
>> >> >>> - overall disk access time is probably lower (since we can read in the
>> >> >>> dict in one chunk)
>> >> >>>
>> >> >>> For reference, the English dictionary is about 500K, and most
>> >> >>> dictionaries are under 2 megs. Some (such as Hungarian) are much
>> >> >>> larger, but no dictionary is over 10 megs.
>> >> >>>
>> >> >>> Opinions?
>> >> >>>
>> >> >>> -- Evan Stade
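Evan's proposal, reading the whole dictionary at initialization instead of mapping it, might look like this minimal C sketch, assuming plain POSIX I/O on the file thread; `read_whole_file()` is a hypothetical helper, not the spellchecker's actual loader. The loop is needed because read() may return fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: read all of `path` into a malloc()ed buffer.
 * Blocks until the data is in memory. Returns the buffer (caller must
 * free() it) with its size in *len, or NULL on error or early EOF. */
static unsigned char *read_whole_file(const char *path, size_t *len) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return NULL;

  struct stat st;
  if (fstat(fd, &st) != 0) {
    close(fd);
    return NULL;
  }
  size_t size = (size_t)st.st_size;
  unsigned char *buf = malloc(size ? size : 1); /* avoid malloc(0) */
  if (buf == NULL) {
    close(fd);
    return NULL;
  }

  size_t got = 0;
  while (got < size) {
    ssize_t n = read(fd, buf + got, size - got);
    if (n < 0 && errno == EINTR) continue; /* interrupted by a signal */
    if (n <= 0) { /* I/O error, or the file shrank underneath us */
      free(buf);
      close(fd);
      return NULL;
    }
    got += (size_t)n;
  }

  close(fd);
  *len = size;
  return buf;
}
```

The resulting buffer is anonymous memory, so under memory pressure it must be swapped out rather than simply dropped and re-read from the file, which is the trade-off Steve mentions.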

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---
