On Sun, Feb 26, 2012 at 3:00 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Sun, Feb 26, 2012 at 7:58 PM, Warren Weckesser
> <warren.weckes...@enthought.com> wrote:
> > Right, I got that. Sorry if the placement of the notes about how to clear
> > the cache seemed to imply otherwise.
>
> OK, cool, np.
>
> >> Clearing the disk cache is very important for getting meaningful,
> >> repeatable benchmarks in code where you know that the cache will
> >> usually be cold and where hitting the disk will have unpredictable
> >> effects (i.e., pretty much anything doing random access, like
> >> databases, which have complicated locality patterns, you may or may
> >> not trigger readahead, etc.). But here we're talking about pure
> >> sequential reads, where the disk just goes however fast it goes, and
> >> your code can either keep up or not.
> >>
> >> One minor point where the OS interface could matter: it's good to set
> >> up your code so it can use mmap() instead of read(), since this can
> >> reduce overhead. read() has to copy the data from the disk into OS
> >> memory, and then from OS memory into your process's memory; mmap()
> >> skips the second step.
> >
> > Thanks for the tip. Do you happen to have any sample code that demonstrates
> > this? I'd like to explore this more.
>
> No, I've never actually run into a situation where I needed it myself,
> but I learned the trick from Tridge so I tend to believe it :-).
> mmap() is actually a pretty simple interface -- the only thing I'd
> watch out for is that you want to mmap() the file in pieces (so as to
> avoid VM exhaustion on 32-bit systems), but you want to use pretty big
> pieces (because each call to mmap()/munmap() has overhead). So you
> might want to use chunks in the 32-128 MiB range. Or since I guess
> you're probably developing on a 64-bit system you can just be lazy and
> mmap the whole file for initial testing. git uses mmap, but I'm not
> sure it's very useful example code.
>
> Also it's not going to do magic. Your code has to be fairly quick
> before avoiding a single memcpy() will be noticeable.
>
> HTH,
>

Yes, thanks! I'm working on a mmap version now. I'm very curious to see
just how much of an improvement it can give.

Warren
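
For readers following along, the chunked-mmap approach described in the quoted message might look roughly like the sketch below. It is only an illustration in Python, not code from either poster: the names iter_mmap_chunks and count_newlines are hypothetical, the 64 MiB chunk size is just one value from the 32-128 MiB range mentioned above, and counting newlines stands in for whatever parsing the real reader would do. The one hard constraint is that the offset passed to mmap must be a multiple of mmap.ALLOCATIONGRANULARITY, so the chunk size is checked against it.

```python
import mmap
import os

# Assumed chunk size: 64 MiB, picked from the 32-128 MiB range suggested above.
CHUNK_SIZE = 64 * 1024 * 1024


def iter_mmap_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield (offset, mapping) pairs that cover the file sequentially.

    Each mapping is only valid until the generator is advanced, because
    it is unmapped before the next chunk is mapped.
    """
    # mmap offsets must be multiples of the platform allocation granularity.
    if chunk_size <= 0 or chunk_size % mmap.ALLOCATIONGRANULARITY != 0:
        raise ValueError(
            "chunk_size must be a positive multiple of mmap.ALLOCATIONGRANULARITY"
        )
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        # An empty file is never mapped (mapping zero bytes is an error).
        while offset < size:
            length = min(chunk_size, size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as m:
                yield offset, m
            offset += length


def count_newlines(path, chunk_size=CHUNK_SIZE):
    """Stand-in workload: count newline bytes without copying the chunks."""
    total = 0
    for _, chunk in iter_mmap_chunks(path, chunk_size):
        pos = chunk.find(b"\n")
        while pos != -1:
            total += 1
            pos = chunk.find(b"\n", pos + 1)
    return total
```

On a 64-bit system, passing a length of 0 to mmap.mmap maps the whole file in one call, which matches the "be lazy for initial testing" suggestion. Whether either variant beats plain read() depends on how fast the processing loop is, as noted above.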