On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Sun, Feb 26, 2012 at 5:23 PM, Warren Weckesser
> <warren.weckes...@enthought.com> wrote:
> > I haven't pushed it to the extreme, but the "big" example (in the
> > examples/ directory) is a 1 GB text file with 2 million rows of 50
> > fields each.  It is read in under 30 seconds (but that's with a
> > solid state drive).
>
> Obviously this was just a quick test, but FYI, a solid-state drive
> shouldn't really make any difference here -- this is a pure sequential
> read, and for sequential reads, SSDs are, if anything, slower than
> traditional spinning-platter drives.

Good point.



> For this kind of benchmarking, you'd really rather be measuring the
> CPU time, or reading byte streams that are already in memory. If you
> can process more MB/s than the drive can provide, then your code is
> effectively perfectly fast. Looking at this number has a few
> advantages:
>  - You get more repeatable measurements (no disk caches or buffers
> messing with you)
>  - If your code can go faster than your drive, then the drive won't
> make your benchmark look bad
>  - There are probably users out there who have faster drives than you
> (e.g., I just measured ~340 MB/s off our lab's main RAID array), so
> it's nice to be able to measure optimizations even after they stop
> mattering on your equipment.
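To make that concrete, here is a rough sketch of timing just the parsing
against an in-memory buffer (the file name, the comma delimiter, and the
use of numpy.loadtxt as the reader are placeholders, not the actual
benchmark code):

import io
import time

import numpy as np

# Read the file into memory once so the drive is out of the picture.
with open("big.txt", "rb") as f:
    data = f.read()

# Time only the parsing.  process_time() counts CPU time, so disk
# waits and other processes don't skew the number.
start = time.process_time()
arr = np.loadtxt(io.BytesIO(data), delimiter=",")
elapsed = time.process_time() - start

mb = len(data) / 1e6
print("parsed %.1f MB in %.2f s of CPU time (%.1f MB/s)"
      % (mb, elapsed, mb / elapsed))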

For anyone benchmarking software like this, be sure to clear the disk
cache before each run.  On Linux:

$ sync
$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"

On Mac OS X:

$ purge

I'm not sure what the equivalent is in Windows.
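
On Linux you can also fold the cache drop into the benchmark script
itself, so each run starts cold; a minimal sketch (again assuming a
hypothetical big.txt and numpy.loadtxt as the reader):

import subprocess
import time

import numpy as np

# Flush dirty pages and drop the page cache (Linux only; needs sudo).
subprocess.run(["sync"], check=True)
subprocess.run(["sudo", "sh", "-c", "echo 3 > /proc/sys/vm/drop_caches"],
               check=True)

# Wall-clock time is the right measure here, since a cold-cache run
# is dominated by actual disk I/O.
start = time.perf_counter()
arr = np.loadtxt("big.txt", delimiter=",")
print("cold-cache read: %.2f s" % (time.perf_counter() - start))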

Warren