[racket-dev] Speeding up `in-directory`

Sam Tobin-Hochstadt Wed, 04 Sep 2013 10:52:43 -0700

Inspired by a post about a faster directory iteration in Haskell [1],
I decided to try doing the same for Racket.  The results are here:
https://gist.github.com/samth/6437192


The current implementation uses continuations, which are pretty slow.
The fastest solution would fuse the traversal and processing code, but
Racket's optimizer isn't there yet for any implementation I can come
up with.  I tried a few different strategies:

1. Build a list of all of the entries, and then just return that.
This is quite fast (3x faster than `in-directory), even for 30k+
entries, but it does excessive IO when it's not needed.
2. Use a thread and a channel to communicate.  This is about twice as
slow (on my SSD) as the list approach, but it doesn't do any excessive
IO.
3. Use a thread, a channel, and a vector as a buffer.  This can be as
fast as or faster than the list approach with a large buffer size
(1000 elements), but it's quite competitive even with a buffer size of
5.

My channel implementation reads one entry ahead of where the current
`in-directory` is.

If we think it's important never to do _any_ extra I/O, then I think
we should use a modified version of my 2nd implementation. If we're
willing to relax that, we should probably go with the 3rd approach
with a buffer size of 5 or 10.

Note that even my fastest code is 2.5x slower than Unix `find`, even
when discounting the Racket startup overhead.

Sam

[1] https://github.com/JohnLato/posix-paths
_________________________
  Racket Developers list:
  http://lists.racket-lang.org/dev

[racket-dev] Speeding up `in-directory`

Reply via email to