Inspired by a post about a faster directory iteration in Haskell [1], I decided to try doing the same for Racket. The results are here: https://gist.github.com/samth/6437192
The current implementation uses continuations, which are pretty slow. The fastest solution would fuse the traversal and processing code, but Racket's optimizer isn't there yet for any implementation I can come up with. I tried a few different strategies: 1. Build a list of all of the entries, and then just return that. This is quite fast (3x faster than `in-directory), even for 30k+ entries, but it does excessive IO when it's not needed. 2. Use a thread and a channel to communicate. This is about twice as slow (on my SSD) as the list approach, but it doesn't do any excessive IO. 3. Use a thread, a channel, and a vector as a buffer. This can be as fast as or faster than the list approach with a large buffer size (1000 elements), but it's quite competitive even with a buffer size of 5. My channel implementation reads one entry ahead of where the current `in-directory` is. If we think it's important never to do _any_ extra I/O, then I think we should use a modified version of my 2nd implementation. If we're willing to relax that, we should probably go with the 3rd approach with a buffer size of 5 or 10. Note that even my fastest code is 2.5x slower than Unix `find`, even when discounting the Racket startup overhead. Sam [1] https://github.com/JohnLato/posix-paths _________________________ Racket Developers list: http://lists.racket-lang.org/dev