That's very nice -- and faster, to boot. One thing I'm confused about -- why is the `parameterize` around the second call to `directory-list` needed? Does it query `current-directory` internally?
Oh, and a `define-sequence-syntax` version is even a little faster. I'll push all of this soon. Sam On Wed, Sep 4, 2013 at 11:03 AM, Matthew Flatt <mfl...@cs.utah.edu> wrote: > When continuations are too expensive, consider representing the > continuation explicitly. In this case, I think a list of paths > represents the continuation easily enough. > > (define (in-directory5 [orig-dir #f]) > (define init-dir > (or orig-dir (current-directory))) > ;; current state of the sequence is a list of paths to produce; when > ;; incrementing past a directory, add the directory's immediate > ;; content to the front of the list: > (define (next l) > (if (directory-exists? (car l)) > (append (if orig-dir > ;; paths are always complete: > (directory-list (car l) #:build? #t) > ;; work relative to `orig-dir`: > (parameterize ([current-directory init-dir]) > (directory-list (car l) #:build? #t))) > (cdr l)) > (cdr l))) > (make-do-sequence > (lambda () > (values > car > next > (if orig-dir > (directory-list orig-dir #:build? #t) > (directory-list)) > pair? > #f > #f)))) > > At Wed, 4 Sep 2013 10:01:03 -0400, Sam Tobin-Hochstadt wrote: >> Inspired by a post about a faster directory iteration in Haskell [1], >> I decided to try doing the same for Racket. The results are here: >> https://gist.github.com/samth/6437192 >> >> The current implementation uses continuations, which are pretty slow. >> The fastest solution would fuse the traversal and processing code, but >> Racket's optimizer isn't there yet for any implementation I can come >> up with. I tried a few different strategies: >> >> 1. Build a list of all of the entries, and then just return that. >> This is quite fast (3x faster than `in-directory), even for 30k+ >> entries, but it does excessive IO when it's not needed. >> 2. Use a thread and a channel to communicate. This is about twice as >> slow (on my SSD) as the list approach, but it doesn't do any excessive >> IO. >> 3. Use a thread, a channel, and a vector as a buffer. This can be as >> fast as or faster than the list approach with a large buffer size >> (1000 elements), but it's quite competitive even with a buffer size of >> 5. >> >> My channel implementation reads one entry ahead of where the current >> `in-directory` is. >> >> If we think it's important never to do _any_ extra I/O, then I think >> we should use a modified version of my 2nd implementation. If we're >> willing to relax that, we should probably go with the 3rd approach >> with a buffer size of 5 or 10. >> >> Note that even my fastest code is 2.5x slower than Unix `find`, even >> when discounting the Racket startup overhead. >> >> Sam >> >> [1] https://github.com/JohnLato/posix-paths >> _________________________ >> Racket Developers list: >> http://lists.racket-lang.org/dev _________________________ Racket Developers list: http://lists.racket-lang.org/dev