At Wed, 4 Sep 2013 12:05:47 -0400, Sam Tobin-Hochstadt wrote: > That's very nice -- and faster, to boot. One thing I'm confused about > -- why is the `parameterize` around the second call to > `directory-list` needed? Does it query `current-directory` > internally?
Yes: `(car l)` will be a path that's mean to be relative to the initial directory, and the result paths should be similarly relative. I now see that the "always a complete path" comment below is wrong, and the code generally doesn't work right if `current-directory` changes between the time that `in-directory` is called and the sequence is started. Here's a corrected version (still not tested enough), followed by a version that I like more (also not tested enough). (define (in-directory5 [orig-dir #f]) (define init-dir (current-directory)) ;; current state of the sequence is a list of paths to produce; when ;; incrementing past a directory, add the directory's immediate ;; content to the front of the list: (define (next l) (if (directory-exists? (car l)) (append (parameterize ([current-directory init-dir]) (directory-list (car l) #:build? #t)) (cdr l)) (cdr l))) (make-do-sequence (lambda () (values car next (parameterize ([current-directory init-dir]) (if orig-dir (directory-list orig-dir #:build? #t) (directory-list))) pair? #f #f)))) (define (in-directory6 [orig-dir #f]) (define init-dir (current-directory)) ;; current state of the sequence is a list of paths to produce; when ;; incrementing past a directory, add the directory's immediate ;; content to the front of the list: (define (next l) (if (directory-exists? (car l)) (append (let ([d (car l)]) (for/list ([f (in-list (directory-list (path->complete-path d init-dir)))]) (build-path d f))) (cdr l)) (cdr l))) (make-do-sequence (lambda () (values car next (if orig-dir (next (list orig-dir)) (directory-list init-dir)) pair? #f #f)))) > Oh, and a `define-sequence-syntax` version is even a little faster. > I'll push all of this soon. > > Sam > > On Wed, Sep 4, 2013 at 11:03 AM, Matthew Flatt <mfl...@cs.utah.edu> wrote: > > When continuations are too expensive, consider representing the > > continuation explicitly. In this case, I think a list of paths > > represents the continuation easily enough. > > > > (define (in-directory5 [orig-dir #f]) > > (define init-dir > > (or orig-dir (current-directory))) > > ;; current state of the sequence is a list of paths to produce; when > > ;; incrementing past a directory, add the directory's immediate > > ;; content to the front of the list: > > (define (next l) > > (if (directory-exists? (car l)) > > (append (if orig-dir > > ;; paths are always complete: > > (directory-list (car l) #:build? #t) > > ;; work relative to `orig-dir`: > > (parameterize ([current-directory init-dir]) > > (directory-list (car l) #:build? #t))) > > (cdr l)) > > (cdr l))) > > (make-do-sequence > > (lambda () > > (values > > car > > next > > (if orig-dir > > (directory-list orig-dir #:build? #t) > > (directory-list)) > > pair? > > #f > > #f)))) > > > > At Wed, 4 Sep 2013 10:01:03 -0400, Sam Tobin-Hochstadt wrote: > >> Inspired by a post about a faster directory iteration in Haskell [1], > >> I decided to try doing the same for Racket. The results are here: > >> https://gist.github.com/samth/6437192 > >> > >> The current implementation uses continuations, which are pretty slow. > >> The fastest solution would fuse the traversal and processing code, but > >> Racket's optimizer isn't there yet for any implementation I can come > >> up with. I tried a few different strategies: > >> > >> 1. Build a list of all of the entries, and then just return that. > >> This is quite fast (3x faster than `in-directory), even for 30k+ > >> entries, but it does excessive IO when it's not needed. > >> 2. Use a thread and a channel to communicate. This is about twice as > >> slow (on my SSD) as the list approach, but it doesn't do any excessive > >> IO. > >> 3. Use a thread, a channel, and a vector as a buffer. This can be as > >> fast as or faster than the list approach with a large buffer size > >> (1000 elements), but it's quite competitive even with a buffer size of > >> 5. > >> > >> My channel implementation reads one entry ahead of where the current > >> `in-directory` is. > >> > >> If we think it's important never to do _any_ extra I/O, then I think > >> we should use a modified version of my 2nd implementation. If we're > >> willing to relax that, we should probably go with the 3rd approach > >> with a buffer size of 5 or 10. > >> > >> Note that even my fastest code is 2.5x slower than Unix `find`, even > >> when discounting the Racket startup overhead. > >> > >> Sam > >> > >> [1] https://github.com/JohnLato/posix-paths > >> _________________________ > >> Racket Developers list: > >> http://lists.racket-lang.org/dev _________________________ Racket Developers list: http://lists.racket-lang.org/dev