On Wed, 21 Jun 2017 23:44:26 +0200 [email protected] (Ludovic Courtès) wrote: > Hello, > > Christopher Baines <[email protected]> skribis: > > > +(define (create-directory-tree files) > > + (define (directory-lists->tree directory-lists) > > + (map (lambda (top-level-dir) > > + (cons top-level-dir > > + (directory-lists->tree > > + (filter-map > > + (lambda (directory-list) > > + (if (eq? (length directory-list) 1) > > + #f > > + (cdr directory-list))) > > + ;; Find all the directory lists under this > > top-level-dir > > + (filter > > + (lambda (directory-list) > > + (equal? (car directory-list) > > + top-level-dir)) > > + directory-lists))))) > > + (delete-duplicates > > + (map car directory-lists)))) > > + > > + (directory-lists->tree > > + (filter-map (lambda (path) > > + (let ((split-path (string-split path #\/))) > > + ;; If this is a file in the top of the > > repository? > > + (if (eq? (length split-path) 1) > > + #f > > + ;; drop-right to remove the filename, as > > it's > > + ;; just the directory tree that's important > > + (drop-right (string-split path #\/) 1)))) > > + files))) > > + > > +(define (directory-in-tree? directory whole-tree) > > + (define directory-list-in-tree? > > + (match-lambda* > > + (((top-directory . rest) tree) > > + (if (null? rest) > > + (list? (member top-directory (map car tree))) > > + (and=> (find (match-lambda > > + ((subtree-top-directory . subtree) > > + (equal? subtree-top-directory > > + top-directory))) > > + tree) > > + (match-lambda > > + ((subtree-top-directory . subtree) > > + (directory-list-in-tree? rest subtree)))))))) > > + > > + (directory-list-in-tree? (string-split directory #\/) > > + whole-tree)) > > I tried it to fully understand what was going on. While playing with > it I came up with a variant that is slightly more concise and clearer > (at least to me ;-)): > > --8<---------------cut here---------------start------------->8--- > (define (files->directory-tree files) > "Return a tree of vhashes representing the directory listed in > FILES, a list like '(\"a/b\" \"b/c/d\")." > (fold (lambda (file result) > (let loop ((file (string-split file #\/)) > (result result)) > (match file > ((_) > result) > ((directory children ...) > (match (vhash-assoc directory result) > (#f > (vhash-cons directory (loop children vlist-null) > result)) > ((_ . previous) > ;; XXX: 'vhash-delete' is O(n). > (vhash-cons directory (loop children previous) > (vhash-delete directory result))))) > (() > result)))) > vlist-null > files)) > > (define (directory-in-tree? tree directory) > "Return true if DIRECTORY, a string like \"a/b\", denotes a > directory listed in TREE." > (let loop ((directory (string-split directory #\/)) > (tree tree)) > (match directory > (() > #t) > ((head . tail) > (match (vhash-assoc head tree) > ((_ . sub-tree) (loop tail sub-tree)) > (#f #f)))))) > --8<---------------cut here---------------end--------------->8---
Thanks for looking at this Ludo, and apologies for taking so long to
reply. These functions are definately more concise.
> The tree is a tree of vhash, which should make lookup (i.e.,
> ‘directory-in-tree?’) slightly faster. So it works like this:
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (files->directory-tree '("a" "a/b" "a/b/x" "a/c"
> "b" "b/c")) $17 = #<vhash 2802a40 2 pairs>
> scheme@(guile-user)> (directory-in-tree? $17 "a/b")
> $18 = #t
> scheme@(guile-user)> (directory-in-tree? $17 "a")
> $19 = #t
> scheme@(guile-user)> (directory-in-tree? $17 "a/b/x")
> $21 = #f
> scheme@(guile-user)> (directory-in-tree? $17 "a/c")
> $22 = #f
> scheme@(guile-user)> (directory-in-tree? $17 "b")
> $23 = #t
> --8<---------------cut here---------------end--------------->8---
>
> WDYT? If you like it, take it and adjust as you see fit. We’re doing
> 4-hand coding like whichever fancy methodology recommends. ;-)
I've had a little look at the performance, and I think this approach of
using a vhash might be slower, but not by much. The biggest difference
I saw for the guix repository was 0.008 seconds, and for smart-answers
was 0.6 seconds.
I'm happy to go ahead with either though, obviously these performance
differences don't matter compared to the current performance for large
repositories.
pgpEzFTcfCH_x.pgp
Description: OpenPGP digital signature
