On Sat, Sep 12, 2009 at 17:33:34 +0200, Petr Rockai wrote:
> I guess the clean_hashdir patch could be cherrypicked into the mainline, but
> that does not fix any outstanding bug. The rest probably waits for the
> hashed-storage review.
As requested...

Improve performance of clean_hashed (by a factor of ~n/logn).
-------------------------------------------------------------

> On the scale of 50k files + 50k garbage, clean_hashes was basically useless
> (before: gave up waiting after 20 minutes, after: 35 seconds). (This happens
> when running darcs optimize --pristine on a 50k-big repository.)

>     do -- we'll remove obsolete bits of "dir"
>        debugMessage $ "Cleaning out " ++ (hashedDir dir_) ++ "..."
>        let hashdir = darcsdir ++ "/" ++ (hashedDir dir_) ++ "/"
> -      hs <- concat `fmap` (mapM (listHashedContents "cleaning up..." c) hashroots)
> -      fs <- filter okayHash `fmap` getDirectoryContents hashdir
> -      mapM_ (removeFileMayNotExist . (hashdir++)) (fs \\ hs)
> +      hs <- set . concat <$> mapM (listHashedContents "cleaning up..." c) hashroots
> +      fs <- set . filter okayHash <$> getDirectoryContents hashdir
> +      mapM_ (removeFileMayNotExist . (hashdir++)) (unset $ fs `Set.difference` hs)

This looks safe for me to apply, so I'm going to go ahead and do it.

Aside from a stylistic tweak using Control.Applicative.(<$>) in place of
infix fmap, it looks like all we're really doing is using the
Set.difference operation, which I can believe would be a lot faster than
doing \\ with really huge sets (hs and fs are implicitly sets).

> + where set = Set.fromList . map BC.pack
> +       unset = map BC.unpack . Set.toList

Why do the BC.pack and BC.unpack make a difference?

--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
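For reference, the asymptotic gap the patch exploits can be demonstrated in a standalone sketch (this is not darcs code; the names hs/fs and the made-up data merely mirror the patch):

```haskell
-- Standalone sketch: why Set.difference beats (\\) on large inputs.
-- (\\) deletes one occurrence per element by scanning the whole list,
-- so fs \\ hs is O(|fs| * |hs|); building two Sets and differencing
-- them is O(n log n) overall.
import qualified Data.Set as Set
import Data.List ((\\))

main :: IO ()
main = do
  let hs = map show [1 :: Int .. 5000]    -- stand-in for live hashes
      fs = map show [1 :: Int .. 10000]   -- stand-in for files on disk
      slow = fs \\ hs                     -- quadratic
      fast = Set.toList
               (Set.fromList fs `Set.difference` Set.fromList hs)
  -- Both compute the same garbage set (ordering aside):
  print (Set.fromList slow == Set.fromList fast)  -- prints True
```

At the patch's scale (50k live hashes plus 50k garbage files) the list version does on the order of 2.5 billion comparisons, which is consistent with the "gave up after 20 minutes" versus "35 seconds" numbers quoted above.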
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
