Hi, Eric Kow <[EMAIL PROTECTED]> writes: > On Thu, Nov 06, 2008 at 12:58:22 +0100, Petr Rockai wrote: >> The way I have got the recursion versus syncing slurpies backwards is >> actually pretty embarassing. > > I didn't understand this remark btw... Oh, that means:
>> - let j = if ((i::Int) + 1 < 100) then i + 1 else 0 >> - (ps', s'', restok) <- aaf j s' ps >> - s''' <- if j == 0 then update_slurpy r c opts s'' >> - else return s'' this code in here would first apply all patches, since the aaf recurses at line 3. Then, when we start unwinding from the recursion, for every 100 *returns*, we call update_slurpy (line 4) once, writing out the same final slurpy over and over again. So this was not saving any memory, and did ridiculous amounts of extra work. >> -applyAndFix :: [...snip...] -> IO (FL (PatchInfoAnd p), Slurpy) >> +applyAndFix :: [...snip...] -> IO ((FL (PatchInfoAnd p)), Slurpy, Bool) > > Those extra parentheses are superflous, right? Nit :-) Indeed. >> hunk ./src/Darcs/Repository/Repair.hs 80 >> -data RepositoryConsistency = RepositoryConsistent | RepositoryInconsistent >> Slurpy >> -data CanRepair = CanRepair | CannotRepair deriving Eq >> +data RepositoryConsistency p = >> + RepositoryConsistent >> + | BrokenPristine Slurpy >> + | BrokenPatches Slurpy (PatchSet p) > > I might have named this BrokenPatchesAndOrPristine, but I guess that > would have been silly BrokenPatchesAndMaybePristine would be more accurate... I was wondering how much sense it makes to distinguish the case where there are broken patches but pristine is fine, but it's only UI matter and I guess not worth it. >> -replayRepository :: (RepoPatch p) => CanRepair -> Repository p -> >> [DarcsFlag] -> IO RepositoryConsistency >> -replayRepository canrepair repo opts = do >> +replayRepository :: (RepoPatch p) => Repository p -> [DarcsFlag] -> IO >> (RepositoryConsistency p) >> +replayRepository repo opts = do > > So as I understand it, the idea is that you move the writing of the > revised patches out into the repair command (and just pass up the > new state), which seems like a much cleaner thing to do Indeed. >> + -- TODO is the latter condition needed? Does a broken patch imply pristine >> + -- difference? Why, or why not? >> + return (if is_same && patches_ok >> + then RepositoryConsistent >> + else if patches_ok >> + then BrokenPristine s' >> + else BrokenPatches s' newpatches) > > Maybe it's worthwhile to write this as > > if patches_ok > then if is_same > then RepositoryConsistent > else BrokenPristine s' > else BrokenPatches s' newpatches > > which is simpler (but maybe not as nice logically) Dunno. I find the original version easier on my brain. Or no. Don't know. It's just running a little too late. > Introduce syncSlurpy, that syncs slurpy to disk as needed. > ---------------------------------------------------------- > I think we should try to haddock at least the functions we export, and > that your comment below would serve just fine: > >> syncSlurpy takes an IO action that does the actual syncing, but only calls >> it >> when neccessary to prevent memory blowups. It is fairly cheap to call often >> (ie. after every applied patch). Good point. I'll amend if possible, or record a new patch the next time I get around to darcs (hopefully early next week). >> +unsyncedSlurpySize :: Slurpy -> Int >> +unsyncedSlurpySize (Slurpy _ (SlurpFile (_,_,size) ps)) >> + | size == undefined_size = B.length ps >> + | otherwise = 0 >> +unsyncedSlurpySize (Slurpy _ (SlurpDir _ ss)) = >> + sum $ map unsyncedSlurpySize (map_to_slurpies ss) > > Ok, so from here I get the understanding that we know if a slurpy is synced if > we know its size. So this function adds up all the bits of slurpy for which > we > do not yet know the size. In a way, yes. I have derived this bit of knowledge from how the WriteableDirectory instance for Slurpy works, finding that it sets size to undefined (which should never happen with HashedIO otherwise, these days). So we add up all lengths of such SlurpFiles to get a good estimate on how much memory we are using up with intermediaries. >> +slurp_sync_size :: Int >> +slurp_sync_size = 100 * 1000000 >> + >> +syncSlurpy :: (Slurpy -> IO Slurpy) -> Slurpy -> IO Slurpy >> +syncSlurpy put s = if (unsyncedSlurpySize s > slurp_sync_size) >> + then do >> + s' <- put s >> + return s' >> + else do >> + return s > > Silly Haskell question: should this be simplified to the below? > > syncSlurpy put s = if unsyncedSlurpySize s > slurp_sync_size > then put s > else return s Oh, indeed. That was probably just sloppy thinking on my part. > Fix Repair.applyAndFix. > ----------------------- >> applyAndFix c opts s_ r psin = >> do beginTedious k >> tediousSize k $ lengthFL psin >> - ps <- aaf 0 s_ psin >> + ps <- aaf s_ psin >> endTedious k >> return ps >> where k = "Repairing patch" -- FIXME >> hunk ./src/Darcs/Repository/Repair.hs 65 >> + aaf s NilFL = return (NilFL, s, True) >> + aaf s (p:>:ps) = do > >> - let j = if ((i::Int) + 1 < 100) then i + 1 else 0 >> - (ps', s'', restok) <- aaf j s' ps >> - s''' <- if j == 0 then update_slurpy r c opts s'' >> - else return s'' > >> + s'' <- syncSlurpy (update_slurpy r c opts) s' >> + (ps', s''', restok) <- aaf s'' ps > > Hmm, so rather than syncing every at 100 MB (or whatever syncSlurpy > decides) instead of every 100 patches? Is that the idea? Right, and this also flips the sync versus recurse mismatch, so we now recurse *after* having possibly dumped our intermediate state onto disk. Just a note though, I have done some benchmarking, and even with this bundle applied, darcs repair and check take over half gigabyte of memory on GHC repo, which is still a regression compared to non-hashed repair (which should behave roughly like darcs 2.0.2's check, memory-wise). It's also quite a mystery to me, since I have checked that syncSlurpy does the right thing. It might be just GHC's GC being lazy since the machine has plenty of RAM, but I understand zilch about that. I haven't yet succeeded at producing a working profiled executable, for some reason. After that happens, I'll do some profiling homework and hopefully fix this memory blowup. (However note, that compared to 2.1, this patch should save *some* memory.) Yours, Petr. -- Peter Rockai | me()mornfall!net | prockai()redhat!com http://blog.mornfall.net | http://web.mornfall.net "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton on the subject of C program indentation _______________________________________________ darcs-users mailing list [email protected] http://lists.osuosl.org/mailman/listinfo/darcs-users
