Re: [darcs-users] darcs patch: Refactor Repository.Repair.replayReposit... (and 2 more)

Petr Rockai Sun, 09 Nov 2008 14:01:51 -0800

Hi,

Eric Kow <[EMAIL PROTECTED]> writes:
> On Thu, Nov 06, 2008 at 12:58:22 +0100, Petr Rockai wrote:
>>   The way I have got the recursion versus syncing slurpies backwards is
>>   actually pretty embarassing.
>
> I didn't understand this remark btw...
Oh, that means:


>> -            let j = if ((i::Int) + 1 < 100) then i + 1 else 0
>> -            (ps', s'', restok) <- aaf j s' ps
>> -            s''' <- if j == 0 then update_slurpy r c opts s''
>> -                      else return s''

this code in here would first apply all patches, since the aaf recurses at line
3. Then, when we start unwinding from the recursion, for every 100 *returns*,
we call update_slurpy (line 4) once, writing out the same final slurpy over and
over again. So this was not saving any memory, and did ridiculous amounts of
extra work.

>> -applyAndFix :: [...snip...] -> IO (FL (PatchInfoAnd p), Slurpy)
>> +applyAndFix :: [...snip...] -> IO ((FL (PatchInfoAnd p)), Slurpy, Bool)
>
> Those extra parentheses are superflous, right?  Nit :-)
Indeed.

>> hunk ./src/Darcs/Repository/Repair.hs 80
>> -data RepositoryConsistency = RepositoryConsistent | RepositoryInconsistent 
>> Slurpy
>> -data CanRepair = CanRepair | CannotRepair deriving Eq
>> +data RepositoryConsistency p =
>> +    RepositoryConsistent
>> +  | BrokenPristine Slurpy
>> +  | BrokenPatches Slurpy (PatchSet p)
>
> I might have named this BrokenPatchesAndOrPristine, but I guess that
> would have been silly
BrokenPatchesAndMaybePristine would be more accurate... I was wondering how
much sense it makes to distinguish the case where there are broken patches but
pristine is fine, but it's only UI matter and I guess not worth it.

>> -replayRepository :: (RepoPatch p) => CanRepair -> Repository p -> 
>> [DarcsFlag] -> IO RepositoryConsistency
>> -replayRepository canrepair repo opts = do
>> +replayRepository :: (RepoPatch p) => Repository p -> [DarcsFlag] -> IO 
>> (RepositoryConsistency p)
>> +replayRepository repo opts = do
>
> So as I understand it, the idea is that you move the writing of the
> revised patches out into the repair command (and just pass up the
> new state), which seems like a much cleaner thing to do
Indeed.

>> +  -- TODO is the latter condition needed? Does a broken patch imply pristine
>> +  -- difference? Why, or why not?
>> +  return (if is_same && patches_ok
>> +     then RepositoryConsistent
>> +     else if patches_ok
>> +            then BrokenPristine s'
>> +            else BrokenPatches s' newpatches)
>
> Maybe it's worthwhile to write this as
>
>  if patches_ok
>     then if is_same
>          then RepositoryConsistent
>          else BrokenPristine s'
>     else BrokenPatches s' newpatches
>
> which is simpler (but maybe not as nice logically)
Dunno. I find the original version easier on my brain. Or no. Don't know. It's
just running a little too late.

> Introduce syncSlurpy, that syncs slurpy to disk as needed.
> ----------------------------------------------------------
> I think we should try to haddock at least the functions we export, and
> that your comment below would serve just fine:
>
>>  syncSlurpy takes an IO action that does the actual syncing, but only calls 
>> it
>>  when neccessary to prevent memory blowups.  It is fairly cheap to call often
>>  (ie. after every applied patch).
Good point. I'll amend if possible, or record a new patch the next time I get
around to darcs (hopefully early next week).

>> +unsyncedSlurpySize :: Slurpy -> Int
>> +unsyncedSlurpySize (Slurpy _ (SlurpFile (_,_,size) ps))
>> +    | size == undefined_size = B.length ps
>> +    | otherwise = 0
>> +unsyncedSlurpySize (Slurpy _ (SlurpDir _ ss)) =
>> +    sum $ map unsyncedSlurpySize (map_to_slurpies ss)
>
> Ok, so from here I get the understanding that we know if a slurpy is synced if
> we know its size.  So this function adds up all the bits of slurpy for which 
> we
> do not yet know the size.
In a way, yes. I have derived this bit of knowledge from how the
WriteableDirectory instance for Slurpy works, finding that it sets size to
undefined (which should never happen with HashedIO otherwise, these days). So
we add up all lengths of such SlurpFiles to get a good estimate on how much
memory we are using up with intermediaries.

>> +slurp_sync_size :: Int
>> +slurp_sync_size = 100 * 1000000
>> +
>> +syncSlurpy :: (Slurpy -> IO Slurpy) -> Slurpy -> IO Slurpy
>> +syncSlurpy put s = if (unsyncedSlurpySize s > slurp_sync_size)
>> +                       then do
>> +                          s' <- put s
>> +                          return s'
>> +                       else do
>> +                          return s
>
> Silly Haskell question: should this be simplified to the below?
>
> syncSlurpy put s = if unsyncedSlurpySize s > slurp_sync_size
>                       then put s
>                       else return s
Oh, indeed. That was probably just sloppy thinking on my part.

> Fix Repair.applyAndFix.
> -----------------------
>>  applyAndFix c opts s_ r psin =
>>      do beginTedious k
>>         tediousSize k $ lengthFL psin
>> -       ps <- aaf 0 s_ psin
>> +       ps <- aaf s_ psin
>>         endTedious k
>>         return ps
>>      where k = "Repairing patch" -- FIXME
>> hunk ./src/Darcs/Repository/Repair.hs 65
>> +          aaf s NilFL = return (NilFL, s, True)
>> +          aaf s (p:>:ps) = do
>
>> -            let j = if ((i::Int) + 1 < 100) then i + 1 else 0
>> -            (ps', s'', restok) <- aaf j s' ps
>> -            s''' <- if j == 0 then update_slurpy r c opts s''
>> -                      else return s''
>
>> +            s'' <- syncSlurpy (update_slurpy r c opts) s'
>> +            (ps', s''', restok) <- aaf s'' ps
>
> Hmm, so rather than syncing every at 100 MB (or whatever syncSlurpy
> decides) instead of every 100 patches?  Is that the idea?
Right, and this also flips the sync versus recurse mismatch, so we now recurse
*after* having possibly dumped our intermediate state onto disk.

Just a note though, I have done some benchmarking, and even with this bundle
applied, darcs repair and check take over half gigabyte of memory on GHC repo,
which is still a regression compared to non-hashed repair (which should behave
roughly like darcs 2.0.2's check, memory-wise). It's also quite a mystery to
me, since I have checked that syncSlurpy does the right thing. It might be just
GHC's GC being lazy since the machine has plenty of RAM, but I understand zilch
about that. I haven't yet succeeded at producing a working profiled executable,
for some reason. After that happens, I'll do some profiling homework and
hopefully fix this memory blowup. (However note, that compared to 2.1, this
patch should save *some* memory.)

Yours,
   Petr.

-- 
Peter Rockai | me()mornfall!net | prockai()redhat!com
 http://blog.mornfall.net | http://web.mornfall.net

"In My Egotistical Opinion, most people's C programs should be
 indented six feet downward and covered with dirt."
     -- Blair P. Houghton on the subject of C program indentation
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users

Re: [darcs-users] darcs patch: Refactor Repository.Repair.replayReposit... (and 2 more)

Reply via email to