Re: [darcs-users] darcs patch: Begin using RIO (and 39 more)

Jason Dagit Fri, 12 Sep 2008 13:37:32 -0700

On Fri, Sep 12, 2008 at 10:13 AM, David Roundy <[EMAIL PROTECTED]> wrote:
> Hi Jason,
>
> You've guilted me into responding to your questions, although I'm not
> clear what good it'll do you.

I didn't want to guilt you into anything.  My goal of getting these
reviewed is so that we have consensus on how to solve these problems
so that I can work on patches that actually resolve them.

Thank you very much for looking over these questions.

> On Mon, Sep 08, 2008 at 05:08:20PM -0700, Jason Dagit wrote:
>> Moving on to the next one, still in Changes.lhs.  I added a comment to
>> this one hoping I would get some feedback.  It appears to happen
>> because we have this function:
>> get_changes_info :: RepoPatch p => [DarcsFlag] -> [FilePath] -> PatchSet p 
>> C(x)
>>                 -> ([Sealed2 (PatchInfoAnd p)], [FilePath], Doc)
>>
>> And it gets used like this:
>> +  case mF (:<:NilRL) (headRL (slightly_optimize_patchset r)) of
>> +   FlippedSeal ps -> do
>> +    when (not $ nullRL r || unsealFlipped nullRL (headRL r)) $
>> +     putDocLnWith simplePrinters $ changelog opts' NilRL $
>> +                  -- This unsafeCoerceP seems to be neccessary due to
>> +                  -- inventory splits at tags.  Without the context
>> +                  -- would be too long (and redundant) for practical
>> +                  -- usage.
>> +                  get_changes_info opts' [] (unsafeCoerceP ps)
>> +  where opts' = if HumanReadable `elem` opts || XMLOutput `elem` opts
>> +                then opts
>> +                else MachineReadable : opts
>> +        mF :: (FORALL(y) a C(y x) -> b C(y x)) -> FlippedSeal a C(x)
>> -> FlippedSeal b C(x)
>> +        mF f (FlippedSeal a) = flipSeal $ f a
>>
>> The problem here is that we don't give a PatchSet to get_changes_info,
>> we give it a patch sequence (specifically the starting context is not
>> ()).  It seems as though we are trying to treat all the patches since
>> the tag as being all the patches in the repository, as if starting
>> from ().  The easiest fix here is to get rid of what I interpreted as
>> an optimization and just pass the result of slightly_optimize_patchset
>> directly to get_changes_info.  That seemed like a pretty big change
>> for me to make without your feedback.
>
> I haven't found where the code you're quoting is located.  Maybe I
> haven't got all your changes?

Hmm....if you applied the whole patch bundle it should have been in
there.  I generated this email by looking at the submitted patch
bundle and a repository where the patches were applied.  Strange.

We feed an incomplete PatchSet named ps to get_common_and_uncommon
inside of get_changes_info.  Changes.lhs seems like it's going to be
problematic for the type witness change over.

On the one hand, changes provides read-only access to the repository
so it seems less critical.  On the other hand, it seems very bad from
a usability stand point to have changes potentially harbor bugs, which
the type witnesses could spot, and thus give users erroneous output.

Maybe we need to do some research into why only a partial patchset is
used here.  If there is some performance problem it was addressing, it
would be nice to know what the test case is so I can see concretely
how big the change is.  This is based on the observation that the
simplest fix is to use the whole patchset instead of an incomplete
one.

Other solutions might include making something like
get_common_and_uncommon that does not assume it is given a complete
patchset.

>> Next we have to Add.lhs, in particular addp is problematic, but I
>> think I may have that one figured out finally.
>>
>> After that is MarkConflicts.lhs:
>>      -- These unsafeCoercePs are safe because above, if there were
>>      -- unrecorded changes we reverted them by applying their inverse
>>      -- to the working copy.  Thus, r = u.
>>      do add_to_pending repository (unsafeCoerceP res :: FL Prim C(u x))
>>         applyToWorking repository opts (unsafeCoerceP res :: FL Prim
>> C(u x)) `catch` \e ->
>>             bug ("Problem marking conflicts in mark-conflicts!" ++ show e)
>>
>> Again we have an issue with which state is tracked in RIO.  Just a few
>> lines up from that we check that pending is NilFL and if it's not
>> apply the inverse pending, but doing that in RIO won't help.  What
>> should we do about that other than unsafeCoerceP or extending RIO?
>
> I would break this into two stages, with two calls to withRepoLock.
> In the first one, you revert all unrecorded changes, and then in the
> second one you can simply verify that unrecorded and recorded are
> identical.  It's a bit hokey, but it's safe and relatively simple.

Okay, so the final version will have some sort of:
  do unrec <- get_unrecorded_unsorted
       case sloppyIdentity unrec of
        IsEq -> do add_to_pending ....
        _ -> bug "there are still unrecorded changes"

I think I understand how to fix this one up.  Thanks.

> It might be a good idea to introduce a second monad for functions that
> modify the unrecorded state.  I'd rather keep this distinct from RIO,
> because it's not generally safe to intermingle changes to the working
> directory with changes to the recorded/tentative state.

Interesting point about intermingling changes.  I guess the 'cute'
name for such a monad would be UIO, for Unrecorded Repository IO.  I
think URIO is too long.  If we only need it for this one spot it seems
sort of heavy handed if the above trick suffices.

> As usual, you change the return type to be sealed:
>
> mapMFL :: Monad m => (FORALL(u v) a -> m (Maybe (Sealed (b C(u)))))
>       -> [a] -> m (Sealed (FL b C(x)))
> mapMFL _ [] = return $ seal NilFL
> mapMFL f (a:as) = do b <- f a
>                     case b of
>                       Nothing -> mapMFL f as
>                       Just (Sealed b) -> do Sealed bs <- mapMFL f as
>                                             return $ seal (b:>:bs)

This used to trip me up more, but I think I rather clearly know now
why this trick works.  The call site fully determines how u and x will
unify so "return $ seal (b:>:bs)", seems more obvious to me than it
used to.  This could be why I overlooked it before.  Plus, I had the
impression from our other refactorings that 'unsafeCoerceP NilFL' was
something we just did.  I know it came up a bit initially in
Depend.lhs.

>> The next one I see is in Optimize.lhs:
>> choose_order :: forall p C(z). RepoPatch p => PatchSet p C(z) -> PatchSet p 
>> C(z)
>> choose_order ps = case last_tag of
>>  Sealed NilRL -> ps
>>  Sealed (lt:<:_) ->
>>      co lt $ mapSeal slightly_optimize_patchset $ get_patches_in_tag
>> (info lt) ps
>>  where last_tag = dropWhileRL (not . is_tag . info) (concatRL ps)
>>       dropWhileRL :: (FORALL(x y) a C(x y) -> Bool) -> RL a C(r v)
>>                   -> Sealed (RL a C(r))
>>       dropWhileRL _ NilRL = seal NilRL
>>       dropWhileRL p xs@(x:<:xs')
>>                   | p x       = dropWhileRL p xs'
>>                   | otherwise = seal xs
>>       co :: (PatchInfoAnd p) C(a b) -> SealedPatchSet p -> PatchSet p C(z)
>>       co lt' (Sealed ((t:<:NilRL):<:pps)) =
>>            case get_patches_beyond_tag lt' ps of
>>             -- This unsafeCoerceP is safe because t = lt'.  We would use
>>             -- (=\/=) to prove it, except that get_patches_in_tag must first
>>             -- be fixed to return a meaniful context.
>>             p :<: NilRL -> (p+<+((unsafeCoerceP t):<:NilRL)) :<: pps
>>             _ -> impossible
>>       co _ _ = impossible
>
> Part of the problem may relate get_patches_beyond_tag has a wrong
> type.  Its type (and in future, when introducing new functions, please
> try to give them new names) is only right if you've already called it
> once to find out the type of the tag in its minimal context.

I wasn't trying to introduce a new function and that is why I stole
the name.  It wasn't until quite a bit down the road that I realized
I'd still need the old type signature is several places.  I really did
mean for it to replace the old type signature.  In the final version
of these changes I can give this more typeful one a different name.
Actually, just adding get_patches_around_tag and phasing out the old
ones seems best.  You don't think doing so will introduce unwanted
inefficiencies do you?

I'm glad you've had a chance to give me feedback on it now though.  I
didn't  catch on to this minimal context bit from reading the code.
My next iteration of patches will have some haddocks about it as it
seems highly relevant and non-obvious to at least some people
(myself).

I'm confident there are still issues here, but you'll need new patches
from me to discuss them I reckon.

>> Actually, I think that middle element needs to have the same context
>> as the PatchInfo, so:
>> get_patches_around_tag :: RepoPatch p => PatchInfoAnd p C(x y) ->
>> PatchSet p C(z) -> (RL (RL (PatchInfoAnd p)) C(() x),  (PatchInfoAnd p) C(x 
>> y),  RL (RL (PatchInfoAnd p)) C(y z))
>
> No, this would be wrong, for the same reason that
> get_patches_beyond_tag is wrong, which is that there's no guarantee
> that the tag input is in its canonical order (i.e. that there aren't
> any patches preceding it that are not in the tag).

Ah, hmm.  I didn't catch on to that from reading the code.  My
proposed solution of get_patches_around_tag may not be sufficient now
as I seem to recall the reason I changed the type signature was to get
the return value to have a specific context.  Either way, it seems
that adding a correct get_patches_around_tag would still get me closer
to the right solution.  So that is still something I should do it
seems.

>> How is the unsafeCoerceP in Put.lhs avoidable?  patchset == patchset2,
>> but due to the unsealing their ending contexts are considered
>> different by the type system.  The only way to fix it is by updating
>> this function:
>> get_one_patchset :: RepoPatch p => Repository p C(r u t) ->
>> [DarcsFlag] -> IO (SealedPatchSet p)
>> get_one_patchset repository fs =
>>    case nonrange_matcher fs of
>>        Just m -> do ps <- read_repo repository
>>                     if nonrange_matcher_is_tag fs
>>                        then return $ get_matching_tag m ps
>>                        else return $ match_a_patchset m ps
>>        Nothing -> (seal . scan_context) `liftM` mmapFilePS
>> (toFilePath $ context_f fs)
>>    where context_f [] = bug "Couldn't match_nonrange_patchset"
>>          context_f (Context f:_) = f
>>          context_f (_:xs) = context_f xs
>>
>> That function seems to call several Sealed returning functions.  What
>> is the solution here?  Obviously the unsafeCoerceP is currently quite
>> safe and is the easiest solution.
>
> This one is tricky.  The canonical answer (which is wrong in this
> case) would be to avoid computing the patchset twice.  The type
> witnesses tend to fail when we do redundant work, which is often a
> good thing, since it means that the type system helps us to catch
> inefficiencies.

I only know of uniqueness typing very vaguely, but is that what we're
recreating here?  I should probably find a paper or two and read them
for the RIO section I'm writing...

> In this case, the inefficiency is very intentional, however, in order
> to avoid a memory leak.  So the simple solution of eliminating
> patchset2 and replacing it with patchset, which would satisfy the type
> checker and save on disk IO is a very bad idea.

Is this related to the lazy scan_bundle stuff or did you find a
different memory issue here?  I don't (yet) spot these memory leaks,
so if you have some time to comment on this one I would appreciate it
as it would be very educational.

> I think the best solution may be one that is designed for precisely
> this problem, something like:
[snipped examples]
> etc.  But I'm just brainstorming at the moment.
>
> This is a tricky problem, or at least it seems tricky to find an
> elegant and general solution.

If I get to this point you'd be okay with the less general
moderatelySafePerformIOtwice/moderatelySafePerformRIOtwice solutions?

I'm hesitant to go for the generality of the other solutions on the
basis that we're not likely to need it.

>> I was able to remove the unsafeCoerceP from Remove.lhs.
>
> I don't see this unsafeCoerceP...

Again, strange, but since I was able to remove it it shouldn't need
any comments unless my way of removing it was weird, which would come
up naturally when the patches for Remove get submitted.

Thanks!
Jason
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users

Re: [darcs-users] darcs patch: Begin using RIO (and 39 more)

Reply via email to