Stephen Leake <[EMAIL PROTECTED]> writes:

> As an example, let's consider dvc-status. Currently, it looks like this:
>
> [...]

Good example. I think we've mentioned that already, and we agree that
the code can be factored better.

> So, taking things to the extreme, we _could_ refactor this:

That's one step too far IMO: this doesn't allow a back-end to override
status. I agree that providing a default implementation in DVC itself
is good, but we should still allow more back-end specific stuff, in
case.

> Of course, we also need to consider what the other back-ends do. tla
> handles nested trees; I'll just ignore that for now (we've agreed DVC
> doesn't handle nested trees).

No, you misunderstood me.

Having a tree whose root is below the root of another is good. tla has
a few features to deal with that, git has submodule support, hg has
the forest extension, bzr has the beginning of by-reference nested
tree support too (IIRC, it's implemented, but the commands are hidden
because the bzr developers are not satisfied with the UI yet; it
might have changed since I last checked). Nested trees are _the_
way to deal with very large trees (like KDE or the FreeBSD ports) with
any modern VCS.

BUT, the way you want to use nested trees is bad: most VCSs won't even
support it, and I've never seen any decent VCS user recommend it.

The difference is that you propose to have some files versioned both
in the nested tree and in the containing tree. By contrast, any
sane VCS will consider the nested tree as a black box, and never look
at its files.

A VCS with good nested-tree support will record the state of the nested
tree when committing in the containing tree. Take the example of git:
anything in git is an object, identified by the sha1 sum of its
content. Until recently, a tree in git could contain files and
symlinks. Nested-tree support was added by adding another type of
object: subtrees. A subtree is identified by the revision identifier
(also a sha1 sum) of the commit it points to.
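
To make the content-addressing idea concrete, here is a minimal Python
sketch. The helper name git_object_id, the toy project_tree dictionary and
the commit payload are made up for illustration; only the hashing rule
(sha1 over "<type> <size>\0" followed by the content) is git's own.

```python
import hashlib


def git_object_id(kind: str, payload: bytes) -> str:
    """Return the sha1 git assigns to an object: sha1("<kind> <size>\\0" + payload)."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


# A blob's id depends only on its content, so identical files share one id.
empty_blob = git_object_id("blob", b"")
# → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the well-known empty-blob id)

# Hypothetical commit payload standing in for the nested tree's real commit.
nested_commit = git_object_id("commit", b"illustrative commit payload for subtree1\n")

# Toy listing, NOT git's binary tree format: files point at blob ids,
# while the nested tree is recorded only by the id of one of its commits.
project_tree = {
    "file1": ("blob", git_object_id("blob", b"contents of file1\n")),
    "subtree1": ("commit", nested_commit),
}
```

The containing tree never stores file2 itself; it only stores which
revision of subtree1 it was committed against.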

So, if you have

project/
 +- .git/
 +- file1
 +- subtree1/
     +- .git/
     +- file2
 +- subtree2/
     +- .git/
     +- file3

You can, for example, make changes to file{1,2,3}, and record the
changes in an atomic way (typical example: subtree1/ is a library, you
change something in its API, and use this change in file1. You want to
record both changes atomically, to be able to come back later either
to the old API and the old use, or the new version of both, but not a
mix). So, you'd commit in subtree1, commit in subtree2, and when you
run "status" in project/, it will notice the changes. Then, you commit
a new version of project/, which points to the new commits of subtree1
and subtree2.
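
The workflow above can be sketched as a toy model in Python. Everything
here (the Repo class, the status and commit helpers, the short revision
ids) is hypothetical and only mimics the bookkeeping; it is not how any
particular VCS implements it.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    head: str                                      # id of this tree's current commit
    recorded: dict = field(default_factory=dict)   # nested dir -> commit id stored in our last commit
    nested: dict = field(default_factory=dict)     # nested dir -> Repo


def status(project: Repo) -> list:
    """List nested trees whose current commit differs from the recorded one."""
    return sorted(name for name, sub in project.nested.items()
                  if project.recorded.get(name) != sub.head)


def commit(project: Repo, new_head: str) -> None:
    """Record the current commit of every nested tree in a single step."""
    project.recorded = {name: sub.head for name, sub in project.nested.items()}
    project.head = new_head


subtree1, subtree2 = Repo(head="a1"), Repo(head="b1")
project = Repo(head="p1",
               recorded={"subtree1": "a1", "subtree2": "b1"},
               nested={"subtree1": subtree1, "subtree2": subtree2})

# Commit in each nested tree first (new API and its new use)...
subtree1.head, subtree2.head = "a2", "b2"
changed = status(project)   # both nested trees show up as changed

# ...then one commit in project/ pins both new revisions together.
commit(project, "p2")
```

Because project/ only stores revision ids, checking out "p1" or "p2"
always yields a matching pair of subtree revisions, never a mix.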

If you ever commit file2 as part of project/, you can be sure to run
into trouble whenever you try to merge, or just walk through history.
Unless you have very strict discipline, you'll commit different
versions to subtree1 and to project/. So, file2 should be versioned
as part of subtree1, but not as part of project/.


Another very simple usage of nested trees is when you have somewhat
unrelated stuff in related directories. For example, I have dvc as a
subtree of ~/emacs-lisp/. I won't put my personal emacs-lisp/ stuff
in DVC, and since DVC is versioned, I don't need to re-version it in
emacs-lisp.


And again, this is not just my personal opinion. That's what every VCS
I've looked at does.

> In practice, it's probably more complicated than that; we may not be
> able to push the dvc-run-dvc-async macro into the front-end. But the
> idea is to push as _much_ common code as possible into the front-end.
>
> dvc-ignore-file-extensions, dvc-ignore-file-extensions-in-dir already
> do things in this style.

Yes, I like the way it's done.

It could be more generic by allowing direct back-end
re-implementation, but since I have no use-case where this would be
needed, it can/should remain as it is until we have a real use-case.

> Doing things this way accomplishes a couple of things:
>
> 1) Minimize maintenance complexity by reducing duplicate code among
>    back-ends. 
>    [...]
>
> 2) Maximize commonality of behavior across back-ends; all back-ends use
>    the same mode and buffer names for the status buffer, errors are
>    handled in the same way, etc. The DVC UI is more standardized.
>
>    The importance of this goal is less widely accepted.

I agree with both; 2) just has its limits. Things should be
identical in all back-ends when there's no reason to make them
different.

On the other hand, I still think that DVC should try to have all
back-ends fit in the exact same UI. We already had that discussion
about git and the index, and I'm quite satisfied with the way xgit deals
with that now. We have a little bit of back-end specific code and a
little bit of back-end specific UI (one menu in the status buffer), but
we keep most of the code in DVC.

> One downside is that '<back-end>-dvc-status' no longer exists as a
> user-callable function. But with the recent introduction of
> dvc-back-end-wrappers, I don't see that as a significant downside.

Agreed.

> Another downside is that if we run into a new back-end that doesn't
> fit the latest factorization, we might have to change things around
> again. I think that's a risk worth taking; we have enough back-ends
> now to come up with a reasonable factorization.

Allowing a back-end to override dvc-status is actually quite easy. The
`dvc-function'/`dvc-apply' stuff already allows a default fallback
implementation.

So, all you have to do is to implement status in dvc-dvc-status, and
have dvc-status be a dispatching function (define-dvc-unified-command
can probably be used for that). You have ~10 lines of code and O(1)
overhead, for a good flexibility gain. That said, this change can be
done later, without touching the back-ends.
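
The dispatch-with-fallback idea can be sketched in a few lines. The
sketch below is in Python rather than Emacs Lisp and does not reproduce
DVC's actual `dvc-function'/`dvc-apply' code; the names BACKEND_FUNCTIONS,
register and dispatch are invented for illustration.

```python
# (backend, command) -> implementation; "dvc" holds the shared defaults.
BACKEND_FUNCTIONS = {}


def register(backend, command):
    """Decorator that files an implementation under (backend, command)."""
    def decorator(func):
        BACKEND_FUNCTIONS[(backend, command)] = func
        return func
    return decorator


def dispatch(backend, command, *args):
    """Call the back-end's own implementation, else the shared "dvc" default."""
    func = (BACKEND_FUNCTIONS.get((backend, command))
            or BACKEND_FUNCTIONS[("dvc", command)])
    return func(*args)


@register("dvc", "status")
def default_status(root):
    # Common code shared by every back-end that does not override it.
    return f"generic status for {root}"


@register("xgit", "status")
def xgit_status(root):
    # A back-end that needs something special overrides the default.
    return f"git-specific status for {root}"


def status(backend, root):
    """User-facing command: a thin dispatcher, one dictionary lookup of overhead."""
    return dispatch(backend, "status", root)
```

With this shape, status("xgit", root) reaches the override while any
back-end without one, say "bzr", silently gets the default, and adding
an override later does not require touching the other back-ends.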


Side note: keep in mind that a big part of DVC comes from Xtla, which
we wrote before even thinking about supporting other back-ends. The
current shape of the code is not ideal regarding genericity, but it's
already a looooong way forward since DVC's day 1!

-- 
Matthieu

_______________________________________________
Dvc-dev mailing list
[email protected]
https://mail.gna.org/listinfo/dvc-dev
