[darcs-users] darcs and document object models

Stephen J. Turnbull Sat, 29 May 2021 19:49:33 -0700

Harald Geyer writes:

 > Has anybody tried to get patch theory to work with XML files in a
 > way that uses DOM semantics?  How difficult would it be to
 > implement this?


I thought briefly about this, but this was a couple of years before
Camp, when Darcs was unacceptably slow and too clumsy for my main
application (splitting megapatches by a recalcitrant genius into
reviewable coherent changesets).  I ended up just using Emacs's
diff-mode and git, and never really used Darcs after that, though I
remain interested (mostly lurking) in patch theory.

In some sense we would like a patch theory that respects a semantics
functor (don't ask me to make that precise ;-).  In your case, the
DOM.  But all discussions of patch theory eventually end up
implemented with string editing as the semantics functor.

I ended up with the conclusion that git's model of changesets would be
a better foundation in many cases than Darcs's model of patches (though
I didn't understand Darcs's model well, and I found git's "leave it to
the user on conflict" approach sufficient for my applications).  The point is
that a git commit is actually a tree of collections (representing
directories in the file system) ending in blobs at the leaves.  Well,
there's no reason why the collections have to treat *files* as blobs.
A file can be treated as a collection of extents of text (and these
extents could be recursively nested), and XML of course is
fundamentally tree-structured in exactly this way "all the way down".

You might ask, well, wouldn't that just be a different type of patch
in Darcs?  And you'd probably be right.
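
To make the "tree all the way down" idea concrete, here is a minimal
sketch in Python.  It assumes a hypothetical store_element helper and a
plain dict as the object store, and it uses an ad hoc serialization
rather than git's real object format.  Each element is stored under the
hash of its tag, attributes, text, and the hashes of its children:

```python
import hashlib
import xml.etree.ElementTree as ET


def store_element(elem, store):
    """Store elem and all its descendants in `store`; return elem's hash."""
    # Children are stored first, so a parent refers to them by hash, much
    # as a git tree object refers to its blobs and subtrees.
    children = [(store_element(child, store), child.tail or "") for child in elem]
    # Hypothetical serialization: the repr of a tuple, not git's format.
    record = repr((elem.tag, sorted(elem.attrib.items()), elem.text or "", children))
    digest = hashlib.sha1(record.encode("utf-8")).hexdigest()
    store[digest] = record
    return digest


store = {}
old_root = store_element(ET.fromstring("<body><p>one</p><p>two</p></body>"), store)
new_root = store_element(ET.fromstring("<body><p>one</p><p>three</p></body>"), store)
print(old_root != new_root, len(store))
```

The two versions get different root hashes, but the unchanged
<p>one</p> element is stored only once and shared by both.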

 > I'm mostly interested in the case where the document objects have
 > some kind of (globally) unique ID, which allows us to track how
 > objects are added, removed, changed or moved.

But you could also make them "content-addressable" as in git by naming
them with hashes of the text.  It might be a good idea to always
compute the hash, and also to record the element's id attribute, if it
has one, in the VCS metadata for the element.
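
A rough sketch of that metadata in Python follows.  The names
element_metadata, index_elements and classify are hypothetical, and
keying on the id when present (falling back to the hash) is just one
possible policy, not something Darcs or git does:

```python
import copy
import hashlib
import xml.etree.ElementTree as ET


def element_metadata(elem):
    """Return the content hash of elem, plus its id attribute if any."""
    clean = copy.copy(elem)
    clean.tail = None  # text after the element belongs to the parent
    data = ET.tostring(clean, encoding="utf-8")
    return {"hash": hashlib.sha1(data).hexdigest(), "id": elem.get("id")}


def index_elements(root):
    """Map each element's key (id if present, else hash) to its hash."""
    index = {}
    for elem in root.iter():
        meta = element_metadata(elem)
        index[meta["id"] or meta["hash"]] = meta["hash"]
    return index


def classify(old_root, new_root):
    """Return the sets of keys that were added, removed, and changed."""
    old, new = index_elements(old_root), index_elements(new_root)
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    changed = {key for key in old.keys() & new.keys() if old[key] != new[key]}
    return added, removed, changed


old_doc = ET.fromstring(
    '<model id="m"><part id="a">bolt</part><part id="b">nut</part></model>')
new_doc = ET.fromstring(
    '<model id="m"><part id="b">nut</part><part id="a">screw</part>'
    '<part id="c">washer</part></model>')
added, removed, changed = classify(old_doc, new_doc)
```

Here part "b" has moved but keeps both its id and its hash, so it is not
reported as changed; detecting the move itself would need position
information that this sketch does not record.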

I guess the way to implement "DOM patches" for your purpose would be
to repurpose the hunk patch machinery to work in terms of elements
instead of lines.  That shouldn't be too hard, except that you'd
probably want it to be moderately smart about choosing the elements
that constitute the hunks.  For example, consider a "flat" HTML file
whose body is a sequence of P elements, and you combine two by
removing a </P><P> pair.  Then the smallest single element that
contains the whole change is the BODY element.  Obviously you don't
want the darcs diff to be -OLD_BODY +NEW_BODY.  This might not be all
that difficult, though.  For things like HTML files that have large
amounts of text in elements, you might also want line-oriented hunks
(or perhaps character-oriented hunks) that are constrained to be
contained in an element.
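
The flat-HTML example can be sketched in Python using difflib on the
sequence of child elements instead of lines.  This is only an
illustration under simplifying assumptions: element_hunks is a
hypothetical name, hunks are (parent path, removed, added) tuples, and
text between sibling elements is not treated specially.  It is not how
Darcs's hunk machinery actually works:

```python
import difflib
import xml.etree.ElementTree as ET


def serialize(elem):
    # Note: ElementTree includes any tail text in the serialization.
    return ET.tostring(elem, encoding="unicode")


def element_hunks(old, new, path=""):
    """Return a list of (parent_path, removed, added) element hunks."""
    # If the element itself differs, replace it as a whole.
    if (old.tag, old.attrib, old.text) != (new.tag, new.attrib, new.text):
        return [(path or "/", [serialize(old)], [serialize(new)])]
    here = f"{path}/{old.tag}"
    old_kids, new_kids = list(old), list(new)
    matcher = difflib.SequenceMatcher(
        None,
        [serialize(c) for c in old_kids],
        [serialize(c) for c in new_kids],
        autojunk=False,
    )
    hunks = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        if op == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            # One child replaced by one child: look for a smaller hunk inside.
            hunks.extend(element_hunks(old_kids[i1], new_kids[j1], here))
        else:
            hunks.append((
                here,
                [serialize(c) for c in old_kids[i1:i2]],
                [serialize(c) for c in new_kids[j1:j2]],
            ))
    return hunks


old_body = ET.fromstring("<body><p>a</p><p>b</p><p>c</p><p>d</p></body>")
new_body = ET.fromstring("<body><p>a</p><p>b c</p><p>d</p></body>")
hunks = element_hunks(old_body, new_body)
```

For the merged paragraphs this yields one hunk under /body that removes
the two old P elements and adds the combined one, rather than replacing
the whole BODY.  Line- or character-oriented hunks inside a single
element would be a further refinement of the one-to-one replace case.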

 > This would be a huge step towards having things like CAD models under
 > meaningful version control.

I don't think it's that big a step technically, as you see.  But it
would be quite a bit of work, and I wonder if this alone would seem so
great in practice.  We have a lot of UI infrastructure (various kinds
of diff, etc) for text-oriented version control.  I know we have some
UI for the DOM (eg, the DOM browsers that some web browsers provide),
but I don't know how easy that will be to hook into the version
control structures.  OTOH, it might only take a few lines of glue code
to an existing DOM browser and then it would all "just work". :-)

Steve
