[darcs-users] Historical Versions (was: GSoC: network optimisation vs cache vs library?)

Max Battcher Wed, 14 Apr 2010 22:33:59 -0700

On 4/14/2010 22:49, Isaac Dupree wrote:

On 04/14/10 20:18, Max Battcher wrote:

All of which goes to show that Trac+darcs still isn't well optimized for
caching darcs queries or dealing gracefully with long running
command invocations... I still say the Trac reliance on CVS/SVN-style
revision numbers means that Trac is absolutely not well-adapted for
serving darcs repositories. It may be "revision 1782" to Trac, but 'show
contents --match "hash 2008..."' is "commute this file to how it would
appear if only the patches preceding or equal to this one with a
timestamp from two years ago were applied" to darcs. (Which ends up
being quite possibly not a "real" historic version at all,


Well, suppose you have a public darcs repository for a project. (Such as
GHC HEAD.) If you look at the history of the real world (as opposed to
darcs' conception of history), this repo contained a series of states
over time. What infrastructure would we need, to be able to look at this
series usefully/efficiently years later? (I am reckoning that this
concept of history is useful enough that it's worth creating whether or
not darcs itself can support it. Does anyone agree/disagree?)

I've put an odd amount of thought into that over the years, and I've
also wondered how important it might be in reality... Different
developers will probably disagree on which bits are important, and I
think some of those philosophical differences are precisely the same
reasons why git and darcs (for instance) can co-exist because developers
may continue to prefer one approach over the other...

First of all, darcs does have one concept of real world history that
already is critical in many areas to darcs performance: the TAG. If
there is an important point in a repository's history, it should be
named and tagged. I can see a distinct case for specifically making sure
that any/all operations --to-tag/--from-tag are as performant as
possible. I could also see a case for some sort of (possibly opt-in)
auto-caching system for tag states (pristines).
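As a rough illustration of the tag-caching idea, here is a minimal sketch that keeps one checked-out copy per tag outside of darcs itself. It is not how darcs caches pristines internally; the function names and the repository and cache paths are hypothetical. It assumes a darcs that provides `darcs show tags` and `darcs get --tag`, and note that darcs treats the `--tag` argument as a regular expression rather than an exact name.

```shell
#!/bin/sh
# Sketch only: cache a checked-out copy of each tagged state of a darcs repo.
# Function names and paths are hypothetical, chosen for illustration.

# Turn a tag name into a safe directory name
# (every character outside A-Za-z0-9._- becomes "_").
tag_dir_name() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_'
}

# Fetch the state of one tag into the cache, unless it is already there.
# Usage: cache_tag REPO CACHE_DIR TAG
cache_tag() {
    repo=$1 cache=$2 tag=$3
    dest="$cache/$(tag_dir_name "$tag")"
    [ -d "$dest" ] && return 0
    mkdir -p "$cache" || return 1
    # --tag takes a regular expression; stdin is redirected so darcs
    # cannot swallow the tag list when called from a read loop.
    darcs get --tag="$tag" "$repo" "$dest" </dev/null
}

# Cache every tag the repository reports.
# Usage: cache_all_tags REPO CACHE_DIR
cache_all_tags() {
    repo=$1 cache=$2
    darcs show tags --repodir "$repo" | while IFS= read -r tag; do
        cache_tag "$repo" "$cache" "$tag"
    done
}
```

Something like `cache_all_tags /path/to/repo /path/to/tag-cache` could then be run from a cron job or a posthook, so that a web front end reads tag states from plain directories instead of asking darcs to rebuild them on every request.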

Beyond that, darcs itself doesn't have any knowledge of "real world
history"... It doesn't track which patch was pulled/pushed in, only when
the patch was originally written (according to the clock of the system
on which the patch was written). This makes sense to darcs due to the
"fluidity" of patch movement (thanks to cherry-picking) and potential
complexity. (Should darcs try to record the integration history of a
patch across every branch/repository that patch has ever seen/will ever
see? How do you merge "conflicting" integration histories? Who controls
it? How do you keep it secure?) Darcs admittedly takes the easiest
possible approach, which is: don't worry about it.

Is that the correct approach? Maybe. Assuming valid timestamps all
around and adequate tagging, darcs' commutation-based conception of
history is a close enough approximation to real history to help a
patient human find what they are looking for. (Certainly not a close
enough solution to make "every version" available via direct HTTP GET
requests to darcs commands, but on the order of a file system search for
a human performing a query, for example.)

Assuming that you do critically need/want more historic version
information cached/saved... Here's something of the possibility spectrum:

* The "pig-in-a-blanket" repository: store a darcs repository inside a
git (or svn, or whatever) repository. It sounds silly, but it's not all
that different than using some of the "patch queue" tools that
git/hg/svn users already use... you're just using darcs as a more
powerful patch queue and git (or whatever) as the fastest, dumb "store
the state of lots of files at each moment in time that I designate" file
store that you can find and trust. (Slightly less crazy variations might
be to make direct use of a distributed block store like S3, HDFS, or
even a document database...)

* Context-generating pre/posthook: before/after history-manipulating
commands (apply, pull, record, amend-record, ...) run something like:


  darcs changes --context > archive.`date +%s`.context

That's the basics you would need to keep track of actual, real-world
historical states. Although, you'd probably want to compress the context
files together for more long term storage, or find some more capable
storage engine. From the generated context files you should be able to
recreate all of the actual historical states. (Unfortunately it may not
be as performant or capable as it should be, because context files need
a bit of love...)

* In-repo branching: There's a long thread on the subject, but the
basics are that the hashed-storage backend could easily store more than
one inventory/pristine state in the same repository. Theoretically you
could build a third-party tool to handle multiple "root pointers" and
then "hold onto" root pointers for historic versions so that those files
don't get garbage collected. (This is sort of an inversion of the
"pig-in-a-blanket" idea: use darcs' own current data storage backend
(hashed-storage), but encourage it/tune it to store more than darcs
alone does.)

* Propose a useful interaction pattern for darcs optionally to track
such things itself and help it get implemented. Certainly, the toughest
path, but it may be possible for someone to come up with a good plan of
attack that darcs could implement directly.
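To make the context-generating hook from the list above a little more concrete, here is a minimal sketch of the snapshot and restore halves. The function names and the archive directory are hypothetical, `date +%s` is assumed to be available (it is on GNU and BSD systems, though it is not strictly POSIX), and `snapshot_context` is assumed to run from inside the repository.

```shell
#!/bin/sh
# Sketch only: record the repository context after history-changing
# commands and rebuild a past state from a saved context file.
# Function names and the archive directory are hypothetical.

# Build a snapshot file name from a Unix timestamp.
context_file_name() {
    printf 'archive.%s.context' "$1"
}

# Write the current context to a new timestamped file.
# Usage (from inside the repository): snapshot_context ARCHIVE_DIR
snapshot_context() {
    archive_dir=$1
    mkdir -p "$archive_dir" || return 1
    darcs changes --context \
        > "$archive_dir/$(context_file_name "$(date +%s)")"
}

# Recreate the state a snapshot describes as a fresh repository.
# Usage: restore_context CONTEXT_FILE SOURCE_REPO DEST_DIR
restore_context() {
    darcs get --context="$1" "$2" "$3"
}
```

Saved as a script that calls `snapshot_context`, this could be attached through darcs' `--posthook` option on commands such as pull, apply and record. Snapshots taken within the same second would overwrite each other, so a real tool would want a finer-grained or sequence-based name.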

That's how you might go about doing it... I personally don't see a need
for it. I think there are more interesting tools that solve similar
problems that could be improved first: better/stronger interactive file
annotation/blame tools; better/stronger darcs trackdown; tools that
maybe we don't even have names for today. I think it does come down to a
matter of different lifestyles for different DVCS tools: darcs' "bag" of
patches != git's DAG of file states.

In most of my development workflow, when I care about historical states,
I care about 1) tag file states, and 2) individual patch deltas...
historical integration states in between the two are much less common
for me to seek out. Both (1) and (2) are easy enough to get from
darcs... But that's just my approach and I appreciate that other
developers will disagree on this.


Hopefully some of the above is useful,

--
--Max Battcher--
http://worldmaker.net
_______________________________________________
darcs-users mailing list
darcs-users@darcs.net
http://lists.osuosl.org/mailman/listinfo/darcs-users
