On 4/14/2010 22:49, Isaac Dupree wrote:
On 04/14/10 20:18, Max Battcher wrote:
All of which goes to show that Trac+darcs still isn't well optimized for
caching darcs queries or dealing gracefully with long-running
command invocations... I still say the Trac reliance on CVS/SVN-style
revision numbers means that Trac is absolutely not well-adapted for
serving darcs repositories. It may be "revision 1782" to Trac, but 'show
contents --match "hash 2008..."' is "commute this file to how it would
appear if only the patches preceding or equal to this one with a
timestamp from two years ago were applied" to darcs. (Which ends up
being quite possibly not a "real" historic version at all,

Well, suppose you have a public darcs repository for a project. (Such as
GHC HEAD.) If you look at the history of the real world (as opposed to
darcs' conception of history), this repo contained a series of states
over time. What infrastructure would we need, to be able to look at this
series usefully/efficiently years later? (I am reckoning that this
concept of history is useful enough that it's worth creating whether or
not darcs itself can support it. Does anyone agree/disagree?)

I've put an odd amount of thought into that over the years, and I've also wondered how important it might be in reality... Different developers will probably disagree on which bits are important, and I think some of those philosophical differences are precisely the same reasons why git and darcs (for instance) can co-exist because developers may continue to prefer one approach over the other...

First of all, darcs does have one concept of real world history that already is critical in many areas to darcs performance: the TAG. If there is an important point in a repository's history, it should be named and tagged. I can see a distinct case for specifically making sure that any/all operations --to-tag/--from-tag are as performant as possible. I could also see a case for some sort of (possibly opt-in) auto-caching system for tag states (pristines).

Beyond that, darcs itself doesn't have any knowledge of "real world history"... It doesn't track when a patch was pulled/pushed in, only when the patch was originally written (according to the clock of the system on which the patch was written). This makes sense to darcs due to the "fluidity" of patch movement (thanks to cherry-picking) and potential complexity. (Should darcs try to record the integration history of a patch across every branch/repository that patch has ever seen/will ever see? How do you merge "conflicting" integration histories? Who controls it? How do you keep it secure?) Darcs admittedly takes the easiest possible approach, which is: don't worry about it.

Is that the correct approach? Maybe. Assuming valid timestamps all around and adequate tagging, darcs' commutation-based conception of history is a close enough approximation to real history to help a patient human find what they are looking for. (Certainly not a close enough solution to make "every version" available via direct HTTP GET requests to darcs commands, but on the order of a file system search for a human performing a query, for example.)

Assuming that you do critically need/want more historic version information cached/saved... Here's something of the possibility spectrum:

* The "pig-in-a-blanket" repository: store a darcs repository inside a git (or svn, or whatever) repository. It sounds silly, but it's not all that different from using some of the "patch queue" tools that git/hg/svn users already use... you're just using darcs as a more powerful patch queue and git (or whatever) as the fastest, dumb "store the state of lots of files at each moment in time that I designate" file store that you can find and trust. (Slightly less crazy variations might be to make direct use of a distributed block store like S3, HDFS, or even a document database...)

* Context-generating pre/posthook: before/after history-manipulating commands (apply, pull, record, amend-record, ...) run something like:

  darcs changes --context > archive.`date +%s`.context

That's the basics you would need to keep track of actual, real-world historical states. Although, you'd probably want to compress the context files together for more long term storage, or find some more capable storage engine. From the generated context files you should be able to recreate all of the actual historical states. (Unfortunately it may not be as performant or capable as it should be, because context files need a bit of love...)

* In-repo branching: There's a long thread on the subject, but the basics are that the hashed-storage backend could easily store more than one inventory/pristine state in the same repository. Theoretically you could build a third-party tool to handle multiple "root pointers" and then "hold onto" root pointers for historic versions so that those files don't get garbage collected. (This is sort of an inversion of the "pig-in-a-blanket" idea: use darcs' own current data storage backend (hashed-storage), but encourage it/tune it to store more than darcs alone does.)

* Propose a useful interaction pattern for darcs optionally to track such things itself and help it get implemented. Certainly, the toughest path, but it may be possible for someone to come up with a good plan of attack that darcs could implement directly.
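
To make the context-generating hook idea from the list above a little more concrete, here is a minimal sketch of what such a hook script might look like. The function name snapshot_context, the ARCHIVE_DIR variable and its default location are my own placeholders, not anything darcs provides; the only darcs command it relies on is the same "darcs changes --context" shown above.

```shell
#!/bin/sh
# Sketch only: save the repository's current context to a timestamped
# file, so the real-world state at this moment can be rebuilt later.
# ARCHIVE_DIR and snapshot_context are hypothetical names; adjust to taste.

ARCHIVE_DIR="${ARCHIVE_DIR:-$HOME/darcs-context-archive}"

# Write the current context into ARCHIVE_DIR and print the file's path.
# Must be run from inside the darcs repository being tracked.
snapshot_context() {
    mkdir -p "$ARCHIVE_DIR" || return 1
    # One file per second; two snapshots within the same second would
    # overwrite each other, which is acceptable for a sketch.
    out="$ARCHIVE_DIR/archive.$(date +%s).context"
    darcs changes --context > "$out" || return 1
    printf '%s\n' "$out"
}
```

You would source this file and call snapshot_context from whatever command you hand to --posthook on apply, pull, record and so on. A saved context file should then be usable with something like "darcs get --context=FILE" to rebuild that state, with the performance caveats about context files already mentioned.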


That's how you might go about doing it... I personally don't see a need for it. I think there are more interesting tools that solve similar problems that could be improved first: better/stronger interactive file annotation/blame tools; better/stronger darcs trackdown; tools that maybe we don't even have names for today. I think it does come down to a matter of different lifestyles for different DVCS tools: darcs' "bag" of patches != git's DAG of file states.

In most of my development workflow, when I care about historical states, I care about 1) tag file states, and 2) individual patch deltas... historical integration states in between the two are much less common for me to seek out. Both (1) and (2) are easy enough to get from darcs... But that's just my approach and I appreciate that other developers will disagree on this.

Hopefully some of the above is useful,

--
--Max Battcher--
http://worldmaker.net
_______________________________________________
darcs-users mailing list
darcs-users@darcs.net
http://lists.osuosl.org/mailman/listinfo/darcs-users
