[darcs-users] darcs-hs/hashed-storage review

Ganesh Sittampalam Tue, 04 Aug 2009 14:38:23 -0700

Hi,

Here are my initial thoughts about the hashed storage work. I'm still reading through it and I anticipate having more to say as I spend more time on it. I'm looking at both the hashed-storage package and the changes to darcs to make use of it.

I'll start by summarising what it's about, both as an introduction for others and so that any misconceptions or omissions can be corrected.

I haven't yet had time to read through all the past mailing list traffic on the subject of hashed-storage etc., so please point me to any relevant posts if it seems appropriate. I will try to read through them asap.

The basic point of this work is to optimise and clean up the way that darcs interacts with the filesystem. The core of this is the new 'Tree' type, which replaces the old 'Slurpy' type. The most important difference is that it uses explicitly embedded IO actions for each file or subdirectory, as opposed to the lazy IO of Slurpy.

The work is split into a new 'hashed-storage' package and some changes to darcs to make use of this package. In general the changes to darcs are relatively small and are often simplifications.

As its name suggests, hashed-storage takes over the work of dealing with hash-structured storage as used by the darcs 1.5/2 formats. It also introduces the index, which is a cache of the hashes (and related info) of an entire tree, stored in a single file. This allows for much faster operations in many cases. hashed-storage also deals with "plain" trees such as the working copy and old-style pristine.

There is also a 'TreeIO' monad, intended to provide an abstraction for working with 'Tree's and also some optimisations such as buffering up writes until a size threshold is reached.
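The write-buffering idea can be modelled purely. This is a hypothetical simplification, not TreeIO's actual implementation: `WriteBuffer`, `bufferWrite` and `flushAll` are invented names, sizes are measured in characters, and the caller is assumed to perform the returned writes itself.

```haskell
-- Hypothetical model of TreeIO-style write buffering (not the real TreeIO code):
-- writes accumulate in memory and are released in one batch once their total
-- size reaches a threshold.
data WriteBuffer = WriteBuffer
  { pending     :: [(FilePath, String)] -- queued writes, newest first
  , pendingSize :: Int                  -- total characters queued
  }

emptyBuffer :: WriteBuffer
emptyBuffer = WriteBuffer [] 0

-- Queue one write. Returns the writes that should hit the disk now
-- (oldest first, empty if still under the threshold) and the updated buffer.
bufferWrite :: Int -> FilePath -> String -> WriteBuffer
            -> ([(FilePath, String)], WriteBuffer)
bufferWrite threshold path content buf
  | size' >= threshold = (reverse queued, emptyBuffer)
  | otherwise          = ([], WriteBuffer queued size')
  where
    queued = (path, content) : pending buf
    size'  = pendingSize buf + length content

-- Release everything still queued, e.g. when the computation finishes.
flushAll :: WriteBuffer -> ([(FilePath, String)], WriteBuffer)
flushAll buf = (reverse (pending buf), emptyBuffer)
```

In IO, a caller would run `mapM_ (uncurry writeFile)` over each returned batch. Keeping the policy pure makes it easy to test separately from the filesystem.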


And now my initial comments:

Overall I think something like this is clearly the right way to go, and generally I think it looks nice, with easy-to-read code. Although I haven't been through the changes to darcs in great detail yet, they generally look like they provide significant improvements in simplicity.

My most important question is about the dividing line between hashed-storage and darcs-hs. The biggest weakness of hashed-storage as a separate package is that it has functions that are specific to Darcs repositories. To my mind, either it needs to abandon this knowledge, perhaps by abstracting it over a typeclass that is meaningful in itself, or some or all of hashed-storage needs to move into the darcs tree proper. Otherwise, changes to some parts of darcs will continue to require lockstep changes to hashed-storage. That's OK while Petr is developing the two together intensively on his own, but not really when other people are involved or over a longer timeframe. One option would be to move it all into the darcs tree but distribute it as a separate cabal package with a more darcs-specific name.


Onto more specific comments, which are mainly in note form:

- The naming of Darcs.Gorsvet is obscure and a barrier to future understanding of the code.

- In Darcs.Gorsvet, is there any reason not to use bracket in mInCurrentDirectory (as a comment suggests)?

- The floatPath function from Storage.Hashed is documented as being unsafe and then used all over Darcs.Gorsvet. Needs an explanation of what's unsafe and why it's OK to use it in Darcs.Gorsvet.

- In various places (e.g. deleting on Windows, corrupt pending, etc.) a file gets renamed to a fixed alternative name, which could cause problems if a file with that name already exists. There should be some standard scheme to avoid collisions.

- We need to sort out haskell_policy; the ratification of readFile all over the place is just silly. Replacing it with hlint, as has been suggested, sounds like a very good idea.

- I think treeDiff is actually relatively unlikely to be wrong, because it delegates the actual line-by-line diffing to Patch.Prim, which hasn't changed. Storage.Hashed.Diff exists but seems unused; what's the status and plans there?

- unrecordedChanges: why do IgnoreTimes and LookForAdds lead to ignoring the index?

- Presumably the biggest risk is random bugs with invalid indexes. An explanation of when the index can become invalid would be helpful (there may be one in the code that I haven't spotted yet).
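On the bracket question, here is a minimal sketch of the usual pattern, assuming the action runs in plain IO (if mInCurrentDirectory works in some other monad, bracket would need lifting or an equivalent). `withDirectory` is a hypothetical name, not the existing darcs function.

```haskell
import Control.Exception (bracket)
import System.Directory (getCurrentDirectory, setCurrentDirectory)

-- Run an action with the working directory temporarily set to dir.
-- bracket guarantees the original directory is restored even if the
-- action (or the initial setCurrentDirectory) throws an exception.
withDirectory :: FilePath -> IO a -> IO a
withDirectory dir action =
  bracket getCurrentDirectory  -- acquire: remember where we started
          setCurrentDirectory  -- release: always go back there
          (\_ -> setCurrentDirectory dir >> action)
```

Newer versions of the directory package provide withCurrentDirectory, which does the same thing. Since the working directory is process-wide state, neither version is safe to use from several threads at once.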


- Storage.Hashed.Index: code is full of magic numbers

- Petr and I have already discussed overloading Tree and TreeIO over the container (IO). This would make writing test rigs easier, and would mean that code that processes Tree could be polymorphic where appropriate. Overloaded code would typically constrain the container to be an instance of Monad or perhaps Applicative.

- Win32 removeFile workaround: not used consistently; should it be hidden behind some standard abstraction?
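To illustrate the overloading idea, here is a heavily simplified, hypothetical Tree parameterised over its container monad. The real Tree also carries hashes and has other constructors; `TreeItem`, `listFiles`, `fixture` and the association-list representation are assumptions made for this sketch.

```haskell
import Data.Functor.Identity (Identity (..))

-- Hypothetical simplified Tree whose file contents are actions in some monad m
-- (IO for a real working copy, Identity for an in-memory test fixture).
data TreeItem m = File (m String) | SubTree (Tree m)
newtype Tree m = Tree [(String, TreeItem m)]

-- Works for any Monad: runs each embedded content action and returns
-- every file path (relative to the root) together with its contents.
listFiles :: Monad m => Tree m -> m [(FilePath, String)]
listFiles = go ""
  where
    go prefix (Tree items) = concat <$> mapM (item prefix) items
    item prefix (name, File content) = do
      text <- content
      return [(prefix ++ name, text)]
    item prefix (name, SubTree sub) = go (prefix ++ name ++ "/") sub

-- A pure fixture: no filesystem needed to exercise listFiles.
fixture :: Tree Identity
fixture = Tree
  [ ("a.txt", File (Identity "hello"))
  , ("dir", SubTree (Tree [("b.txt", File (Identity "world"))]))
  ]
```

`runIdentity (listFiles fixture)` evaluates to `[("a.txt","hello"),("dir/b.txt","world")]`, while the same `listFiles` works unchanged on a `Tree IO` whose items are built with `File (readFile path)`.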


- I don't really understand what virtualTreeIO does and doesn't do.

- I think TreeIO should be abstract, i.e. a newtype, unless there is some value in exposing the fact that it is a StateT.
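A minimal sketch of the abstract version, using the StateT from the transformers package. `TreeState`, `queueWrite` and `queuedCount` are hypothetical stand-ins for the real state and operations.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Trans.State.Strict (StateT, evalStateT, gets, modify)

-- Hypothetical state for the sketch; the real TreeIO state differs.
newtype TreeState = TreeState { queuedWrites :: [(FilePath, String)] }

-- The StateT is hidden behind a newtype. In a real module the export list
-- would name TreeIO but not its constructor, so callers can only use the
-- operations below and the representation can change freely.
newtype TreeIO a = TreeIO (StateT TreeState IO a)
  deriving (Functor, Applicative, Monad)

queueWrite :: FilePath -> String -> TreeIO ()
queueWrite path content =
  TreeIO (modify (\s -> s { queuedWrites = (path, content) : queuedWrites s }))

queuedCount :: TreeIO Int
queuedCount = TreeIO (gets (length . queuedWrites))

runTreeIO :: TreeIO a -> IO a
runTreeIO (TreeIO action) = evalStateT action (TreeState [])
```

If callers need to run arbitrary IO inside TreeIO, a MonadIO instance can be derived the same way without revealing the StateT underneath.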


- hashedTreeIO never deletes things, so what removes dead stuff?

- Why don't the two cases of hashedTreeIO.updateFile use the same code path?

- I'm very dubious about the semantics of finish: why doesn't expandPath call it? Presumably the documented usage semantics of readIndex have something to do with this, but it feels fragile, and I only sort of understand the whole dirs_changed/finish stuff in readIndex'.
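On what removes dead stuff: one conventional answer for content-addressed storage is a mark-and-sweep pass. The sketch below illustrates that technique in isolation and is not something hashed-storage is known to do; `Store`, `reachable` and `collectGarbage` are invented, and objects are modelled as a map from each hash to the hashes it references.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Hash = String

-- Hypothetical content-addressed store: a directory object lists the hashes
-- of its entries; a file object references nothing.
type Store = Map.Map Hash [Hash]

-- Mark: every hash reachable from the given roots (missing hashes are ignored).
reachable :: Store -> [Hash] -> Set.Set Hash
reachable store = go Set.empty
  where
    go seen [] = seen
    go seen (h : rest)
      | h `Set.member` seen = go seen rest
      | otherwise = go (Set.insert h seen) (Map.findWithDefault [] h store ++ rest)

-- Sweep: keep only reachable objects, and report the hashes that were dropped.
collectGarbage :: Store -> [Hash] -> (Store, [Hash])
collectGarbage store roots = (live, dead)
  where
    marked = reachable store roots
    live = Map.filterWithKey (\h _ -> h `Set.member` marked) store
    dead = [h | h <- Map.keys store, not (h `Set.member` marked)]
```

The hard part in practice is choosing the roots: anything still referenced must be included, and the sweep must not run concurrently with code that is writing new objects.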


That's all for now...

Cheers,

Ganesh
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users