Hi,

before the questions about index behaviour get completely out of hand, here
comes the layout and usage documentation...

The _darcs/index is a binary file that overlays a hashed tree over the working
copy. This means that every tracked working file and directory has an entry in
the index, which contains its path, its hash and validity data. For files, the
validity data is a "last seen" timestamp plus the file size.

There are two entry types, a file entry and a directory entry. Both have a
common binary format (from Index.hs):

    data Item = Item { iPath :: BS.ByteString
                     , iName :: BS.ByteString
                     , iHash :: BS.ByteString
                     , iSize :: Ptr Int64
                     , iAux :: Ptr Int64 -- end-offset for dirs, mtime for files
                     } deriving Show

The actual on-disk layout can be seen from the peekItem implementation:

    peekItem :: ForeignPtr () -> Int -> Maybe Int -> IO Item
    peekItem fp off dirlen =
        withForeignPtr fp $ \p -> do
          nl' :: Int32 <- peekByteOff p off
          let nl = fromIntegral nl'
              path = fromForeignPtr (castForeignPtr fp) (off + 4) (nl - 1)
              hash = fromForeignPtr (castForeignPtr fp) (off + 4 + nl) 64
              name' = snd $ BS.splitAt (fromJust dirlen) path
              name = (BS.last name' == '/') ? (BS.init name', name')
          return $! Item { iName = isJust dirlen ? (name, undefined)
                         , iPath = path
                         , iHash = hash
                         , iSize = plusPtr p (off + 4 + nl + 64)
                         , iAux = plusPtr p (off + 4 + nl + 64 + 8)
                         }

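The first word on the index "line" is a 32-bit length for the file path (which
is the only variable-length part of the line). Then comes the path itself
(peekItem takes nl - 1 bytes as the path, so the stored length apparently
counts one trailing byte that is not part of it), then the fixed-length hash
(sha256, stored in 64 bytes) of the file in question, then two 64-bit words,
one for size and one "aux", which is used differently for directories and for
files.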
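To make the arithmetic concrete, here is a small sketch of the field offsets
within one line, derived only from the peekItem code above. The Layout type and
layoutFor are names made up for this illustration, and entryLen assumes that
nothing (no padding) follows the aux word, which the code above does not
confirm.

```haskell
-- Byte offsets of the fields of one index "line", relative to the start
-- of the line. 'nl' is the value of the leading 32-bit length word.
-- Layout and layoutFor are hypothetical names, not part of Index.hs.
data Layout = Layout
  { pathOff  :: Int  -- path bytes start right after the 4-byte length
  , pathLen  :: Int  -- peekItem reads nl - 1 bytes as the path
  , hashOff  :: Int  -- 64 bytes of hash follow the nl path bytes
  , sizeOff  :: Int  -- 64-bit size word
  , auxOff   :: Int  -- 64-bit aux word
  , entryLen :: Int  -- ASSUMPTION: the line ends right after aux
  } deriving (Show, Eq)

layoutFor :: Int -> Layout
layoutFor nl = Layout
  { pathOff  = 4
  , pathLen  = nl - 1
  , hashOff  = 4 + nl
  , sizeOff  = 4 + nl + 64
  , auxOff   = 4 + nl + 64 + 8
  , entryLen = 4 + nl + 64 + 8 + 8
  }
```

For a stored length of 10, for example, this puts the hash at offset 14, the
size at 78 and aux at 86.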

With directories, this aux holds the offset of the next sibling line in the
index, so we can efficiently skip reading the whole subtree starting at a given
directory (by just seeking aux bytes forward). The lines are stored in
pre-order with respect to the directory structure: a directory comes first,
and all of its items follow it.
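The skipping can be pictured with a toy model. This is only a sketch: the
entries live in a plain list instead of a memory-mapped file, and the number
stored with a directory is a list position rather than a byte offset. Entry,
visit and example are hypothetical names for this illustration.

```haskell
-- Toy model of the pre-ordered index. ASSUMPTION: positions are list
-- indices here, standing in for the byte offsets kept in the real file.
data Entry
  = File FilePath
  | Dir FilePath Int  -- Int: position of the first entry after this subtree
  deriving (Show, Eq)

-- Walk the entries in order. When 'prune' holds for a directory, jump
-- straight to its recorded end position without looking at its contents.
visit :: (FilePath -> Bool) -> [Entry] -> [FilePath]
visit prune entries = go 0
  where
    n = length entries
    go i
      | i >= n    = []
      | otherwise = case entries !! i of
          File p -> p : go (i + 1)
          Dir p end
            | prune p   -> p : go end      -- skip the whole subtree
            | otherwise -> p : go (i + 1)  -- descend into it

example :: [Entry]
example =
  [ Dir "src/" 3
  , File "src/A.hs"
  , File "src/B.hs"
  , Dir "doc/" 5
  , File "doc/index.txt"
  ]
```

With this data, visit (== "src/") example lists "src/", "doc/" and
"doc/index.txt", never touching the two files under src/.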

For files, this aux field has a copy of the timestamp of the corresponding
file, taken at the instant when the hash was computed. This means that when
the size and timestamp of a file in the working copy match those in the index,
we assume that the hash stored in the index for that file is valid.
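In code, the check amounts to comparing two pairs of numbers. The sketch below
is a simplified model, not the darcs implementation: Stamp, Verdict and check
are invented names, and both fields are taken to be Int64 as in the Item
record.

```haskell
import Data.Int (Int64)

-- Size and modification time, as recorded in the index or as read from
-- the working copy. Hypothetical type for this illustration.
data Stamp = Stamp { stSize :: Int64, stMTime :: Int64 }
  deriving (Show, Eq)

data Verdict = TrustHash | Rehash
  deriving (Show, Eq)

-- Trust the stored hash only if both size and timestamp are unchanged.
check :: Stamp -> Stamp -> Verdict
check indexed onDisk
  | indexed == onDisk = TrustHash
  | otherwise         = Rehash
```

A consequence of this scheme is that a change which leaves both the size and
the timestamp untouched would not be noticed.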

You may have noticed that we also keep hashes of directories. A directory hash
is assumed to be valid as long as every entry in its subtree has passed the
size and timestamp check. As soon as a size or timestamp mismatch is found,
the working file in question is opened, and its hash, timestamp and size are
recomputed and updated in place in the index file (everything lives at a fixed
offset and has a fixed size, so this isn't an issue). The same goes for
directories: when a file in a directory changes hash, this triggers
recomputation of all of its parent directory hashes. This is done efficiently:
each directory is updated at most once during a run.
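One way to get the "at most once" property is a single bottom-up pass, sketched
below on an in-memory tree. This is an assumption about the general approach,
not a copy of what Index.hs does. The hashing is a stand-in (string
concatenation, not sha256), and Tree, rehashFile, combine and refresh are
hypothetical names.

```haskell
data Tree
  = F String Bool String   -- name, stamp still matches?, cached hash
  | D String String [Tree] -- name, cached hash, children
  deriving (Show, Eq)

-- STAND-IN for reading and hashing the file contents.
rehashFile :: String -> String
rehashFile name = "new:" ++ name

-- STAND-IN for hashing a directory from its children's hashes.
combine :: [String] -> String
combine = concatMap (\h -> "(" ++ h ++ ")")

hashOf :: Tree -> String
hashOf (F _ _ h) = h
hashOf (D _ h _) = h

-- Refresh a tree; the Bool says whether its hash had to be recomputed.
-- Children are handled first, so each directory is recomputed at most
-- once, and only if something below it changed.
refresh :: Tree -> (Tree, Bool)
refresh t@(F _ True _) = (t, False)
refresh (F n False _)  = (F n True (rehashFile n), True)
refresh (D n h kids) =
  let results = map refresh kids
      kids'   = map fst results
      dirty   = any snd results
  in if dirty
       then (D n (combine (map hashOf kids')) kids', True)
       else (D n h kids', False)
```

A tree in which every file stamp still matches comes back unchanged, with
False as the second component.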

Of course, the whole index structure is only "fixed" as long as the shape
(directory structure and filenames) of the working tree does not change.
Whenever we want to add or remove files, a new copy of the index is created
and written out, and the old one is discarded (after dutifully copying the
relevant index "lines" out of the old version). This happens in darcs whenever
invalidateIndex has been called and the index is subsequently used.
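The copying step can be sketched as a lookup from the new list of paths into
the old index. Again this is a simplified model under stated assumptions: the
index is an association list rather than a binary file, a "line" is reduced to
a hash and a timestamp, and Line and rebuild are invented names.

```haskell
-- Simplified index "line": cached hash and timestamp.
type Line = (String, Integer)

-- Build the entries of a new index for the given paths, in order.
-- Paths already present in the old index keep their cached data; new
-- paths get Nothing and would have to be hashed before first use.
-- Old entries whose path is no longer listed are simply dropped.
rebuild :: [(FilePath, Line)] -> [FilePath] -> [(FilePath, Maybe Line)]
rebuild old paths = [ (p, lookup p old) | p <- paths ]
```

For instance, rebuilding an old index holding "a" and "b" for the paths "a" and
"c" keeps the data for "a", drops "b", and leaves "c" without cached data.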

Only those parts of the working tree that are tracked by darcs (i.e. they have
pristine counterparts or there are pending adds for them) are indexed.

Yours,
   Petr.

-- 
Peter Rockai | me()mornfall!net | prockai()redhat!com
 http://blog.mornfall.net | http://web.mornfall.net

"In My Egotistical Opinion, most people's C programs should be
 indented six feet downward and covered with dirt."
     -- Blair P. Houghton on the subject of C program indentation