Ganesh Sittampalam <[EMAIL PROTECTED]> writes: > It turns out that Benedikt Schmidt has some part-done work on this from a > year or two ago that he is now going to port to unstable and finish off, > so I'm going to defer to him.
I have made my repository with the current version of the code available at http://shim.haskellco.de/darcs-filecache/. For now it only supports creating the filecache from scratch and using the filecache to speed up 'darcs changes file'. For testing, e.g. in the darcs repo, you can create a filecache with 'darcs optimize --filecache'. Here are some measurements I just did in my version of the unstable repo. $ time ./darcs optimize --filecache real 0m37.514s user 0m11.154s sys 0m5.446s $ du -hs _darcs 43M _darcs/ $ du -hs _darcs/filecache 4.6M _darcs/filecache/ | | hot cache | cold cache | | darcs changes GNUmakefile | 3.717s | 12.113s | | darcs changes --filecache GNUmakefile | 0.842s | 3.568s | The current implementation of changes only filters the relevant patches from the complete set of patches in the repo, but the code still reads every patch that touches the file. Since the patchinfo is in the inventory and the information which patch touches which file, changes FILE should get by without reading any patches. The next step is querying the filecache for other commands like annotate and updating the filecache on record, pull, obliterate, and so on. In case someone is interested in the data- and on-disk structures: Optimize --filecache adds a filecache with a filecache-inventory file that contains the patchids (pinfo hash) of all patches in the inventory in the same order as the inventory. The mapping from file names to patches that modify these file is stored in the filemap directory. For every filename that has ever existed in the repo, there is a modification file named hash(relative_path_file) with an ordered list of modifications (newest last): * Touch patchid * Removed patchid * RenamedTo newfilehash index * RenamedFrom oldfilehash index Touch and Removed should be self explanatory, RenamedTo and RenameFrom are forward and backward pointers into other modification files when a file has been renamed. For example, to find all patches that touch the file FOO, we open _darcs/filecache/filemap/<hash(FOO)> and add all Touch entries starting from the end until we arrive at an entry of the other three types. If it is an Removed or RenamedTo entry, we stop since the file was created by the last Touch entry and the entries before the Remove or RenamedTo refers to an earlier file with the same name. If there is an RenamedFrom oldfilehash index entry, we knows that the file has been renamed from oldfile to FOO and we can read about the patches that touched oldfile in oldfilehash starting from index (reading backwards again). For keeping the filehash up-to-date we need the following operations: * append a modification entry to one of modification files * remove all entries that refer to a certain patch from the modification files Then we can handle arbitrary changes without recreating the filecache from scratch. After modifying the repo from oldstate =compref++oldpatches to newstate = compref++newpatches, we remove all patches in oldstate and append all entries for patches in newpatches. For removing the patches, it would be nice to also have a mapping from a patch to the set of files it modifies (or even an index into the file) so that scanning all modification files for the patch that has to be remove is not necessary. Benedikt _______________________________________________ darcs-users mailing list [email protected] http://lists.osuosl.org/mailman/listinfo/darcs-users
