While thinking about (and working a bit on) the "single file pristine cache" in a DB, I thought about an alternative solution with many pros and one con.
The idea is to store the "filesystem" (the hierarchical information plus metadata such as names, permissions and so on) in a database file, but to keep storing file data on the filesystem. The on-disk file structure would be "flat": the original hierarchical names would not be used, and file names would be replaced by numerical ids. To avoid having tens of thousands of files in the same directory, a simple hashing algorithm on filenames (similar to the one used by the Squid proxy) could be used.

As an example, if we have this source filesystem:

  /A
  /B
  /D1/A
  /D1/C
  /D2/D

it will become, on the filesystem (leaving out the hashing directory structure):

  /1.dat
  /2.dat
  /4.dat
  /5.dat
  /7.dat

and, in the database, something like this (very simple, without attributes for this example; TYPE is D for a directory and F for a file):

  ID  NAME  TYPE  PARENT
  0   /     D     NULL
  1   A     F     0
  2   B     F     0
  3   D1    D     0
  4   A     F     3
  5   C     F     3
  6   D2    D     0
  7   D     F     6

This is just an example; the actual implementation could differ.

The pros are:

1) Easier to implement (the hierarchical db structure is easily Slurpable).
2) No performance issues.
3) We can continue to use mmap on files.
4) We don't get a single-file pristine, but we no longer have issues with "spurious" files in the pristine directory.
5) Filenames are made up, so there are no find/sed issues.
6) The pristine can always be case sensitive.
7) We don't need to optimize the db as often as we would with blobs (they cause "space bubbles" in db files on deletion and modification).
8) File size limits are the same as those of the host filesystem.

The con is:

1) We no longer have transactional access to the data part of the pristine, only to the metadata part.

What do you think about it?

Salvatore

_______________________________________________
darcs-devel mailing list
http://lists.osuosl.org/mailman/listinfo/darcs-devel
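A minimal sketch of the layout described above, assuming SQLite as the database file and a simple two-level numeric fan-out standing in for the hashing step. The table name `nodes`, the helpers `build_index`, `data_path` and `original_path`, and the 16x256 fan-out are hypothetical choices for illustration. They are not part of darcs, and the fan-out is only loosely modeled on Squid's two-level cache directories, not its exact formula.

```python
import sqlite3

# Hypothetical fan-out sizes for the data directory (not from darcs or Squid).
FANOUT_L1 = 16   # number of first-level directories
FANOUT_L2 = 256  # number of second-level directories per first-level one


def build_index(conn, paths):
    """Record the hierarchy in the `nodes` table; return the ids of file nodes.

    Ids are handed out sequentially as paths are scanned, so a directory
    gets an id the first time it is seen.
    """
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
        " type TEXT NOT NULL, parent INTEGER)"
    )
    conn.execute("INSERT INTO nodes VALUES (0, '/', 'D', NULL)")
    dirs = {(): 0}  # tuple of path components -> directory id
    next_id = 1
    file_ids = []
    for path in paths:
        parts = tuple(path.strip("/").split("/"))
        # Create every ancestor directory that has not been seen yet.
        for depth in range(1, len(parts)):
            prefix = parts[:depth]
            if prefix not in dirs:
                conn.execute("INSERT INTO nodes VALUES (?, ?, 'D', ?)",
                             (next_id, prefix[-1], dirs[prefix[:-1]]))
                dirs[prefix] = next_id
                next_id += 1
        # The last component is the file itself.
        conn.execute("INSERT INTO nodes VALUES (?, ?, 'F', ?)",
                     (next_id, parts[-1], dirs[parts[:-1]]))
        file_ids.append(next_id)
        next_id += 1
    return file_ids


def data_path(node_id):
    """Flat data file for a node, spread over a two-level directory tree."""
    l1 = node_id % FANOUT_L1
    l2 = (node_id // FANOUT_L1) % FANOUT_L2
    return f"{l1:02X}/{l2:02X}/{node_id}.dat"


def original_path(conn, node_id):
    """Rebuild the original path by following parent pointers to the root."""
    names = []
    while True:
        name, parent = conn.execute(
            "SELECT name, parent FROM nodes WHERE id = ?", (node_id,)
        ).fetchone()
        if parent is None:  # reached the root "/"
            break
        names.append(name)
        node_id = parent
    return "/" + "/".join(reversed(names))


conn = sqlite3.connect(":memory:")
for file_id in build_index(conn, ["/A", "/B", "/D1/A", "/D1/C", "/D2/D"]):
    print(data_path(file_id), "->", original_path(conn, file_id))
```

Run on the example tree, this assigns ids 1, 2, 4, 5 and 7 to the five files, matching the example table, and maps each id back to its original path by following PARENT pointers up to the root. Because data files are named only by id, renaming or moving a file in this sketch touches only its database row, never the data file.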
