While thinking about (and working a bit on) the "single-file pristine
cache" stored in a database, I came up with an alternative solution
that has many pros and one con.

The idea is to store the "filesystem" (hierarchy information plus
metadata such as names, permissions and so on) in a database file, but
to keep storing the file data on the filesystem.

The on-disk layout would be "flat" rather than mirroring the original
hierarchy, and file names would be replaced by numerical IDs. To avoid
having tens of thousands of files in the same directory, a simple
hashing scheme on the file names (similar to the one used by the Squid
proxy) could be used.
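
A minimal sketch of such a hashing scheme, assuming two levels of
bucket directories derived from the numerical ID. The function name,
the 16 x 256 fan-out and the hex directory names are my assumptions;
this is only loosely modelled on Squid's two-level cache directories
and is not its actual algorithm.

```python
def pristine_path(file_id, l1_dirs=16, l2_dirs=256):
    """Map a numerical ID to a relative path like '01/11/70000.dat'.

    Hypothetical layout: consecutive IDs fill one second-level
    directory (l2_dirs files each) before moving on to the next one.
    """
    if file_id < 0:
        raise ValueError("file_id must be non-negative")
    # Which second-level directory the ID falls into.
    l2 = (file_id // l2_dirs) % l2_dirs
    # Which first-level directory the ID falls into.
    l1 = (file_id // (l2_dirs * l2_dirs)) % l1_dirs
    return f"{l1:02X}/{l2:02X}/{file_id}.dat"


if __name__ == "__main__":
    for file_id in (1, 256, 70000):
        print(file_id, "->", pristine_path(file_id))
```

With these defaults no directory receives more than a few hundred
entries until the ID space grows past about a million files.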

As an example, if we have this source filesystem:

/A
/B
/D1/A
/D1/C
/D2/D

On the filesystem it would become (leaving out the hashing directories):
/1.dat
/2.dat
/4.dat
/5.dat
/7.dat

And in the database, something like this (very simplified, without
attributes for this example):
ID  NAME  TYPE  PARENT
0   /     D     NULL
1   A     F     0
2   B     F     0
3   D1    D     0
4   A     F     3
5   C     F     3
6   D2    D     0
7   D     F     6

This is just an example, the actual implementation could differ.
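
To make the lookup concrete, here is a rough sketch that loads the
table above into an in-memory SQLite database and resolves a path to
its ID. SQLite, the table name "node" and the function names are my
assumptions for illustration only; nothing above fixes the database
engine or the schema.

```python
import sqlite3


def build_example_db():
    """Create the example table from the mail in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " type TEXT NOT NULL CHECK (type IN ('D', 'F')),"
        " parent INTEGER REFERENCES node(id))"
    )
    rows = [
        (0, "/", "D", None),
        (1, "A", "F", 0),
        (2, "B", "F", 0),
        (3, "D1", "D", 0),
        (4, "A", "F", 3),
        (5, "C", "F", 3),
        (6, "D2", "D", 0),
        (7, "D", "F", 6),
    ]
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def resolve(conn, path):
    """Return the ID of the node at an absolute path, or None if missing."""
    node_id = 0  # start at the root directory
    for part in (p for p in path.split("/") if p):
        row = conn.execute(
            "SELECT id FROM node WHERE parent = ? AND name = ?",
            (node_id, part),
        ).fetchone()
        if row is None:
            return None
        node_id = row[0]
    return node_id


if __name__ == "__main__":
    conn = build_example_db()
    file_id = resolve(conn, "/D1/A")
    print(f"/D1/A is stored as {file_id}.dat")
```

Each path component costs one indexed query here; a real
implementation could just as well read the whole table into memory at
once.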

The pros are:
1) Easier to implement (the hierarchical DB structure is easily Slurpable).
2) No performance issues.
3) We can continue to use mmap on files.
4) It is not a single-file pristine, but we no longer have issues with
"spurious" files in the pristine directory.
5) File names are made up, so no find / sed issues.
6) The pristine can always be case sensitive.
7) We don't need to optimize the DB as often as we would with blobs
(which cause "space bubbles" in DB files on deletion and modification).
8) File size limits are the same as those of the host filesystem.

The con is:
1) We no longer have transactional access to the data part of the
pristine, only to the metadata part.
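
To illustrate where the gap sits, here is a rough sketch of one
possible write ordering: write the data file first, then commit the
metadata row. This is only an assumption about how it could be done,
not part of the proposal itself; SQLite, the "node" table and the
store_file name are all hypothetical.

```python
import os
import sqlite3
import tempfile


def store_file(conn, root, parent_id, name, data):
    """Write data to <root>/<id>.dat, then commit the metadata row.

    Only the INSERT is covered by the database transaction. A crash
    between the rename and the commit leaves an orphaned .dat file
    that no row points to.
    """
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO node (name, type, parent) VALUES (?, 'F', ?)",
        (name, parent_id),
    )
    file_id = cur.lastrowid
    final_path = os.path.join(root, f"{file_id}.dat")
    fd, tmp_path = tempfile.mkstemp(dir=root, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Atomic rename: readers see either no file or the whole file.
        os.replace(tmp_path, final_path)
        conn.commit()
    except BaseException:
        conn.rollback()
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return file_id
```

With this ordering a committed row always has its data file, and the
worst case is a stray file that a cleanup pass could delete.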

What do you think about it?

Salvatore
_______________________________________________
darcs-devel mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-devel
