On Feb 23, 2004, at 11:27, Bertrand Delacretaz wrote:
On Monday, Feb 23 2004, at 17:20 Europe/Zurich, Scott Robert Ladd wrote:
...When I've written cache systems, I've always used the file system directly. The only catch is that some operating systems limit the number of files on disk or in a directory; a very active server could hit those limits using individual files....
And on certain filesystems, lookups from filenames to files can be slow if there are many files in the same directory.
That's why you do stuff like
/1
  /1
    /1
    /2
    /3
  /2
    /3
    /4
/2
  /3
    /4
    /5
    /6
and so on, so instead of accessing a file from
134.txt
you go
1/3/4.txt
and that is O(log n) filesystem operations in the number of files.
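A minimal sketch of such a mapping (names and the use of a hash are my assumptions; the thread only shows splitting digits of a flat name like 134.txt into 1/3/4.txt). Hashing the key first spreads arbitrary names evenly across the directory levels:

```python
import hashlib
import os

def cache_path(key, levels=3, root="cache"):
    """Hypothetical mapping of a cache key to a nested path.

    Hash the key so entries distribute evenly, then use the first
    `levels` hex digits of the digest as one directory per level.
    """
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    parts = list(digest[:levels])  # e.g. ['a', '3', 'f']
    return os.path.join(root, *parts, digest + ".txt")

# A flat name like "134.txt" instead lands at cache/<d1>/<d2>/<d3>/<digest>.txt
print(cache_path("some-pipeline-artifact"))
```

Looking up an entry then touches one directory per level, which is where the logarithmic behaviour above comes from.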
If we go for direct filesystem storage (which I think should be perfectly ok to implement a Store), I think we need a clever mapping scheme to avoid having more than N files in the same directory.
Not that clever: have, say, a configurable n levels of folders, m folders per level, and 100 files per folder: that gives m^n * 100 files. And 10,000 files is pretty good, considering that we should be caching only internal pipeline artifacts and not the final results.
-- Stefano.
