Hello Jan!

On Tuesday, 13 April 2010, Jan Horák wrote:
> On 12.4.2010 8:02, Philipp Marek wrote:
> > Sorry for the delay; but reading the thread "Severe performance issues
> > with large directories" I just remembered that the backend has a little
> > bit of a problem with big directories - storage overhead.
> >
> > Do you see any way to split directories into a series of blocks (like
> > is done for files), and, when changing only a few of the entries, to use
> > pointers to the unmodified blocks of the old directory?
> >
> > I don't propose a real delta design - that was too slow, IIRC.
> > Just re-use of directory blocks; that shouldn't bring any performance
> > issues.
> >
> > Is there some way to do that? Perhaps multiple "." entries in a
> > directory, which just point to other parts?
> I'm not sure if we mean the same issue, but I was thinking about a kind
> of hash table. With a well-chosen table size it could give good results,
> I suppose.
Sorry, I didn't make myself clear.
I didn't find the issue I'm talking about in the issue tracker; the problem
is that the backends (FSFS, BDB) don't store directories deltified (for
performance reasons), so modifying an entry in or below a big directory
re-writes the whole directory - and for big directories that means several
megabytes.

So I'd suggest changing the directory storage. Two ideas (see the sketch
below):

* Either use a new table, with fields like parent, name (or path),
  valid-from-revision, and valid-before-revision, or something like that;
  changing an entry then means only updating valid-before of the old
  record and inserting a new one.

* Or, if you want to keep storing directories the same way as file data
  (as FSFS and BDB do now), limit such blocks of directory data to a few
  KB, and define an indirect block that tells which data blocks make up
  the directory. A new revision could then reference all the unchanged
  blocks of the older one.

I hope that this explains it a bit better.

Regards,

Phil
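P.S.: To make the two ideas above a bit more concrete, here's a rough
sketch in C. All type and field names are invented for illustration;
this is not existing FSFS or BDB code, just one possible shape for the
structures.

    /* Hypothetical sketch only - not actual FSFS/BDB data structures. */

    /* Idea 1: one row per directory entry, valid for a revision range.
     * Changing an entry closes the old row (sets valid_before_rev) and
     * inserts a new one; unchanged entries are never touched. */
    struct dir_entry_row {
      const char *parent_path;    /* directory this entry lives in */
      const char *name;           /* entry name */
      long valid_from_rev;        /* first revision this row applies to */
      long valid_before_rev;      /* first revision it no longer applies
                                     to, or -1 for "still current" */
      const char *node_rev_id;    /* what the entry points to */
    };

    /* Idea 2: directory data split into fixed-size blocks, plus an
     * indirect block listing which data blocks make up the directory. */
    #define DIR_BLOCK_SIZE 4096   /* "a few KB" per data block */

    struct dir_block_ref {
      long rev;                   /* revision the block was written in */
      long offset;                /* where the block lives in that rev */
    };

    struct dir_indirect_block {
      int count;                  /* number of data blocks */
      struct dir_block_ref refs[1]; /* refs[0..count-1]; unchanged refs
                                       simply point into older revisions */
    };

With the indirect-block layout, changing a single entry would mean
writing one new data block plus one new indirect block; all the other
block references are copied unchanged from the previous revision, so
the cost no longer grows with the size of the directory.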