2010/4/14 Philipp Marek <philipp.ma...@emerion.com>:
> Hello Jan!
>
> On Tuesday, 13. April 2010, Jan Horák wrote:
>> On 12.4.2010 8:02, Philipp Marek wrote:
>> > Sorry for the delay; but reading the thread "Severe performance issues
>> > with large directories" I just remembered that the backend has a little
>> > bit of a problem with big directories - storage overhead.
>> >
>> > Do you see any way to split directories into a series of blocks (like
>> > files are done), and, when changing only a few of the files, use pointers
>> > to the unmodified blocks of the old directory?
>> >
>> > I'm not proposing a real delta design - that was too slow, IIRC.
>> > Just re-use of directory blocks; that shouldn't bring any performance
>> > issues.
>> >
>> > Is there some way to do that? Perhaps multiple "." entries in a
>> > directory, which just point to other parts?
>> I'm not sure we're thinking of the same issue, but I was thinking about
>> a kind of hash table. With a well-chosen table size it could give good
>> results, supposedly.
> Sorry, I didn't make myself clear.
>
> I didn't find the issue I'm talking about in the issue tracker; but the
> problem is that the backends (FSFS, BDB) don't store directories deltified
> (for performance reasons), so modifying an entry in or below a big
> directory has to re-write the whole directory - and that means several
> megabytes, for big directories.
>
> So I'd suggest changing the directory storage.
> * Either use a new table, with fields like parent, name (or path),
>   valid-from-revision, valid-before-revision or something like that;
>   then changing an entry means only updating valid-before of the
>   old record, and inserting a new one.
> * Or, if you want to store directories in the same way as file data (like
>   now in FSFS and BDB), I'd suggest limiting such blocks of directory data
>   to a few KB, and defining an indirect block that tells which blocks are
>   used.
>   A new revision could then reference all the unchanged blocks of the
>   older revision.
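[For illustration only: a minimal sketch of the first suggestion above, the
interval-versioned directory-entry table with valid-from/valid-before
revisions. The table name, column names, and helper functions are assumptions
made up for this example, not an actual Subversion schema; sqlite3 merely
stands in for whatever SQL engine the backend would use.]

```python
# Sketch (assumed schema) of an interval-versioned directory-entry table:
# each row is valid from `valid_from` up to, but not including, `valid_before`.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dirent (
        parent        TEXT NOT NULL,     -- path of the containing directory
        name          TEXT NOT NULL,     -- entry name within that directory
        node_id       TEXT NOT NULL,     -- what the entry points to
        valid_from    INTEGER NOT NULL,  -- first revision the row is valid in
        valid_before  INTEGER            -- NULL = still valid at head
    )
""")

def change_entry(conn, parent, name, new_node_id, new_rev):
    """Replace one directory entry without rewriting the whole directory:
    close the interval on the old row, then insert the new row."""
    conn.execute(
        "UPDATE dirent SET valid_before = ? "
        "WHERE parent = ? AND name = ? AND valid_before IS NULL",
        (new_rev, parent, name))
    conn.execute(
        "INSERT INTO dirent (parent, name, node_id, valid_from, valid_before) "
        "VALUES (?, ?, ?, ?, NULL)",
        (parent, name, new_node_id, new_rev))

def read_dir(conn, parent, rev):
    """List a directory as it looked in revision `rev`."""
    return conn.execute(
        "SELECT name, node_id FROM dirent "
        "WHERE parent = ? AND valid_from <= ? "
        "AND (valid_before IS NULL OR valid_before > ?) "
        "ORDER BY name",
        (parent, rev, rev)).fetchall()

# Usage: revision 1 creates two entries, revision 2 modifies only one of them.
change_entry(conn, "/trunk", "a.txt", "node-a1", 1)
change_entry(conn, "/trunk", "b.txt", "node-b1", 1)
change_entry(conn, "/trunk", "a.txt", "node-a2", 2)
print(read_dir(conn, "/trunk", 1))  # [('a.txt', 'node-a1'), ('b.txt', 'node-b1')]
print(read_dir(conn, "/trunk", 2))  # [('a.txt', 'node-a2'), ('b.txt', 'node-b1')]
```

The point of the design is visible in `change_entry`: a commit touching one
entry in a huge directory writes two rows instead of re-writing megabytes of
directory data.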
Or (3): go ahead and store megabytes for each directory, just like the other
backends. And leave the solution of this problem to a future iteration of the
SQL-based backend.

Really... optimizing before you even get started is not advisable. Get
something done. THEN examine and iterate. There could be numerous other
problems inherent in a SQL backend that would obviate any such "solution"
proposed today.

Also, the "SQL backend" concept has been started several times before, and
abandoned. I don't want to see it get abandoned AGAIN because the initial
"solutions" make it overly complicated before it can even begin.

Cheers,
-g