On Thu, 2005-04-21 at 11:09 +0200, Denys Duchier wrote:
> Tomas Mraz <[EMAIL PROTECTED]> writes:
> > If we suppose the maximum number of stored blobs is on the order of
> > millions, probably the optimal indexing would be 1 level [0:2] indexing
> > or 2 levels [0:1] [2:3]. However, it would be necessary to do some
> > benchmarking first before setting this in stone.
> As I have suggested in a previous message, it is trivial to implement adaptive
> indexing: there is no need to hardwire a specific indexing scheme.
> I suspect that the optimal size of subkeys may well depend on the filesystem.
> My experiments seem to indicate that subkeys of length 2 achieve an excellent
> compromise between discriminatory power and disk footprint on ext2.
> Btw, if, as you indicate above, you do believe that a 1 level indexing should
> use [0:2], then it doesn't make much sense to me to also suggest that a 2
> level indexing should use [0:1] as its primary subkey :-)
Why do you think so? IMHO we should always target a similar number of
files/subdirectories in each directory of the blob archive. So if we
suppose that the archive would contain at most 16 million files, then
the possible indexing schemes are either 1 level with key length 3
(each directory would contain ~4096 files) or 2 levels with key length 2
(each directory would contain ~256 files).
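To make that arithmetic concrete, here is a small Python sketch (the function name and the sample object count are mine, purely for illustration, and do not come from any existing tool) that derives an object's path under either scheme and checks the resulting fanout for ~16 million (16 * 2^20) hex-named objects:

```python
import hashlib

def blob_path(hex_name, levels, key_len):
    # Split the leading hex digits of the object name into `levels`
    # directory components of `key_len` digits each; the remaining
    # digits become the file name inside the leaf directory.
    parts = [hex_name[i * key_len:(i + 1) * key_len] for i in range(levels)]
    return "/".join(parts + [hex_name[levels * key_len:]])

h = hashlib.sha1(b"some blob contents").hexdigest()

one_level = blob_path(h, levels=1, key_len=3)  # "abc/def0..." style
two_level = blob_path(h, levels=2, key_len=2)  # "ab/cd/ef0..." style

# Fanout with 16 * 2**20 objects evenly distributed over hex subkeys:
total = 16 * 2**20
per_dir_1 = total // 16**3             # 1 level, key length 3: 4096 files/dir
per_dir_2 = total // (16**2 * 16**2)   # 2 levels, key length 2: 256 files/dir
```

The division just confirms the numbers quoted above: 4096 directories of ~4096 files each, versus 256 x 256 leaf directories of ~256 files each.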
Which one is better could of course be filesystem and hardware dependent.
Of course it might be best to allow adaptive indexing, but I think that
some benchmarking should be done first, and it's possible that some fixed
scheme could be chosen as optimal.
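A rough benchmarking harness for this (purely illustrative; the helper names and the small object count are my own, and a real run would use millions of objects and flush filesystem caches between runs) could populate a temporary tree under each scheme and time stat() lookups:

```python
import hashlib
import os
import random
import tempfile
import time

def populate(root, levels, key_len, count):
    # Create `count` empty files laid out under the given indexing
    # scheme, returning the full path of every file created.
    paths = []
    for i in range(count):
        h = hashlib.sha1(str(i).encode()).hexdigest()
        parts = [h[j * key_len:(j + 1) * key_len] for j in range(levels)]
        d = os.path.join(root, *parts)
        os.makedirs(d, exist_ok=True)
        path = os.path.join(d, h[levels * key_len:])
        open(path, "w").close()
        paths.append(path)
    return paths

def time_lookups(paths, samples=1000):
    # Time random stat() calls against the tree; caches are warm here,
    # so this only compares in-memory directory lookup cost.
    chosen = random.choices(paths, k=samples)
    start = time.perf_counter()
    for p in chosen:
        os.stat(p)
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as tmp:
    one = populate(os.path.join(tmp, "l1"), levels=1, key_len=3, count=2000)
    two = populate(os.path.join(tmp, "l2"), levels=2, key_len=2, count=2000)
    t1 = time_lookups(one)
    t2 = time_lookups(two)
```

Whichever scheme wins would still only hold for that particular filesystem, which is the point about not setting anything in stone.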
Tomas Mraz <[EMAIL PROTECTED]>