On Mon, Jul 8, 2013 at 10:37 AM, Junio C Hamano <gits...@pobox.com> wrote:
> Shawn Pearce <spea...@spearce.org> writes:
>> Has anyone studied the impact of converting the pack idx fanout table
>> from 256 entries to 65536 entries?
>> Back of the envelope estimates for 3.1M objects in linux.git suggests
>> a 2^16 fanout table would decrease the number of binary search
>> iterations from ~14 to ~6. The increased table costs an extra 255 KiB
>> of disk. On a 70M idx file this is noise.
>> I'm starting to wonder if increasing the fanout table once the object
>> count is above a certain threshold is a reasonable optimization for
>> larger repositories.
> Yeah, and I do not think we have to be worried too much about
> backward compatibility for .idx files, as they are local and can be
> regenerated if an older version cannot read it.
> I also wonder if we can generate a finer-grained fan-out table on
> the fly, perhaps lazily, without changing the on-disk format ;-)
Heh. Yes. The reader at runtime could expand the 256 fanout table to a
65536 fanout table without changing the on disk format. Unfortunately
doing the expansion will require O(N) time as the 2nd byte of each
SHA-1 must be examined from every bucket. If you are going to perform
O(N) lookups this expansion might save time as it lowers the "log N"
bound of the O(N log N) algorithm for N lookups. It doesn't help
enough when doing only N/20 lookups.
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html