On Tue, Sep 3, 2013 at 1:09 PM, Nicolas Pitre <n...@fluxnic.net> wrote:
> On Sat, 31 Aug 2013, Duy Nguyen wrote:
>> On Tue, Aug 27, 2013 at 11:26 AM, Nicolas Pitre <n...@fluxnic.net> wrote:
>> > A bit crud but good enough for now.
>> I wonder if we should keep a short SHA-1 table in .idx. An entry in
>> the original .idx v1 table is <SHA-1>+<offset> then offset moved out
>> to make the table more compact for binary search. I observe that we
>> don't always need 20 byte SHA-1 to uniquely identify an entry in a
>> pack, so the SHA-1 table could be split in two: one table contain the
>> first part of SHA-1, long enough to identify any entry in the pack;
>> the second table contains either full SHA-1 or the remaining part.
>> Binary search is done on the first table, if matched, full sha-1 from
>> the second table is compared. We already have the second sha-1 table
>> in .pack v4. We could add the first table in .idx v3.
>> On linux-2.6 even in one-pack configuration, we only need the first 6
>> bytes of sha-1 to identify an object. That would cut the bsearch sha-1
>> table to 1/4 full sha-1 table size.
> I don't see the point though.
> Why having two tables when only one suffice?
> If the argument is about making the SHA1 binary search more efficient
> given a smaller memory footprint, that comes with extra complexity
> coming from the variable length SHA1 strings in the second table. So
> I'm not sure there is much to gain.
> Furthermore, one of the design of pack v4 is to avoid the SHA1 binary
> search entirely. You will need the binary search only once to locate
> the top commit object, but from there the entire object tree can be
> walked simply by using the sha1ref index and looking up the
> corresponding offset into the pack index file to locate the next object.
> Same thing for walking tree objects: the pack v4 tree deltas are meant
> to be walked inline without having to _apply_ any delta.
Hmm.. you are right. We avoid a lot of SHA-1 lookups with .pack v4. A
more compact SHA-1 table only benefits random SHA-1 lookups, which
should be much fewer compared to "rev-list --objects --all".
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html