Re: reftable [v2]: new ref storage format

Shawn Pearce Sun, 23 Jul 2017 14:47:38 -0700

My apologies for not responding to this piece of feedback earlier.

On Wed, Jul 19, 2017 at 7:02 AM, Ævar Arnfjörð Bjarmason
<ava...@gmail.com> wrote:
> On Tue, Jul 18 2017, Shawn Pearce jotted:
>> On Mon, Jul 17, 2017 at 12:51 PM, Junio C Hamano <gits...@pobox.com> wrote:
>>> Shawn Pearce <spea...@spearce.org> writes:
>>>> where `time_sec` is the update time in seconds since the epoch.  The
>>>> `reverse_int32` function inverts the value so lexicographical ordering of
>>>> the network byte order time sorts more recent records first:
>>>>
>>>>     uint32_t reverse_int32(uint32_t t) {
>>>>       return 0xffffffff - t;
>>>>     }
>>>
>>> Is 2038 an issue, or by that time we'd all be retired together with
>>> this file format and it won't be our problem?
>>
>> Based on discussion with Michael Haggerty, this is now an 8 byte field
>> storing microseconds since the epoch. We should be good through year
>> 9999.
>
> I think this should be s/microseconds/nanoseconds/, not because there's
> some great need to get better resolution than nanoseconds, but because:
>
>  a) We already have WIP code (bp/fsmonitor) that's storing 64 bit
>     nanoseconds since the epoch, albeit for the index, not for refs.
>
>  b) There are several filesystems that have nanosecond resolution now,
>     and it's likely more will start using that.


The time in a reflog and the time returned by lstat(2) to detect dirty
files in the working tree are unrelated. Of course we want the
dircache to reflect the highest precision available from lstat, to
reduce the number of files that must be content hashed for racily
clean detection. So if a filesystem provides nanoseconds, the dircache
should probably support it.

> Thus:
>
>  x) If you use such a filesystem you'll lose time resolution with this
>     ref backend v.s. storing them on disk, which isn't itself a big
>     deal, but more importantly you lose 1=1 time mapping as you
>     transition and convert between the two.

No, you won't. The reflog today ($GIT_DIR/logs) stores second
precision in each log record. The precision the filesystem uses for
mtime is irrelevant.
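
To make the second-precision point concrete, here is a minimal sketch
that pulls the timestamp out of one line of a $GIT_DIR/logs file. It
assumes the usual line layout of old id, new id, identity ending in
"<email>", seconds since the epoch, timezone, then a tab and the
message. The function name reflog_line_seconds is hypothetical, not
something from git.git.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract the whole-seconds timestamp from one
 * reflog line of the assumed form
 *   <old> <new> <name> <email-in-angle-brackets> <seconds> <tz>\t<message>
 * Returns 0 on success, -1 if the line does not match that shape.
 */
int reflog_line_seconds(const char *line, uint64_t *out_sec)
{
	const char *tab = strchr(line, '\t');
	size_t len = tab ? (size_t)(tab - line) : strlen(line);
	const char *gt = NULL;
	char *end;
	size_t i;

	/* The last '>' before the message closes the email address. */
	for (i = 0; i < len; i++)
		if (line[i] == '>')
			gt = line + i;
	if (!gt || gt[1] != ' ')
		return -1;
	if (gt[2] < '0' || gt[2] > '9')
		return -1;

	/* Only whole seconds are stored; there is no fractional part. */
	*out_sec = strtoull(gt + 2, &end, 10);
	if (*end != ' ')
		return -1;
	return 0;
}
```

Nothing in that record depends on the mtime of the log file itself, so
the filesystem's timestamp resolution never enters the picture.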

Further, microsecond is sufficient resolution for reflog data. From my
benchmarking, just reading a reference from a very hot reftable costs
~20.2 usec. Any update of a reference requires a read-compare-modify
cycle, so updates to a reference aren't going to be more frequent than
one every ~20 usec.
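
The arithmetic behind this can be sketched as below. The helper names
are made up for illustration, and the 20 usec figure is just the
rounded benchmark number above used as a floor on the cost of one
update. The same sketch also checks that microseconds through the end
of year 9999 fit comfortably in a 64-bit field.

```c
#include <stdint.h>

/* Seconds since the epoch at 9999-12-31T23:59:59Z. */
#define YEAR_9999_END_SEC UINT64_C(253402300799)

/* Convert whole seconds since the epoch to microseconds. */
uint64_t sec_to_usec(uint64_t sec)
{
	return sec * UINT64_C(1000000);
}

/*
 * Upper bound on updates per second to one reference, given a floor
 * on the cost of a single update in microseconds (must be nonzero).
 */
uint64_t max_updates_per_sec(uint64_t cost_usec)
{
	return UINT64_C(1000000) / cost_usec;
}
```

With a 20 usec floor that is at most 50,000 updates per second, and
consecutive updates are always at least 20 microsecond ticks apart, so
a microsecond clock can still tell them apart.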

>  y) Our own code will need to juggle second resolution epochs
>     (traditional FSs, any 32bit epoch format), microseconds (this
>     proposal), and nanoseconds (new FSs, bp/fsmonitor) internally in
>     various places.

But these are also unrelated areas. IMHO, the nanosecond stuff should
be confined to the dircache management code and working tree
comparison code, and should not leak out of there. Commit objects are
still recorded with second precision, and that isn't going to change.

Therefore I decided to stick with microseconds in the reftable v3
draft that I posted on July 22nd.
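
For illustration only, this is roughly how the reverse function quoted
earlier in the thread extends to an 8 byte microsecond field. It is a
sketch under my own assumptions (names like reverse_int64 and
encode_log_time are hypothetical) and not the layout text from the
draft itself.

```c
#include <stdint.h>
#include <string.h>

#define LOG_TIME_LEN 8

/* Invert a 64-bit time so larger (newer) values become smaller keys. */
uint64_t reverse_int64(uint64_t t)
{
	return UINT64_MAX - t;
}

/* Store the inverted microsecond time in network (big-endian) order. */
void encode_log_time(uint64_t time_usec, unsigned char out[LOG_TIME_LEN])
{
	uint64_t v = reverse_int64(time_usec);
	int i;

	for (i = LOG_TIME_LEN - 1; i >= 0; i--) {
		out[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/* Recover the microsecond time from its 8 byte encoded form. */
uint64_t decode_log_time(const unsigned char in[LOG_TIME_LEN])
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < LOG_TIME_LEN; i++)
		v = (v << 8) | in[i];
	return reverse_int64(v);
}

/* Plain bytewise comparison; a negative result means a sorts first. */
int log_time_cmp(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, LOG_TIME_LEN);
}
```

Because the value is inverted before being written big-endian, a plain
bytewise comparison of two encoded times puts the more recent record
first.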
