Re: reftable: new ref storage format

2017-07-23 Thread Shawn Pearce
On Sun, Jul 23, 2017 at 3:56 PM, Shawn Pearce wrote: > On Mon, Jul 17, 2017 at 6:43 PM, Michael Haggerty > wrote: >> On Sun, Jul 16, 2017 at 12:43 PM, Shawn Pearce wrote: >>> On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty

Re: reftable: new ref storage format

2017-07-23 Thread Shawn Pearce
+git@vger.kernel.org. I originally sent the below reply privately by mistake. On Mon, Jul 17, 2017 at 6:43 PM, Michael Haggerty wrote: > On Sun, Jul 16, 2017 at 12:43 PM, Shawn Pearce wrote: >> On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty

Re: reftable: new ref storage format

2017-07-18 Thread Junio C Hamano
Michael Haggerty writes: > On second thought, the idea of having HEAD (or maybe all pseudorefs) > in the same system would open a few interesting possibilities that > derive from having a global, atomic view of all references: > > 1. We could store backlinks from references

Re: reftable: new ref storage format

2017-07-17 Thread Michael Haggerty
On Sun, Jul 16, 2017 at 12:43 PM, Shawn Pearce wrote: > On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty > wrote: >> * Do you propose to store *all* references (i.e., including the >> references that we call pseudorefs, like `HEAD`, `FETCH_HEAD`,

Re: reftable: new ref storage format

2017-07-16 Thread Shawn Pearce
On Sun, Jul 16, 2017 at 2:13 PM, Dave Borowitz wrote: > On Sun, Jul 16, 2017 at 3:43 PM, Shawn Pearce wrote: >> True... but... in my "android" example repository we have 866,456 live >> refs. A block size of 64k needs only 443 blocks, and a 12k index,

Re: reftable: new ref storage format

2017-07-16 Thread Dave Borowitz
On Sun, Jul 16, 2017 at 3:43 PM, Shawn Pearce wrote: > True... but... in my "android" example repository we have 866,456 live > refs. A block size of 64k needs only 443 blocks, and a 12k index, to > get the file to compress to 28M (vs. 62M packed-refs). > > Index records are

Re: reftable: new ref storage format

2017-07-16 Thread Shawn Pearce
On Sun, Jul 16, 2017 at 12:43 PM, Shawn Pearce wrote: > On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty > wrote: > >> * The tuning parameter number_of_restarts currently trades off space >> (for the full refnames and the restart_offsets) against the

Re: reftable: new ref storage format

2017-07-16 Thread Shawn Pearce
On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty wrote: > Thanks for your reftable proposal. Thanks for your time reading the proposal. I really was looking forward to your insights on this project. > It would solve a lot of scalability > problems that we currently have,

Re: reftable: new ref storage format

2017-07-16 Thread Michael Haggerty
Thanks for your reftable proposal. It would solve a lot of scalability problems that we currently have, and do it in a way that is implementable in both C and Java, which is very nice. There are two mostly orthogonal components to your proposal: 1. What does a single reftable file look like? 2.

Re: reftable: new ref storage format

2017-07-16 Thread Johannes Sixt
Am 16.07.2017 um 12:03 schrieb Jeff King: On Sun, Jul 16, 2017 at 10:07:57AM +0200, Johannes Sixt wrote: Am 14.07.2017 um 22:08 schrieb Jeff King: The implementation on this doesn't seem overly complex. My main concerns are what we're asking from the filesystem in terms of atomicity, and what

Re: reftable: new ref storage format

2017-07-16 Thread Jeff King
On Sun, Jul 16, 2017 at 10:07:57AM +0200, Johannes Sixt wrote: > Am 14.07.2017 um 22:08 schrieb Jeff King: > > The implementation on this doesn't seem overly complex. My main concerns > > are what we're asking from the filesystem in terms of atomicity, and > > what possible races there are. > >

Re: reftable: new ref storage format

2017-07-16 Thread Jeff King
On Sat, Jul 15, 2017 at 11:01:47PM -0700, Shawn Pearce wrote: > > How deep would you anticipate stacks getting? Would it be feasible for > > the tip to contain the names of the tables in the entire chain? If we're > > talking about 20 (or even 32) bytes per name, you could still fit over a > >

Re: reftable: new ref storage format

2017-07-16 Thread Johannes Sixt
Am 14.07.2017 um 22:08 schrieb Jeff King: The implementation on this doesn't seem overly complex. My main concerns are what we're asking from the filesystem in terms of atomicity, and what possible races there are. One of the failure modes is that on Windows a file cannot be deleted while it

Re: reftable: new ref storage format

2017-07-16 Thread Shawn Pearce
On Fri, Jul 14, 2017 at 1:08 PM, Jeff King wrote: > On Thu, Jul 13, 2017 at 05:11:52PM -0700, Shawn Pearce wrote: > > I think the "stack" implementation is what makes me most uncomfortable > with this proposal. Atomicity with filesystem operations and especially > readdir() is one

Re: reftable: new ref storage format

2017-07-14 Thread Jeff King
On Thu, Jul 13, 2017 at 05:27:44PM -0700, Shawn Pearce wrote: > > We _could_ consider gzipping individual blocks of > > a reftable (or any structure that allows you to search to a > > constant-sized block and do a linear search from there). But given that > > they're in the same ballpark, I'm

Re: reftable: new ref storage format

2017-07-14 Thread Jeff King
On Thu, Jul 13, 2017 at 05:11:52PM -0700, Shawn Pearce wrote: > Yes, I was hoping for reader atomicity. But I may OK foregoing that if > the transaction either all goes through, or all fails. A partially > stuck transaction because the process died in the middle of the commit > step creates a

Re: reftable: new ref storage format

2017-07-14 Thread Shawn Pearce
On Fri, Jul 14, 2017 at 7:27 AM, Dave Borowitz wrote: > On Thu, Jul 13, 2017 at 8:11 PM, Shawn Pearce wrote: >> In another (Gerrit Code Review), we disable reflogs for >> the insane refs/changes/ namespace, as nearly every reference is >> created once,

Re: reftable: new ref storage format

2017-07-14 Thread Dave Borowitz
On Thu, Jul 13, 2017 at 8:11 PM, Shawn Pearce wrote: > In another (Gerrit Code Review), we disable reflogs for > the insane refs/changes/ namespace, as nearly every reference is > created once, and never modified. Apologies for the tangent, but this is not true in the most

Re: reftable: new ref storage format

2017-07-13 Thread Shawn Pearce
On Thu, Jul 13, 2017 at 1:35 PM, Jeff King wrote: > On Thu, Jul 13, 2017 at 12:56:54PM -0700, Stefan Beller wrote: > > I agree that a full binary search of a reftable is harder because of the > prefix compression (it may still be possible by scanning backwards, but > I think there

Re: reftable: new ref storage format

2017-07-13 Thread Shawn Pearce
On Thu, Jul 13, 2017 at 12:32 PM, Jeff King wrote: > On Wed, Jul 12, 2017 at 05:17:58PM -0700, Shawn Pearce wrote: > >> ### Problem statement >> >> Some repositories contain a lot of references (e.g. android at 866k, >> rails at 31k). The existing packed-refs format takes up a

Re: reftable: new ref storage format

2017-07-13 Thread Eric Wong
Jeff King wrote: > I agree that a full binary search of a reftable is harder because of the > prefix compression (it may still be possible by scanning backwards, but > I think there are ambiguities when you land in the middle of a record, > since there's no unambiguous

Re: reftable: new ref storage format

2017-07-13 Thread Jeff King
On Thu, Jul 13, 2017 at 12:56:54PM -0700, Stefan Beller wrote: > >> ### Problem statement > >> > >> Some repositories contain a lot of references (e.g. android at 866k, > >> rails at 31k). The existing packed-refs format takes up a lot of > >> space (e.g. 62M), and does not scale with

Re: reftable: new ref storage format

2017-07-13 Thread Stefan Beller
On Thu, Jul 13, 2017 at 12:32 PM, Jeff King wrote: > On Wed, Jul 12, 2017 at 05:17:58PM -0700, Shawn Pearce wrote: > >> We've been having scaling problems with insane number of references >> (>866k), so I started thinking a lot about improving ref storage. >> >> I've written a

Re: reftable: new ref storage format

2017-07-13 Thread Jeff King
On Wed, Jul 12, 2017 at 05:17:58PM -0700, Shawn Pearce wrote: > We've been having scaling problems with insane number of references > (>866k), so I started thinking a lot about improving ref storage. > > I've written a simple approach, and implemented it in JGit. > Performance is promising: > >

reftable: new ref storage format

2017-07-12 Thread Shawn Pearce
We've been having scaling problems with insane number of references (>866k), so I started thinking a lot about improving ref storage. I've written a simple approach, and implemented it in JGit. Performance is promising: - 62M packed-refs compresses to 27M - 42.3 usec lookup You can read a