On Fri, Sep 29, 2017 at 11:43:57PM +0200, Johannes Schindelin wrote:

> On Thu, 28 Sep 2017, Jeff King wrote:
> 
> > If you're planning on using an oidset to mark every object in a
> > 100-million-object monorepo, we'd probably care more. But I'd venture to
> > say that any scheme which involves generating that hash table on the fly
> > is doing it wrong. At that scale we'd want to look at compact
> > mmap-able on-disk representations.
> 
> Or maybe you would look at a *not-so-compact* mmap()able on-disk
> representation, to allow for painless updates.
> 
> You really will want to avoid having to write out large files just because
> a small part of them changed. We learned that lesson the hard way, from
> having to write 350MB worth of .git/index for every single, painful `git
> add` operation.

Sure. I didn't mean to start designing the format. I just mean that if
the first step of the process is "read information about all 100 million
objects into an in-RAM hashmap", then that is definitely not going to
fly.

-Peff
