On Mon, Apr 26, 2021 at 06:47:17PM +0000, Eric Wong wrote:
> > I'm thinking we need the ability to make it a real clonable
> > repository -- perhaps without its own xapian index? Actual git
> > repositories aren't large, especially if they are only used for
> > direct git operations. Disk space is cheap, it's the IO that's
> > expensive. :)
>
> True, though cache overheads hurt a bit.  I also wonder if lei
> can increase traffic to public-inbox-<imapd|nntpd> to reduce
> the need/use of "git clone".
>
> > If these are real clonable repositories, then it would be easy
> > for people to set up replication for just the curated content
> > people want.
>
> Understood.  Using --output v2publicinbox:... w/o --shared is
> totally doable.
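For illustration, the --output form mentioned above might be invoked
along these lines (hypothetical command line -- the exact v2 output
syntax was still being settled at this point, and the destination path
and query terms here are made up):

```
lei q -o v2publicinbox:/srv/curated/gpu 'dfn:drivers/gpu/*'
```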
I'm just worried that if we overuse the alternates, then we may find
ourselves in a situation where, when we repack the "every blob" shared
repository, we'll end up with a pack that isn't really optimized to be
used by any of the member repos. So, in a situation where a clone is
performed, git-upload-pack will have to spend a lot of cycles
navigating through the monstrous parent pack just to build and
re-compress the small subset of objects it needs to send.

Git has ways of dealing with this by letting you set things like pack
islands, but it's finicky and requires that each child repo is defined
as refs in the parent repo. We deal with this in grokmirror, but it's
messy and requires properly tracking child repo additions/removals,
etc. I think it may be one of those cases where wasting disk space on
duplicate objects is worth the CPU cycle savings.

> > Not really worried about deduping blobs, but I'm wondering how to
> > make it work well when search parameters change (see above). E.g.:
> >
> > 1. we create the repo with one set of parameters
> > 2. maintainer then broadens it up to include something else
> > 3. maintainer then decides that it's now *way* too much and
> >    narrows it down again
> >
> > We don't really want step 2 to lead to a permanent ballooning of
> > the repository, so perhaps all query changes should force-append a
> > dt: with the open-ended datetime of the change? Or do you already
> > have a way to deal with this situation?
>
> The aforementioned maxuid prevents stuff that's too old from
> being seen.  Otherwise, there's always "public-inbox-learn rm".

How would it handle the situation where we import a new list into lore
with a 10-year-long archive of messages?

-K
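Footnote on the pack-island mechanics mentioned above, for anyone
following along: delta islands are driven by git config in the parent
repo. A minimal sketch, assuming each child repo's refs are mirrored
into the parent under a refs/virtual/<name>/ namespace (that layout is
hypothetical; grokmirror's actual ref-tracking scheme may differ):

```
# parent repo config excerpt (sketch)
[pack]
	# each regex capture group names an island, one per child repo
	island = refs/virtual/([^/]+)/
[repack]
	# pass --delta-islands to git-pack-objects on repack
	useDeltaIslands = true
```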
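To make the force-appended dt: idea above concrete: public-inbox's
search syntax supports an open-ended datetime range, so a query changed
on a given date could be pinned to that date. A sketch (the dfn: terms
are made up for illustration; only the dt: mechanics matter here):

```
# query after narrowing on 2021-04-26; the open-ended dt: range
# keeps the narrowed terms from re-pulling older messages
(dfn:drivers/gpu/*) AND dt:20210426000000..
```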
