On Sat, 31 Oct 2015 at 18:31:23, Junio C Hamano wrote:
> [...]
> You earlier (re)discovered a good approach to introduce a new
> feature without breaking settings of existing users when we
> discussed a "whitelist".  Since setting the configuration to an
> empty string did not do anything in the old code, an empty string
> was an invalid and non-working setting.  By taking advantage of that
> fact, you safely can say "if you start with an empty that would
> match everything, we'll treat all the others differently from the
> way we did before" if you wanted to.  I think you can follow the
> same principle here.  For example, I can imagine that the rule for
> the "ref-is-hidden" can be updated to:
> 
>  * it now takes refname and also the fullname before stripping the
>    namespace;
> 
>  * hide patterns that are prefixed with '!' mean negative, just as
>    before;
> 
>  * (after possibly '!' is stripped), hide patterns that are prefixed
>    with '^', which was invalid before, mean check the fullname with
>    namespace prefix, which is a new rule;
> 
>  * otherwise, check the refname after stripping the namespace.
> 
> Such an update would allow a new feature "we now allow you to write
> a pattern that determines the match before stripping the namespace
> prefix" without breaking the existing repositories, no?
> 

Yes. If I understood you correctly, this is exactly what I suggested in
the last paragraph of my previous email (the only difference being that
I suggested to use "/" as full name indicator instead of "^" but that is
just an implementation detail). I will look into implementing this if
that is the way we want to go.
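
To spell out how I read the rule, here is a rough sketch in Python (not
Git's C code). The function name mirrors "ref-is-hidden" from above; the
argument names are made up for illustration. It assumes that the last
matching pattern wins and that a pattern matches either a whole ref name
or a leading path component of it, which is how I understand the current
behavior.

```python
def ref_is_hidden(refname, full_refname, hide_patterns):
    """Sketch of the proposed hide rule (illustrative, not Git's code).

    refname       -- ref name after stripping the namespace prefix
    full_refname  -- ref name including the namespace prefix
    hide_patterns -- patterns in configuration order
    """
    # Assumption: later patterns override earlier ones, so scan backwards
    # and stop at the first pattern that matches.
    for pattern in reversed(hide_patterns):
        # A leading '!' negates the pattern, just as before.
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]

        # New rule: a leading '^' (invalid before) selects the full name.
        if pattern.startswith("^"):
            subject = full_refname
            pattern = pattern[1:]
        else:
            subject = refname

        # Assumption: match the exact name or a leading path component.
        if subject == pattern or subject.startswith(pattern + "/"):
            return not negated

    return False
```

With this, a pattern such as "^refs/namespaces/foo" would hide everything
in namespace "foo" regardless of which namespace the client is talking to,
while "refs/heads" keeps its old meaning.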

> [...]
> Assuming other namespaces are forks of the same project as yours
> (and otherwise the repository management strategy needs to be
> rethought--using namespace for them is not gaining anything other
> than making your repack more costly), it is likely that all of them
> share a lot of refs that point at the same object (think "tags").
> Do we end up sending a lot of ".have" for exactly the same object
> number of times?  Even though we cannot dedup show_ref() lines that
> talk about concrete refs (because they talk about what refs exist at
> which value, and the sending side would use them to locally reject
> non-ff pushes, for example), ".have" lines that talk about the same
> object can be safely deduped.  This is not directly related to your
> topic of "what should be included in the advertisement", but a
> potentially good thing to fix, if it indeed turns out that we are
> sending a lot of duplicate ".have"s.  A fix in that would make
> things better for everybody (not just namespace users, but those who
> show the ".have" lines from the refs in the repository we borrow
> objects from).

Yes, I think we currently send a lot of duplicate lines. Would be nice
to have that fixed as well.
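
Just to illustrate the idea, a minimal sketch in Python (the function and
argument names are hypothetical, this is not how the advertisement code is
structured): concrete refs are always emitted, while ".have" lines are
only emitted once per object.

```python
def advertise(refs, extra_haves):
    """Build advertisement lines (illustrative sketch only).

    refs        -- iterable of (refname, object_id) for concrete refs
    extra_haves -- iterable of object ids to advertise as ".have"
    """
    lines = []

    # Concrete refs cannot be deduplicated: each line says which ref
    # exists at which value.
    for refname, oid in refs:
        lines.append(f"{oid} {refname}")

    # ".have" lines only say "this object exists here", so repeating
    # the same object id adds nothing.
    seen = set()
    for oid in extra_haves:
        if oid in seen:
            continue
        seen.add(oid)
        lines.append(f"{oid} .have")

    return lines
```

This only deduplicates among the ".have" lines themselves; whether a
".have" for an object that is already advertised through a concrete ref
could be dropped as well is a separate question.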

Note that we do use Git namespaces to store a lot of different but
similar pseudo repositories (i.e. they do not share any history but the
objects have huge similarities). Even though the pseudo repositories
themselves are tiny, having the objects in a shared object storage
reduces the size significantly. Other people probably use separate
repositories, combined with something like GIT_OBJECT_DIRECTORY and
preciousObjects for that. Using Git namespaces, however, allows running
`git gc`/`git repack` without needing to take care of maintaining back
references to the pseudo repositories and, more importantly, allows for
storing all the refs in a single "packed-refs" file, which reduced the
size by another factor of 10 in our tests. That massive difference in
size is probably mostly due to the fact that the actual content of each
repository is just some 100 bytes. I am not sure whether saving that
much space can currently be achieved with any other approach.