Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule
Ævar Arnfjörð Bjarmasonwrites: > But from what you're saying here that seems like a non-issue, i.e. in > such a scenario we'd just mirror the original repo[1], change the URL > in git.git to that, and then anyone could easily use older history > since it would be pointing to the new mirror. Those who cloned before such a switch will still have the URL they chose when they cloned and did "submodule init" on it, I think, so they need to explicitly choose to switch to the new URL. So I would not exactly say "seems like a non-issue"; it certainly is an easy thing to work around.
Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule
On Tue, May 23, 2017 at 3:06 PM, Junio C Hamanowrote: > Ævar Arnfjörð Bjarmason writes: > >> Seems like it would be useful to have a way to ex-post-facto say "past >> history should use these URLs". i.e. if all git.git mirrors go down >> and we have to re-host, then you can just clone git.git and off you >> go, but the same isn't true of past submodule urls, or is it? > > I do not know how heavily you are used to use submodules, but I > think submodule's URL is copied to the config of the superproject, > and that URL is what will be used from there on, so "past history or > future history will use that URL" is already the case, no? I haven't used them much, just starting to get familiar with them now again. I thought given your "if it is ever rewound away in the upstream history..." that if we e.g. pegged upstream to that github URL now that if that got rewound, anyone working with git.git in the future would be in for some pain if they needed to check out and test old tags. But from what you're saying here that seems like a non-issue, i.e. in such a scenario we'd just mirror the original repo[1], change the URL in git.git to that, and then anyone could easily use older history since it would be pointing to the new mirror. I.e. in the spirit of my last reply, this seems like deviating from the default workflow around submodules out of concern for an extremely unlikely scenario, which, if it happened, would be easily mitigated for both past & future git.git history. 1. Or likely just ask upstream kindly to push a tag with the old history.
Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule
Ævar Arnfjörð Bjarmasonwrites: > Seems like it would be useful to have a way to ex-post-facto say "past > history should use these URLs". i.e. if all git.git mirrors go down > and we have to re-host, then you can just clone git.git and off you > go, but the same isn't true of past submodule urls, or is it? I do not know how heavily you are used to use submodules, but I think submodule's URL is copied to the config of the superproject, and that URL is what will be used from there on, so "past history or future history will use that URL" is already the case, no?
Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule
On Tue, May 23, 2017 at 12:27 AM, Junio C Hamanowrote: > Ævar Arnfjörð Bjarmason writes: > >> I liked the suggestion to make the URL a relative path, but this would >> require you to maintain a mirror in the same places you push git.git >> to, is that something you'd be willing to do? > > After thinking about this a bit more, I know what I think we want a > bit better. > > Relative URL (e.g. ../sha1collisiondetection that sits next to the > copy of git.git) may be a good way to go. I can arrange to create > necessary repository next to git.git on k.org and github.com but I > need to double check about other places If the URL doesn't point to the canonical upstream how do we review a patch to update sha1dc here on list? Doesn't change from "'git am' this and it'll work" to "'git am' this and it'll fail, then do this submodule config modification dance, and run some other command". I haven't tried repointing a submodule temporarily (and locally) to another URL for such a use, how is that even done? > Whether the submodule is referenced by a relative URL from the main > project, the submodule should not come directly from the upstream, > and various mirrors that sit next to git.git should not be blind and > automated "mirrors". This is because I do not want us to trust the > security measures of https://github.com/cr-marcstevens/ repository. > The consumers already need to trust k.org/pub/scm/git/git.git and by > ensuring k.org/pub/scm/git/sha1dc is managed the same way, they do > not have to trust anything extra. I had the same comments Stefan pointed out below about this. So I won't repeat any of that... > Another reason is that we want to make sure all commits in the > submodule that we bind to the superproject (i.e. git.git) are always > in the submodule, regardless of what our upstream does, and one way > to do so is to have control over _our_ canonical repository for the > submodule. In normal times, it will faithfully follow the upstream > without doing anything else, but we'd keep the option of anchoring a > submodule commit that is referenced by the superproject history with > our own tag, if it is ever rewound away in the upstream history for > whatever reason. If we were talking about any other project but git.git I'd say "yeah this makes sense". But I think in our case we should keep in mind the main point of this exercise is for us to dogfood submodule usage, not just so we get whatever trivial benefits from updating the sha1collision/ directory from upstream, but so that we run into issues with submodules that we solve for all our users. In this case you're basically concerned with: * We have N mirrors, but the upstream submodule URL is just one URL, so let's not point to that, but to our N mirrors Could also be addressed with a combination of 'pullUrl' for submodules (inverse of pushUrl for push) to list the canonical one & list of mirrors (or use the relative URL). * What if upstream say 5 years in the future rewinds their history, github shuts down or whatever, can we check out and work with older versions of git.git? Seems like it would be useful to have a way to ex-post-facto say "past history should use these URLs". i.e. if all git.git mirrors go down and we have to re-host, then you can just clone git.git and off you go, but the same isn't true of past submodule urls, or is it?
Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule
Stefan Bellerwrites: > On Mon, May 22, 2017 at 3:27 PM, Junio C Hamano wrote: >> Ævar Arnfjörð Bjarmason writes: >> >>> I liked the suggestion to make the URL a relative path, but this would >>> require you to maintain a mirror in the same places you push git.git >>> to, is that something you'd be willing to do? >> >> After thinking about this a bit more, I know what I think we want a >> bit better. >> >> Relative URL (e.g. ../sha1collisiondetection that sits next to the >> copy of git.git) may be a good way to go. I can arrange to create >> necessary repository next to git.git on k.org and github.com but I >> need to double check about other places > > And here we see another deficit with a single URL: > We have to abide by the same scheme at all hosting endpoints. FWIW, I do not see it a deficit. It is a price you may or may not be willing to pay for simplicity, and I think it is a reasonable trade-off. The .gitmodules format can be enhanced to list multiple URLs quite easily. I think the current users all use the equivalent of "git config -f .gitmodules submodule.foo.url" to grab one value. Unless the user chooses to do anything special, they will continue to get the same behaviour whensuch an enhancement happens, which is a good thing. But then, you need to design what users choose to do that is "something special". Should "git clone --recurse-submodules" have a way to control which one of the not-yet-known-before-cloning URLs that may be listed in .gitmodules? Will we have a way to say "For North American users, we recommend this URL, while Asians may want to fetch from this other URL" in .gitmodules and then the recursive clone have a way to say "I want the European option"? Would the recursive clone have a way to go interactive? And from that point of view, "you'll find the submodules relative to the superproject" convention is one way (not necessarily the only way) to allow users not to care too much. The simplicity comes with price and that is perfectly acceptable. Also a single URL scheme may still perfectly fine. .gitmodules may have new submodule..alternateURL fields and recursive clone can be told to optionally go interactive when such fields are present. Or README can list alternate URLs and instruct the users to use the insteadOf if they want to go to mirrors instead. Those users who do care about picking particular mirror are likely not favor simplicity over flexibility, so they would not likely to do a recursive clone (after all, clone is a single-time operation) and it may be sufficient if they can clone the top-level, read README and then decide how and from where they get their submodules.
Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule
On Mon, May 22, 2017 at 3:27 PM, Junio C Hamanowrote: > Ævar Arnfjörð Bjarmason writes: > >> I liked the suggestion to make the URL a relative path, but this would >> require you to maintain a mirror in the same places you push git.git >> to, is that something you'd be willing to do? > > After thinking about this a bit more, I know what I think we want a > bit better. > > Relative URL (e.g. ../sha1collisiondetection that sits next to the > copy of git.git) may be a good way to go. I can arrange to create > necessary repository next to git.git on k.org and github.com but I > need to double check about other places And here we see another deficit with a single URL: We have to abide by the same scheme at all hosting endpoints. For example consider the host https://kernel.googlesource.com/pub/scm/git/git that mirrors from kernel.org. It would be able to bind the submodule at https://kernel.googlesource.com/pub/scm/git/git/sha1dc i.e. it would look like a subdirectory of the main git repo. This is not an issue for our desired usecase, as all hosts can comply with the scheme that you outlined (url=../sha1...), but worth noting that in the long term we may want to have the ability to "configure" each remote individually by having out-of-history config options. I think we would want to solve that via a "refs/meta/gitmodules" branch that can be adapted per remote. (original idea from jrnieder@) > Whether the submodule is referenced by a relative URL from the main > project, the submodule should not come directly from the upstream, > and various mirrors that sit next to git.git should not be blind and > automated "mirrors". That sounds reasonable for our sanity. > This is because I do not want us to trust the > security measures of https://github.com/cr-marcstevens/ repository. > The consumers already need to trust k.org/pub/scm/git/git.git and by > ensuring k.org/pub/scm/git/sha1dc is managed the same way, they do > not have to trust anything extra. The trust would be transitive, as the said submodule is referenced via sha1, so all malicious actions upstream could perform are: * denial of service: (by remove a commit that we pointed at in our history) * denial of service 2: add a huge blob to their repo, such that anyone obtaining the submodule not carefully is annoyed by a super large repo. * add additional malicious data (such as illegal numbers and algorithms) to a branch, which would be obtained by users cloning the submodule carelessly. > Another reason is that we want to make sure all commits in the > submodule that we bind to the superproject (i.e. git.git) are always > in the submodule, regardless of what our upstream does, and one way > to do so is to have control over _our_ canonical repository for the > submodule. By having all repos under one entity of trust, we would not need to discuss all kinds of possible attacks as above. > In normal times, it will faithfully follow the upstream > without doing anything else, but we'd keep the option of anchoring a > submodule commit that is referenced by the superproject history with > our own tag, if it is ever rewound away in the upstream history for > whatever reason. That makes sense. Thanks, Stefan
Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule
Ævar Arnfjörð Bjarmasonwrites: > I liked the suggestion to make the URL a relative path, but this would > require you to maintain a mirror in the same places you push git.git > to, is that something you'd be willing to do? After thinking about this a bit more, I know what I think we want a bit better. Relative URL (e.g. ../sha1collisiondetection that sits next to the copy of git.git) may be a good way to go. I can arrange to create necessary repository next to git.git on k.org and github.com but I need to double check about other places Whether the submodule is referenced by a relative URL from the main project, the submodule should not come directly from the upstream, and various mirrors that sit next to git.git should not be blind and automated "mirrors". This is because I do not want us to trust the security measures of https://github.com/cr-marcstevens/ repository. The consumers already need to trust k.org/pub/scm/git/git.git and by ensuring k.org/pub/scm/git/sha1dc is managed the same way, they do not have to trust anything extra. Another reason is that we want to make sure all commits in the submodule that we bind to the superproject (i.e. git.git) are always in the submodule, regardless of what our upstream does, and one way to do so is to have control over _our_ canonical repository for the submodule. In normal times, it will faithfully follow the upstream without doing anything else, but we'd keep the option of anchoring a submodule commit that is referenced by the superproject history with our own tag, if it is ever rewound away in the upstream history for whatever reason. Thanks.
[PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule
On Sat, May 20, 2017 at 1:13 PM, Junio C Hamanowrote: > Ævar Arnfjörð Bjarmason writes: > >> Replace the forked sha1dc directory with a copy of the upstream code >> imported as a submodule. This is the exact same code as now exists in >> the sha1dc/ directory. >> >> The initial reason for copy/pasting the code into sha1dc and locally >> modifying it was that it needed to be altered to work with the git >> project. >> >> The upstream project has accepted my code changes to allow us to use >> their code as-is, see the preceding commit for details. So import the >> code as a submodule instead, this will make it easier to keep >> up-to-date with any upstream fixes or improvements. >> >> Signed-off-by: Ævar Arnfjörð Bjarmason >> --- >> .gitmodules| 4 >> Makefile | 4 ++-- >> hash.h | 2 +- >> sha1collisiondetection | 1 + >> 4 files changed, 8 insertions(+), 3 deletions(-) >> create mode 100644 .gitmodules >> create mode 16 sha1collisiondetection > > I am not sure how prepared our .travis.yml is to deal with a > submodule, I'd prefer to have this step broken down to two step > process. > > That is, [PATCH 2.1/3] first adds an otherwise unused submodule, so > that people can optionally do "git submodule init && git submodule > update" so that they can compare the contents of sha1dc/ that has > been updated by [PATCH 1/3] with the up-to-date upstream. Then > [PATCH 2.2/3] would update the Makefile and hash.h to use the code > in the submodule. > > I actually would want to see us proceed even more cautiously---if > the latter-half, i.e. [PATCH 2.2/3], is arranged so that it uses the > new sha1collisiondetection/ only when the submodule is initialized > and populated, and otherwise it uses sha1dc/ as before, I would feel > a lot safer. I wouldn't be this paranoid if this "let's start using > submodule ourselves" were done to some optional corner (like compat/ > or contrib/ somewhere), but this is the default hash function. I do > want to have something like this to force us (and submodule folks) > to get any kinks out, but I do not want to see many people not even > be able to build while this new arrangement is eased in. Once > people are comfortable with the new arrangement to use code from > submodule, we can then take [PATCH 3/3] to remove the old sha1dc/ > directory and the migration will be complete. Makes sense to take it slow. Hopefully this addresses your comments. I dropped the 3rd patch to remove sha1dc/ and the 2nd patch adds sha1collisiondetection/ as submodule, but it's not used unless you specify DC_SHA1_SUBMODULE in addition to DC_SHA1. Both patches should be safe to include & not cause any disruption, but now those interested in making the submodule experience in git.git better can init/update & set DC_SHA1_SUBMODULE=Y to play with it. Note that both patches update to a newer version of the upstream code. I sent them another pull request with some cleanups, one of which is to ignore .depends in their .gitignore file. > I also am not very happy with .gitmodules pointing at a single point > of failure. It would be nice if you can arrange a couple of mirrors > and have a comment in .gitmodules file to tell folks that they can > use these alternates by insteadOf or some other mechanism. I liked the suggestion to make the URL a relative path, but this would require you to maintain a mirror in the same places you push git.git to, is that something you'd be willing to do? For now having no-mirror isn't a big issue with my new 2/2 since it'se something you have to opt-in to with a build flag, which I suspect only I/Brandon/Stefan & a few others will use. Ævar Arnfjörð Bjarmason (2): sha1dc: update from upstream sha1dc: optionally use sha1collisiondetection as a submodule .gitmodules| 4 +++ Makefile | 21 +++- hash.h | 4 +++ sha1collisiondetection | 1 + sha1dc/sha1.c | 91 +- sha1dc/sha1.h | 90 ++--- sha1dc/ubc_check.c | 13 ++-- sha1dc/ubc_check.h | 14 ++-- sha1dc_git.c | 24 + sha1dc_git.h | 19 +++ 10 files changed, 193 insertions(+), 88 deletions(-) create mode 100644 .gitmodules create mode 16 sha1collisiondetection create mode 100644 sha1dc_git.c create mode 100644 sha1dc_git.h -- 2.13.0.303.g4ebf302169