Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule

2017-05-25 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> But from what you're saying here that seems like a non-issue, i.e. in
> such a scenario we'd just mirror the original repo[1], change the URL
> in git.git to that, and then anyone could easily use older history
> since it would be pointing to the new mirror.

Those who cloned before such a switch will still have the URL they
chose when they cloned and did "submodule init" on it, I think, so
they need to explicitly choose to switch to the new URL.  So I would
not exactly say "seems like a non-issue"; it certainly is an easy
thing to work around.


Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule

2017-05-25 Thread Ævar Arnfjörð Bjarmason
On Tue, May 23, 2017 at 3:06 PM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason  writes:
>
>> Seems like it would be useful to have a way to ex-post-facto say "past
>> history should use these URLs". i.e. if all git.git mirrors go down
>> and we have to re-host, then you can just clone git.git and off you
>> go, but the same isn't true of past submodule urls, or is it?
>
> I do not know how heavily you are used to use submodules, but I
> think submodule's URL is copied to the config of the superproject,
> and that URL is what will be used from there on, so "past history or
> future history will use that URL" is already the case, no?

I haven't used them much, just starting to get familiar with them now again.

I thought given your "if it is ever rewound away in the upstream
history..." that if we e.g. pegged upstream to that github URL now
that if that got rewound, anyone working with git.git in the future
would be in for some pain if they needed to check out and test old
tags.

But from what you're saying here that seems like a non-issue, i.e. in
such a scenario we'd just mirror the original repo[1], change the URL
in git.git to that, and then anyone could easily use older history
since it would be pointing to the new mirror.

I.e. in the spirit of my last reply, this seems like deviating from
the default workflow around submodules out of concern for an extremely
unlikely scenario, which, if it happened, would be easily mitigated
for both past & future git.git history.

1. Or likely just ask upstream kindly to push a tag with the old history.


Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule

2017-05-23 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> Seems like it would be useful to have a way to ex-post-facto say "past
> history should use these URLs". i.e. if all git.git mirrors go down
> and we have to re-host, then you can just clone git.git and off you
> go, but the same isn't true of past submodule urls, or is it?

I do not know how heavily you are used to use submodules, but I
think submodule's URL is copied to the config of the superproject,
and that URL is what will be used from there on, so "past history or
future history will use that URL" is already the case, no?


Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule

2017-05-23 Thread Ævar Arnfjörð Bjarmason
On Tue, May 23, 2017 at 12:27 AM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> I liked the suggestion to make the URL a relative path, but this would
>> require you to maintain a mirror in the same places you push git.git
>> to, is that something you'd be willing to do?
>
> After thinking about this a bit more, I know what I think we want a
> bit better.
>
> Relative URL (e.g. ../sha1collisiondetection that sits next to the
> copy of git.git) may be a good way to go.  I can arrange to create
> necessary repository next to git.git on k.org and github.com but I
> need to double check about other places

If the URL doesn't point to the canonical upstream how do we review a
patch to update sha1dc here on list? Doesn't change from "'git am'
this and it'll work" to "'git am' this and it'll fail, then do this
submodule config modification dance, and run some other command".

I haven't tried repointing a submodule temporarily (and locally) to
another URL for such a use, how is that even done?

> Whether the submodule is referenced by a relative URL from the main
> project, the submodule should not come directly from the upstream,
> and various mirrors that sit next to git.git should not be blind and
> automated "mirrors".  This is because I do not want us to trust the
> security measures of https://github.com/cr-marcstevens/ repository.
> The consumers already need to trust k.org/pub/scm/git/git.git and by
> ensuring k.org/pub/scm/git/sha1dc is managed the same way, they do
> not have to trust anything extra.

I had the same comments Stefan pointed out below about this. So I
won't repeat any of that...

> Another reason is that we want to make sure all commits in the
> submodule that we bind to the superproject (i.e. git.git) are always
> in the submodule, regardless of what our upstream does, and one way
> to do so is to have control over _our_ canonical repository for the
> submodule.  In normal times, it will faithfully follow the upstream
> without doing anything else, but we'd keep the option of anchoring a
> submodule commit that is referenced by the superproject history with
> our own tag, if it is ever rewound away in the upstream history for
> whatever reason.

If we were talking about any other project but git.git I'd say "yeah
this makes sense".

But I think in our case we should keep in mind the main point of this
exercise is for us to dogfood submodule usage, not just so we get
whatever trivial benefits from updating the sha1collision/ directory
from upstream, but so that we run into issues with submodules that we
solve for all our users.

In this case you're basically concerned with:

 * We have N mirrors, but the upstream submodule URL is just one URL,
so let's not point to that, but to our N mirrors

Could also be addressed with a combination of 'pullUrl' for submodules
(inverse of pushUrl for push) to list the canonical one & list of
mirrors (or use the relative URL).

 * What if upstream say 5 years in the future rewinds their history,
github shuts down or whatever, can we check out and work with older
versions of git.git?

Seems like it would be useful to have a way to ex-post-facto say "past
history should use these URLs". i.e. if all git.git mirrors go down
and we have to re-host, then you can just clone git.git and off you
go, but the same isn't true of past submodule urls, or is it?


Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule

2017-05-22 Thread Junio C Hamano
Stefan Beller  writes:

> On Mon, May 22, 2017 at 3:27 PM, Junio C Hamano  wrote:
>> Ævar Arnfjörð Bjarmason   writes:
>>
>>> I liked the suggestion to make the URL a relative path, but this would
>>> require you to maintain a mirror in the same places you push git.git
>>> to, is that something you'd be willing to do?
>>
>> After thinking about this a bit more, I know what I think we want a
>> bit better.
>>
>> Relative URL (e.g. ../sha1collisiondetection that sits next to the
>> copy of git.git) may be a good way to go.  I can arrange to create
>> necessary repository next to git.git on k.org and github.com but I
>> need to double check about other places
>
> And here we see another deficit with a single URL:
> We have to abide by the same scheme at all hosting endpoints.

FWIW, I do not see it a deficit.  It is a price you may or may not
be willing to pay for simplicity, and I think it is a reasonable
trade-off.

The .gitmodules format can be enhanced to list multiple URLs quite
easily.  I think the current users all use the equivalent of "git
config -f .gitmodules submodule.foo.url" to grab one value.  Unless
the user chooses to do anything special, they will continue to get
the same behaviour whensuch an enhancement happens, which is a good
thing.

But then, you need to design what users choose to do that is
"something special".  Should "git clone --recurse-submodules" have a
way to control which one of the not-yet-known-before-cloning URLs
that may be listed in .gitmodules?  Will we have a way to say "For
North American users, we recommend this URL, while Asians may want
to fetch from this other URL" in .gitmodules and then the recursive
clone have a way to say "I want the European option"?  Would the
recursive clone have a way to go interactive?

And from that point of view, "you'll find the submodules relative to
the superproject" convention is one way (not necessarily the only
way) to allow users not to care too much.  The simplicity comes with
price and that is perfectly acceptable.

Also a single URL scheme may still perfectly fine.  .gitmodules may
have new submodule..alternateURL fields and recursive clone
can be told to optionally go interactive when such fields are
present.

Or README can list alternate URLs and instruct the users to use the
insteadOf if they want to go to mirrors instead.  Those users who do
care about picking particular mirror are likely not favor simplicity
over flexibility, so they would not likely to do a recursive clone
(after all, clone is a single-time operation) and it may be
sufficient if they can clone the top-level, read README and then
decide how and from where they get their submodules.


Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule

2017-05-22 Thread Stefan Beller
On Mon, May 22, 2017 at 3:27 PM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> I liked the suggestion to make the URL a relative path, but this would
>> require you to maintain a mirror in the same places you push git.git
>> to, is that something you'd be willing to do?
>
> After thinking about this a bit more, I know what I think we want a
> bit better.
>
> Relative URL (e.g. ../sha1collisiondetection that sits next to the
> copy of git.git) may be a good way to go.  I can arrange to create
> necessary repository next to git.git on k.org and github.com but I
> need to double check about other places

And here we see another deficit with a single URL:
We have to abide by the same scheme at all hosting endpoints.

For example consider the host https://kernel.googlesource.com/pub/scm/git/git
that mirrors from kernel.org. It would be able to bind the
submodule at  https://kernel.googlesource.com/pub/scm/git/git/sha1dc
i.e. it would look like a subdirectory of the main git repo.

This is not an issue for our desired usecase, as all hosts can comply
with the scheme that you outlined (url=../sha1...), but worth noting that
in the long term we may want to have the ability to "configure" each
remote individually by having out-of-history config options. I think we
would want to solve that via a "refs/meta/gitmodules" branch that can be
adapted per remote. (original idea from jrnieder@)

> Whether the submodule is referenced by a relative URL from the main
> project, the submodule should not come directly from the upstream,
> and various mirrors that sit next to git.git should not be blind and
> automated "mirrors".

That sounds reasonable for our sanity.

> This is because I do not want us to trust the
> security measures of https://github.com/cr-marcstevens/ repository.
> The consumers already need to trust k.org/pub/scm/git/git.git and by
> ensuring k.org/pub/scm/git/sha1dc is managed the same way, they do
> not have to trust anything extra.

The trust would be transitive, as the said submodule is referenced via
sha1, so all malicious actions upstream could perform are:
* denial of service: (by remove a commit that we pointed at in our history)
* denial of service 2: add a huge blob to their repo, such that anyone
  obtaining the submodule not carefully is annoyed by a super large repo.
* add additional malicious data (such as illegal numbers and algorithms)
  to a branch, which would be obtained by users cloning the submodule
  carelessly.

> Another reason is that we want to make sure all commits in the
> submodule that we bind to the superproject (i.e. git.git) are always
> in the submodule, regardless of what our upstream does, and one way
> to do so is to have control over _our_ canonical repository for the
> submodule.

By having all repos under one entity of trust, we would not need to discuss
all kinds of possible attacks as above.

>  In normal times, it will faithfully follow the upstream
> without doing anything else, but we'd keep the option of anchoring a
> submodule commit that is referenced by the superproject history with
> our own tag, if it is ever rewound away in the upstream history for
> whatever reason.

That makes sense.

Thanks,
Stefan


Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule

2017-05-22 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> I liked the suggestion to make the URL a relative path, but this would
> require you to maintain a mirror in the same places you push git.git
> to, is that something you'd be willing to do?

After thinking about this a bit more, I know what I think we want a
bit better.

Relative URL (e.g. ../sha1collisiondetection that sits next to the
copy of git.git) may be a good way to go.  I can arrange to create
necessary repository next to git.git on k.org and github.com but I
need to double check about other places

Whether the submodule is referenced by a relative URL from the main
project, the submodule should not come directly from the upstream,
and various mirrors that sit next to git.git should not be blind and
automated "mirrors".  This is because I do not want us to trust the
security measures of https://github.com/cr-marcstevens/ repository.
The consumers already need to trust k.org/pub/scm/git/git.git and by
ensuring k.org/pub/scm/git/sha1dc is managed the same way, they do
not have to trust anything extra.

Another reason is that we want to make sure all commits in the
submodule that we bind to the superproject (i.e. git.git) are always
in the submodule, regardless of what our upstream does, and one way
to do so is to have control over _our_ canonical repository for the
submodule.  In normal times, it will faithfully follow the upstream
without doing anything else, but we'd keep the option of anchoring a
submodule commit that is referenced by the superproject history with
our own tag, if it is ever rewound away in the upstream history for
whatever reason.

Thanks.


[PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule

2017-05-20 Thread Ævar Arnfjörð Bjarmason
On Sat, May 20, 2017 at 1:13 PM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> Replace the forked sha1dc directory with a copy of the upstream code
>> imported as a submodule. This is the exact same code as now exists in
>> the sha1dc/ directory.
>>
>> The initial reason for copy/pasting the code into sha1dc and locally
>> modifying it was that it needed to be altered to work with the git
>> project.
>>
>> The upstream project has accepted my code changes to allow us to use
>> their code as-is, see the preceding commit for details. So import the
>> code as a submodule instead, this will make it easier to keep
>> up-to-date with any upstream fixes or improvements.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>  .gitmodules| 4 
>>  Makefile   | 4 ++--
>>  hash.h | 2 +-
>>  sha1collisiondetection | 1 +
>>  4 files changed, 8 insertions(+), 3 deletions(-)
>>  create mode 100644 .gitmodules
>>  create mode 16 sha1collisiondetection
>
> I am not sure how prepared our .travis.yml is to deal with a
> submodule, I'd prefer to have this step broken down to two step
> process.
>
> That is, [PATCH 2.1/3] first adds an otherwise unused submodule, so
> that people can optionally do "git submodule init && git submodule
> update" so that they can compare the contents of sha1dc/ that has
> been updated by [PATCH 1/3] with the up-to-date upstream.  Then
> [PATCH 2.2/3] would update the Makefile and hash.h to use the code
> in the submodule.
>
> I actually would want to see us proceed even more cautiously---if
> the latter-half, i.e. [PATCH 2.2/3], is arranged so that it uses the
> new sha1collisiondetection/ only when the submodule is initialized
> and populated, and otherwise it uses sha1dc/ as before, I would feel
> a lot safer.  I wouldn't be this paranoid if this "let's start using
> submodule ourselves" were done to some optional corner (like compat/
> or contrib/ somewhere), but this is the default hash function.  I do
> want to have something like this to force us (and submodule folks)
> to get any kinks out, but I do not want to see many people not even
> be able to build while this new arrangement is eased in.  Once
> people are comfortable with the new arrangement to use code from
> submodule, we can then take [PATCH 3/3] to remove the old sha1dc/
> directory and the migration will be complete.

Makes sense to take it slow. Hopefully this addresses your comments. I
dropped the 3rd patch to remove sha1dc/ and the 2nd patch adds
sha1collisiondetection/ as submodule, but it's not used unless you
specify DC_SHA1_SUBMODULE in addition to DC_SHA1.

Both patches should be safe to include & not cause any disruption, but
now those interested in making the submodule experience in git.git
better can init/update & set DC_SHA1_SUBMODULE=Y to play with it.

Note that both patches update to a newer version of the upstream
code. I sent them another pull request with some cleanups, one of
which is to ignore .depends in their .gitignore file.

> I also am not very happy with .gitmodules pointing at a single point
> of failure.  It would be nice if you can arrange a couple of mirrors
> and have a comment in .gitmodules file to tell folks that they can
> use these alternates by insteadOf or some other mechanism.

I liked the suggestion to make the URL a relative path, but this would
require you to maintain a mirror in the same places you push git.git
to, is that something you'd be willing to do?

For now having no-mirror isn't a big issue with my new 2/2 since it'se
something you have to opt-in to with a build flag, which I suspect
only I/Brandon/Stefan & a few others will use.

Ævar Arnfjörð Bjarmason (2):
  sha1dc: update from upstream
  sha1dc: optionally use sha1collisiondetection as a submodule

 .gitmodules|  4 +++
 Makefile   | 21 +++-
 hash.h |  4 +++
 sha1collisiondetection |  1 +
 sha1dc/sha1.c  | 91 +-
 sha1dc/sha1.h  | 90 ++---
 sha1dc/ubc_check.c | 13 ++--
 sha1dc/ubc_check.h | 14 ++--
 sha1dc_git.c   | 24 +
 sha1dc_git.h   | 19 +++
 10 files changed, 193 insertions(+), 88 deletions(-)
 create mode 100644 .gitmodules
 create mode 16 sha1collisiondetection
 create mode 100644 sha1dc_git.c
 create mode 100644 sha1dc_git.h

-- 
2.13.0.303.g4ebf302169