[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning

2018-07-08 Thread Martin Vaeth
Rich Freeman  wrote:
> emerge --sync works just fine if
> there are uncommitted changes in your repository, whether they are
> indexed or otherwise.

You are right. It seems to be somewhat "random" when git pull
refuses to work and when it does not; I could not detect a common
pattern. Maybe this mainly has to do with overlayfs confusing git.




Re: [gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning

2018-07-08 Thread Rich Freeman
On Sun, Jul 8, 2018 at 4:28 AM Martin Vaeth  wrote:
>
> Rich Freeman  wrote:
>
> It's the *history* of the metadata which matters here:

You make a reasonable point here.

> > "The council does not require that ChangeLogs be generated or
> >   distributed through the rsync system. It is at the discretion of our
> >   infrastructure team whether or not this service continues."
>
> The formulation already makes it clear that one did not want to
> put pressure on infra, and at that time it was expected that
> every user would switch to git anyway.

The expectation was that git would be used for history, and yes, in
general the Council tries not to forbid projects from providing services.
The intent was to communicate that it was simply not an expectation that
they do so.

> At that time also the gkeys project was very active, and git was
> (besides webrsync) the only expected way to get checksums for the
> full tree. In particular, rsync was inherently insecure.

Honestly, I don't think gkeys really played any part in this, but
there was definitely an intent for signature checking in the tree to
become more robust.  As you point out (in a part I trimmed) it ought
to be possible to do this.  Indeed, git support for signing commits
was considered a requirement for git implementation.

> >> 4. Even if the user made the mistake of editing a file, portage should
> >>    not just die on syncing.
> >
> > emerge --sync won't die in a situation like that in general.
>
> It does: git pull refuses to start if there are uncommitted changes.
>

I did a test before I made my post.  emerge --sync works just fine if
there are uncommitted changes in your repository, whether they are
indexed or otherwise.  I didn't test merge conflicts but I'd hope it
would fail if these exist.

-- 
Rich



[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning

2018-07-08 Thread Martin Vaeth
Rich Freeman  wrote:
>> I was speaking about gentoo's git repository, of course
>> (the one which was attacked on github), not about a Frankensteined one
>> with metadata history filling megabytes of disk space unnecessarily.
>> Who has that much disk space to waste?
>
> Doesn't portage create that metadata anyway when you run it

You had better have it created by egencache in portage-postsyncd;
and moreover you should download some other repositories as well
(news announcements, GLSA, dtd, xml-schema) which are maintained
independently, see e.g.
https://github.com/vaeth/portage-postsyncd-mv

It is the Gentoo way: Download only the sources and build it from there.
That's also a question of mentality and why I think most gentoo users
who use git would prefer that way.

> negating any space savings at the cost of CPU to regenerate the cache?

It's the *history* of the metadata which matters here:
Since every changed metadata file requires a fraction of a second,
one can estimate rather well that several tens of thousands of files are
changed hourly/daily/weekly (the frequency depending mainly on eclass
changes: one change in some eclass requires a change for practically
every version of every package), so that the history of metadata changes
produced by this over time is enormous. This history, of course,
is completely useless and stored completely in vain.
One of the weaknesses of git is that it is impossible, by design,
to omit such superfluous history selectively (once the files *are*
maintained by git).

>> For the official git repository your assertions are simply false,
>> as you apparently admit: It is currently not possible to use the
>> official git repo (or the github clone of it which was attacked)
>> in a secure manner.
>
> Sure, but this also doesn't support signature verification at all
> [...] so your points still don't apply.

Huh? This actually *was* my point.

BTW, portage could easily support signature verification if only the
distribution of the developers' public keys were properly
maintained (e.g. via gkeys or, more simply, via some package):
After all, gentoo infra should always have an up-to-date list of
these keys anyway.
(If they don't, it would make it even more important to use the
source repo instead of trusting a signature which is given
without sufficient verification)

>> Your implicit claim is untrue. rsync - as used by portage - always
>> transfers whole files, only.
>
> rsync is capable of transferring partial files.

Yes, and portage is explicitly disabling this. (It costs a lot of
server CPU time and does not save much transfer data if the files
are small, because a lot of hashes have to be transferred
(and calculated - CPU-time!) instead.)
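
As a rough illustration of the point above: rsync turns its delta algorithm
off when it is given --whole-file (short form -W). The sketch below only
checks an option string for that flag. The helper name uses_whole_file is
made up for this example, and the sample option string is an assumption,
not a copy of any real portage configuration.

```shell
#!/bin/sh
# Hypothetical helper (not part of portage or rsync): succeeds if the
# option string passed as $1 contains --whole-file or -W as its own word.
# Bundled short options such as -aW are deliberately not handled.
uses_whole_file() {
    case " $1 " in
        *" --whole-file "*|*" -W "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example option string (an assumption for illustration only).
opts="--recursive --links --times --compress --whole-file --delete"
if uses_whole_file "$opts"; then
    echo "delta transfer disabled"
else
    echo "delta transfer enabled"
fi
```

On a Gentoo system the string to test could presumably be taken from
"portageq envvar PORTAGE_RSYNC_OPTS" instead of the hard-coded sample.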

> However, this is based on offsets from the start of the file

There are newer algorithms which also support detection of insertions
and deletions via rolling hashes (e.g. for deduplicating filesystems).
Rsync is using quite an advanced algorithm as well, but I would
need to recheck its features.

Anyway, it plays no role for our discussion, because for such
small files it hardly matters, and portage is disabling
said algorithm anyway.

> "The council does not require that ChangeLogs be generated or
>   distributed through the rsync system. It is at the discretion of our
>   infrastructure team whether or not this service continues."

The formulation already makes it clear that one did not want to
put pressure on infra, and at that time it was expected that
every user would switch to git anyway.
At that time also the gkeys project was very active, and git was
(besides webrsync) the only expected way to get checksums for the
full tree. In particular, rsync was inherently insecure.

The situation has changed meanwhile on both sides: gkeys was
apparently practically abandoned, and instead gemato was introduced
and is actively supported. That the gentoo-mirror repository is
suddenly more secure than the git repository is also a side effect of
gemato, because it is only for gemato that the infra keys are now
distributed in a package.

> If you're using squashfs git pull probably isn't the right solution for you.

Exactly. That's why I completely disagree with portage's regression
of replacing the previously working solution by the only partially
working "git pull".

>> 4. Even if the user made the mistake of editing a file, portage should
>>    not just die on syncing.
>
> emerge --sync won't die in a situation like that in general.

It does: git pull refuses to start if there are uncommitted changes.

> but I don't think the correct default in this case should be
> to just wipe out the user's changes.

I do: As with rsync, a user should not make changes to the distributed
tree itself (unless he makes a PR) but in an overlay; otherwise he will
permanently have outdated files which are not correctly updated.
*If* a user wants such changes, he should use git correctly and commit.

But I am not against to make this an opt-in option for enabling it
by a developer (or advanced 

Re: [gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning

2018-07-07 Thread Rich Freeman
On Sat, Jul 7, 2018 at 5:29 PM Martin Vaeth  wrote:
>
> Rich Freeman  wrote:
> > On Sat, Jul 7, 2018 at 1:34 AM Martin Vaeth  wrote:
> >>
> >> Biggest issue is that git signature happens by the developer who
> >> last committed which means that in practice you need dozens/hundreds
> >> of keys.
> >
> > This is untrue. [...]
> > It will, of course, not work on the regular git repo [...]
> > You need to use a repo that is signed by infra
> > (which typically includes metadata/etc as well).
>
> I was speaking about gentoo's git repository, of course
> (the one which was attacked on github), not about a Frankensteined one
> with metadata history filling megabytes of disk space unnecessarily.
> Who has that much disk space to waste?

Doesn't portage create that metadata anyway when you run it, negating
any space savings at the cost of CPU to regenerate the cache?

>
> For the official git repository your assertions are simply false,
> as you apparently admit: It is currently not possible to use the
> official git repo (or the github clone of it which was attacked)
> in a secure manner.
>

Sure, but this also doesn't support signature verification at all (at
least not by portage - git can of course manually verify any commit),
so your points still don't apply.
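
A minimal sketch of such a manual check, assuming the repository lives in
/usr/portage as elsewhere in this thread. The function name
verify_top_commit is hypothetical. git verify-commit only tells you that
the signature matches a key in your local keyring; whether that key
deserves trust is a separate question.

```shell
#!/bin/sh
# Hypothetical helper: check the OpenPGP signature of the newest commit
# in the repository at $1. Exits non-zero if the commit is unsigned or
# the signature cannot be verified against the local keyring.
verify_top_commit() {
    git -C "$1" verify-commit HEAD
}

# Example call; /usr/portage is the checkout location used in this thread.
# verify_top_commit /usr/portage && echo "top commit has a good signature"
```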

> > and as a bonus they want them prepended to
> > instead of appended so that rsync resends the whole thing instead of
> > just the tail...
>
> Your implicit claim is untrue. rsync - as used by portage - always
> transfers whole files, only.

rsync is capable of transferring partial files.  I can't vouch for how
portage is using it, but both the rsync command line program and
librsync can do partial file transfers.  However, this is based on
offsets from the start of the file, so appending to a file will result
in the first part of the file being identical, but prepending will
break rsync's algorithm.

>
> > But, this was endlessly debated before the decision was made.
>
> The decision was about removing the ChangeLogs from the git
> repository. This was certainly the correct decision, because -
> as you said - the ChangeLogs *can* be regenerated from the
> git history and thus it makes no sense to modify/store them
> redundantly.

There were two decisions:

https://projects.gentoo.org/council/meeting-logs/20141014-summary.txt

"do we need to continue to create new ChangeLog entries once we're
operating in git?"  No.

https://projects.gentoo.org/council/meeting-logs/20160410-summary.txt

"The council does not require that ChangeLogs be generated or
  distributed through the rsync system. It is at the discretion of our
  infrastructure team whether or not this service continues."
  Accepted (4 yes, 1 no, 2 abstention)

> > It probably should be a configurable option in repos.conf, but
> > honestly, forced pushes are not something that should be considered a
> > good practice.
>
> 1. portage shouldn't decide about practices of overlays.

Hence the reason I suggested it should be a repos.conf option.

> 2. also in the official gentoo repository force pushes happen
>    occasionally. Last occurrence was e.g. when undoing the
>    malevolent forced push ;)

Sure, but that was a fast-forward from the last good commit, so it
wouldn't require a force pull unless a user had done a force pull on
the bad repo.

> 3. git pull fails not only for forced pushes but also on several
>    other occasions; for instance, if your filesystem changed inode
>    numbers (e.g. squash + overlayfs after a resquash+remount).

If you're using squashfs git pull probably isn't the right solution for you.

> 4. Even if the user made the mistake of editing a file, portage should
>    not just die on syncing.

emerge --sync won't die in a situation like that in general.  Maybe it will
if there is a merge conflict, but I don't think the correct default in
this case should be to just wipe out the user's changes.  I'm all for
making that an option, however.

-- 
Rich



[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning

2018-07-07 Thread Martin Vaeth
Rich Freeman  wrote:
> On Sat, Jul 7, 2018 at 1:51 AM Martin Vaeth  wrote:
>> Davyd McColl  wrote:
>>
>> > I ask because prior to the GitHub incident, I didn't have signature
>> > verification enabled
>>
>> Currently, it is not practical to change this, see my other posting.
>
> You clearly don't understand what it actually checks.

Davyd and I were obviously speaking about the gentoo repository
(the official one and the one on github which got hacked).
For these repositories verification is practically not possible.
(That there are also *other* repositories - with huge metadata history -
which might be easier to verify is a different story).

Perversely, the official comments after the hack
suggested that you should have enabled signature verification for
the hacked repository, which was simply not possible in practice.




[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning

2018-07-07 Thread Martin Vaeth
Rich Freeman  wrote:
> On Sat, Jul 7, 2018 at 1:34 AM Martin Vaeth  wrote:
>>
>> Biggest issue is that git signature happens by the developer who
>> last committed which means that in practice you need dozens/hundreds
>> of keys.
>
> This is untrue. [...]
> It will, of course, not work on the regular git repo [...]
> You need to use a repo that is signed by infra
> (which typically includes metadata/etc as well).

I was speaking about gentoo's git repository, of course
(the one which was attacked on github), not about a Frankensteined one
with metadata history filling megabytes of disk space unnecessarily.
Who has that much disk space to waste?

For the official git repository your assertions are simply false,
as you apparently admit: It is currently not possible to use the
official git repo (or the github clone of it which was attacked)
in a secure manner.

>> > unless you stick --force in your pull
>>
>> Unfortunately, it is not that simple: git pull --force only works if
> [...]
> You completely trimmed the context around my quote. [...]
> they simply would not be pulled without --force.

I was saying that they would not be pulled *with* --force either,
because pull --force is not as strong as you think it is (it would
have shown you conflicts to resolve manually).
You would have to use the commands that I have posted.

> You seem to be providing advice for how to do a pull with a shallow
> repository

No, what I said is not related to a shallow repository. It has to do
with pulling a forced push, in general.

>> At least since the ChangeLogs have been removed.
>> IMHO it was the wrong decision to not keep them in the rsync tree
>> (The tool to regenerate them from git was/is available).
>
> Changelogs are redundant with git, and they take a ton of space (which
> of late everybody seems to be super-concerned about)

Compared to the git history, they take very little space.
If you squash the portage tree, it is hardly measurable.
And with the ChangeLogs, rsync would still be a sane option for
most users. Without ChangeLogs many users are unnecessarily forced
to change and to sacrifice the space for git history.

> and as a bonus they want them prepended to
> instead of appended so that rsync resends the whole thing instead of
> just the tail...

Your implicit claim is untrue. rsync - as used by portage - always
transfers whole files, only.

> But, this was endlessly debated before the decision was made.

The decision was about removing the ChangeLogs from the git
repository. This was certainly the correct decision, because -
as you said - the ChangeLogs *can* be regenerated from the
git history and thus it makes no sense to modify/store them
redundantly.

But I was speaking about the distribution of ChangeLogs in rsync:
Whenever the infrastructure uses egencache to generate the metadata,
it could simply pass --update-changelogs so that rsync users
still would have ChangeLogs: They cannot get them from git history.
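
The invocation meant here could look roughly like the sketch below. The
wrapper regen_with_changelogs and the DRY_RUN switch are invented for
illustration; the repository name "gentoo" is an assumption. Since the
ChangeLogs are derived from git history, this presumably only makes sense
when egencache runs on a git checkout of the tree.

```shell
#!/bin/sh
# Hypothetical wrapper: build the egencache command line that regenerates
# the metadata cache and the ChangeLogs for one repository.
# With DRY_RUN=1 the command is only printed, not executed.
regen_with_changelogs() {
    repo_name=$1
    set -- egencache --update --update-changelogs --repo "$repo_name"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print the command for the repository named "gentoo" without running it.
DRY_RUN=1 regen_with_changelogs gentoo
```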

> My
> point is that the sorts of people who like Gentoo would probably tend
> to like git.

"Liking" git does not mean that one has to use it also for things
for which it brings nothing. And for most users none of its features
is useful for the portage tree. With one exception: ChangeLogs.
That's why I am advocating bringing them back to the rsync tree.

> The "keys problem" has nothing to do with the security of git
> verification, because those keys are not used by git verification on
> the end-user side.

Anyone who has such an affinity for git and development that he prefers
git despite its higher disk space cost will certainly want to use the
actual source repository and not an inferior rsync-clone repository.

> It probably should be a configurable option in repos.conf, but
> honestly, forced pushes are not something that should be considered a
> good practice.

1. portage shouldn't decide about practices of overlays.
2. also in the official gentoo repository force pushes happen
   occasionally. Last occurrence was e.g. when undoing the
   malevolent forced push ;)
3. git pull fails not only for forced pushes but also on several
   other occasions; for instance, if your filesystem changed inode
   numbers (e.g. squash + overlayfs after a resquash+remount).
4. Even if the user made the mistake of editing a file, portage should
   not just die on syncing.




Re: [gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning

2018-07-07 Thread Rich Freeman
On Sat, Jul 7, 2018 at 1:34 AM Martin Vaeth  wrote:
>
> Rich Freeman  wrote:
> >
> > Biggest issue with git signature verification is that right now it
> > will still do a full pull/checkout before verifying
>
> Biggest issue is that git signature happens by the developer who
> last committed which means that in practice you need dozens/hundreds
> of keys.

This is untrue.  The last git signature is made by infra or the
CI-bot, and this is the signature that portage checks.

Portage will NOT accept a developer key, or any other key in your
keychain, as being valid.

It will, of course, not work on the regular git repo used for
committing for this reason.  You need to use a repo that is signed by
infra (which typically includes metadata/etc as well).
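
For illustration, a repos.conf stanza along these lines might be what is
needed. This is a sketch under assumptions: the helper name
print_verified_stanza is made up, the sync-uri is assumed to be the
infra-signed mirror, and the option name
sync-git-verify-commit-signature is quoted from memory of the portage
documentation, so check it against your portage version before use.

```shell
#!/bin/sh
# Hypothetical helper: print a repos.conf stanza that syncs from an
# infra-signed mirror and asks portage to check the top commit signature.
# The sync-uri below is an assumption; verify it before relying on it.
print_verified_stanza() {
    cat <<'EOF'
[gentoo]
location = /usr/portage
sync-type = git
sync-uri = https://github.com/gentoo-mirror/gentoo.git
sync-git-verify-commit-signature = true
EOF
}

print_verified_stanza
```

After review, the output could be redirected into a file below
/etc/portage/repos.conf/.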

I'll trim most of the rest of your email and only reply to significant
bits, because you seem to not understand the point above which
invalidates almost everything you wrote.  The concerns you raise would
be an issue if you were checking individual developer keys.

>
> So currently, it is impossible to do *any* automatic tree verification,
> unless you manually fetch/update all of the developer keys.
>

As noted, you don't need to fetch any developer keys, and if you do
fetch them, portage will ignore them.

>
> > unless you stick --force in your pull
>
> Unfortunately, it is not that simple: git pull --force only works if
> the checked out tree is old enough (in which case git pull without --force
> would have worked also, BTW).

You completely trimmed the context around my quote.  I was talking
about the malicious commits in the recent attack.  They were
force-pushed, so it doesn't matter how complete your repository is -
they simply would not be pulled without --force.

You seem to be providing advice for how to do a pull with a shallow
repository, which I'm not talking about.

> > Honestly, I think git is a good fit for a lot of Gentoo users.
>
> At least since the ChangeLogs have been removed.
> IMHO it was the wrong decision to not keep them in the rsync tree
> (The tool to regenerate them from git was/is available).

Changelogs are redundant with git, and they take a ton of space (which
of late everybody seems to be super-concerned about).  I don't get
that on one hand people get twitchy about /usr/portage taking more
than 1GB, and on the other hand they want a bazillion text files
dumped all over the place, and as a bonus they want them prepended to
instead of appended so that rsync resends the whole thing instead of
just the tail...

But, this was endlessly debated before the decision was made.  Trust
me, I read every post before voting to have them removed.

>
> > it is different, but all the history/etc is the sort of thing I think
> > would appeal to many here.
>
> Having the ChangeLogs would certainly be sufficient for the majority
> of users. It is very rare that a user really needs to access the
> older version of the file, and in that case it is simple enough
> to fetch it manually from e.g. github.

It is very rare that somebody would want to use Gentoo at all.  My
point is that the sorts of people who like Gentoo would probably tend
to like git.  But, to each their own...

>
> > Security is obviously getting a renewed focus across the board
>
> Unfortunately, due to the mentioned keys problem, git is
> currently the *unsafest* method for syncing.

The "keys problem" has nothing to do with the security of git
verification, because those keys are not used by git verification on
the end-user side.  An infra-controlled key is used for verification
whether you sync with git or rsync.  Either way you're relying on
infra checking the developer keys at time of commit.

Now, as I already mentioned git syncing is currently less safe due to
it doing the checkout before the verification, and they are in the
process of fixing this.

> (BTW, due to the number of committers the portage tree has a quite
> strict policy w.r.t. forced pushes. Overlays, especially of single
> users, might have different policies and thus can fail quite often
> due to the "git pull" bug.)

It probably should be a configurable option in repos.conf, but
honestly, forced pushes are not something that should be considered a
good practice.  There are times that it is the best option, but those
are rare, IMO.

-- 
Rich



Re: [gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning

2018-07-07 Thread Rich Freeman
On Sat, Jul 7, 2018 at 1:51 AM Martin Vaeth  wrote:
>
> Davyd McColl  wrote:
>
> > I ask because prior to the GitHub incident, I didn't have signature
> > verification enabled
>
> Currently, it is not practical to change this, see my other posting.
>

You clearly don't understand what it actually checks.  It is
completely practical to enable this today (though not as secure as it
could be).  I'll elaborate in a reply to the other email.

-- 
Rich



[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning

2018-07-06 Thread Martin Vaeth
Davyd McColl  wrote:
> @Rich: if I understand the process correctly, the same commits are
> pushed to infra and GitHub by the CI bot?

Yes, the repositories are always identical (up to a few seconds delay).

> I ask because prior to the GitHub incident, I didn't have signature
> verification enabled

Currently, it is not practical to change this, see my other posting.

> then I should (in theory) be able to change my repo.conf
> settings, fiddle the remote in /usr/portage, and switch seamlessly from
> gentoo to GitHub?

If by "fiddle the remote in /usr/portage" you mean to edit
the .git/config file you are right.
Note that just changing the remote in repos.conf only has an
effect if you have completely removed /usr/portage, so that portage has
to clone anew.
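
Instead of editing .git/config by hand, the same change can be sketched
with git itself. The function name switch_remote is hypothetical, and the
GitHub URL in the example call is an assumption. The sync-uri in
repos.conf should then be changed to match.

```shell
#!/bin/sh
# Hypothetical helper: point the "origin" remote of an existing checkout
# at a new URL and print the URL that is now configured.
switch_remote() {
    repo=$1
    url=$2
    git -C "$repo" remote set-url origin "$url" &&
        git -C "$repo" remote get-url origin
}

# Example call (path from this thread, URL assumed):
# switch_remote /usr/portage https://github.com/gentoo/gentoo.git
```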




[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning

2018-07-06 Thread Martin Vaeth
Rich Freeman  wrote:
>
> Biggest issue with git signature verification is that right now it
> will still do a full pull/checkout before verifying

Biggest issue is that git signature happens by the developer who
last committed, which means that in practice you need dozens/hundreds
of keys. No package is available for this, and the only tool
I know of which was originally developed to manage these (app-crypt/gkeys)
is not ready to be used for verification (gkeys-gpg --verify was
apparently never run by its developer, since its python code already
breaks at argument parsing), and its development has stalled.

Moreover, although I have written a dirty substitute for gkeys-gpg, it
is not clear how to use gkeys to update signatures and remove stale
ones: It appears that for each usage you have to fetch all seeds and
keys anew. (And I am not even sure whether the seeds it fetches are
really still maintained).

So currently, it is impossible to do *any* automatic tree verification,
unless you manually fetch/update all of the developer keys.

Safest bet if you are a git user is to verify manually whether the
"Verify" field of the latest commit in github really belongs to a
gentoo developer and is not a fake account. (Though that may be hard
to decide.)

> until the patch makes its way into release (the patch will do a fetch
> and verify before it does a checkout

This does nothing to help you get all the correct keys (and no fake keys!)
that you need to verify the signature.

> unless you stick --force in your pull

Unfortunately, it is not that simple: git pull --force only works if
the checked out tree is old enough (in which case git pull without --force
would have worked also, BTW).
The correct thing to do if git pull failed is:

git update-index --refresh -q --unmerged # -q is important here!
git fetch
git reset --hard $(git rev-parse --abbrev-ref \
  --symbolic-full-name @{upstream})

(The first command is needed to get rid of problems caused by filesystems
like overlayfs).

(If you are a developer and do not want to risk that syncing overwrites
your uncommitted changes, you might want to replace --hard by --merge).

> not a great idea for scripts and portage doesn't do this).

I think it is a very good idea. In fact, portage previously did this
*always* (with --merge instead of --hard) and the only reason this was
removed is that the
  git update-index --refresh -q --unmerged
takes quite some time which is not necessary for people who do not
use a special filesystem like overlayfs for the portage tree.
The right thing to do IMHO is for portage to use this anyway as
a fallback if "git pull" fails. I usually patch portage to do this.
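
A minimal sketch of such a fallback as a standalone shell function. This
is not the actual patch to portage: the name sync_repo and the mode
argument are invented here, and using --ff-only for the first attempt is
an assumption made so that the pull fails cleanly instead of starting a
merge.

```shell
#!/bin/sh
# Hypothetical helper: try a plain fast-forward pull first and fall back
# to the fetch + reset sequence from this mail if that fails.
# Usage: sync_repo <repo-dir> [hard|merge]
sync_repo() {
    repo=$1
    mode=${2:-hard}   # "merge" tries to keep uncommitted local changes
    if git -C "$repo" pull --ff-only -q 2>/dev/null; then
        return 0
    fi
    # Refresh the stat information in the index (matters e.g. on
    # overlayfs); its exit status is ignored on purpose.
    git -C "$repo" update-index --refresh -q --unmerged >/dev/null 2>&1
    git -C "$repo" fetch -q || return 1
    upstream=$(git -C "$repo" rev-parse --abbrev-ref \
        --symbolic-full-name '@{upstream}') || return 1
    git -C "$repo" reset -q "--$mode" "$upstream"
}

# Example call (path from this thread):
# sync_repo /usr/portage merge
```

With "hard" any local modification of tracked files is discarded, so
"merge" is probably the safer choice on a machine used for development.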

> that was just dumb luck

Exactly. That's why using "git pull" should not be considered
a security measure. It is only a safety measure if you are
a developer and want to avoid losing local changes at any price
if you mistakenly sync before committing (although the mentioned
--merge instead of --hard should be safe here, too).

> Honestly, I think git is a good fit for a lot of Gentoo users.

At least since the ChangeLogs have been removed.
IMHO it was the wrong decision to not keep them in the rsync tree
(The tool to regenerate them from git was/is available).

> it is different, but all the history/etc is the sort of thing I think
> would appeal to many here.

Having the ChangeLogs would certainly be sufficient for the majority
of users. It is very rare that a user really needs to access the
older version of the file, and in that case it is simple enough
to fetch it manually from e.g. github.

> Also, git is something that is becoming increasingly unavoidable

If you learn something about git from using it through portage,
this only indicates a bug in portage. (Like e.g. using "git pull" is).

> Security is obviously getting a renewed focus across the board

Unfortunately, due to the mentioned keys problem, git is
currently the *unsafest* method for syncing. The "git pull" bug
of portage is not appealing for normal usage, either.
(BTW, due to the number of committers the portage tree has a quite
strict policy w.r.t. forced pushes. Overlays, especially of single
users, might have different policies and thus can fail quite often
due to the "git pull" bug.)