[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning

2018-07-06 Thread Martin Vaeth
Davyd McColl  wrote:
> @Rich: if I understand the process correctly, the same commits are
> pushed to infra and GitHub by the CI bot?

Yes, the repositories are always identical (up to a few seconds delay).

> I ask because prior to the GitHub incident, I didn't have signature
> verification enabled

Currently, it is not practical to change this; see my other posting.

> then I should (in theory) be able to change my repo.conf
> settings, fiddle the remote in /usr/portage, and switch seamlessly from
> gentoo to GitHub?

If by "fiddle the remote in /usr/portage" you mean to edit
the .git/config file you are right.
Note that just changing the remote in repos.conf only has an
effect if you completely remove /usr/portage, so that portage has
to clone anew.
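Below is a minimal sketch of the repos.conf half of such a switch. It assumes an INI-style repos.conf that Python's configparser can read, and the two mirror URIs are assumed examples, not taken from the thread. An existing checkout keeps fetching from whatever remote its own .git/config names (which can be changed with e.g. `git remote set-url origin <uri>`). That is why the repos.conf edit alone only matters for a fresh clone. The helper name `switch_sync_uri` is hypothetical.

```python
import configparser
import io

# Assumed example URIs, not taken from the thread.
INFRA_URI = "https://anongit.gentoo.org/git/repo/sync/gentoo.git"
GITHUB_URI = "https://github.com/gentoo-mirror/gentoo.git"


def switch_sync_uri(repos_conf_text: str, repo: str, new_uri: str) -> str:
    """Return repos.conf text with the sync-uri of `repo` set to `new_uri`.

    Hypothetical helper. configparser drops comments when writing, so
    review the result before putting it back into /etc/portage/repos.conf.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repos_conf_text)
    if not parser.has_section(repo):
        raise KeyError(f"no [{repo}] section found")
    parser.set(repo, "sync-uri", new_uri)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    example = (
        "[gentoo]\n"
        "location = /usr/portage\n"
        "sync-type = git\n"
        f"sync-uri = {INFRA_URI}\n"
    )
    print(switch_sync_uri(example, "gentoo", GITHUB_URI))
```

Using configparser rather than a text substitution leaves the other keys (location, sync-type) untouched. The price is that comments are lost.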




[gentoo-user] Re: Re[2]: Re: Portage, git and shallow cloning

2018-07-06 Thread Martin Vaeth
Rich Freeman  wrote:
>
> git has the advantage that it can just read the current HEAD and from
> that know exactly what commits are missing, so there is way less
> effort spent figuring out what changed.

I don't know the exact protocol, but I would expect git to be
even more efficient, since I assume that

1. git transfers only changes between similar files
(in contrast: rsync could only do this if the filename has not
changed, and even that is switched off for portage syncing).

2. git transfers compressed data.

(Both are assumptions which perhaps some git guru might confirm.)
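For what it's worth, git's pack format does use both ideas. Objects are zlib-compressed, and similar objects can be stored as deltas against each other. The sketch below only illustrates the delta idea with difflib, using a made-up list of copy/insert instructions. It is not git's actual delta encoding, and `make_delta`/`apply_delta` are hypothetical names.

```python
import difflib
import zlib


def make_delta(base: bytes, new: bytes) -> list:
    """Describe `new` as ("copy", offset, length) / ("insert", data) steps over `base`."""
    matcher = difflib.SequenceMatcher(None, base, new, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:  # "replace" and "insert" carry new bytes; "delete" carries none
            ops.append(("insert", new[j1:j2]))
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the new file from the base plus the delta steps."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


if __name__ == "__main__":
    old = b'EAPI=6\nKEYWORDS="~amd64 ~x86"\nDEPEND="dev-libs/foo"\n' * 20
    new = old.replace(b"~amd64", b"amd64", 1)  # a one-character keyword change
    delta = make_delta(old, new)
    assert apply_delta(old, delta) == new
    literal = sum(len(op[1]) for op in delta if op[0] == "insert")
    print("new file:", len(new), "bytes; literal bytes in delta:", literal)
    print("zlib-compressed new file:", len(zlib.compress(new)), "bytes")
```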




[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning

2018-07-06 Thread Martin Vaeth
Rich Freeman  wrote:
>
> Biggest issue with git signature verification is that right now it
> will still do a full pull/checkout before verifying

The biggest issue is that git commits are signed by the developer who
last committed, which means that in practice you need dozens or hundreds
of keys. No package is available for this, and the only tool I know of
that was originally developed to manage these (app-crypt/gkeys) is not
ready to be used for verification (gkeys-gpg --verify was apparently
never run by its developer, since its python code already breaks during
argument parsing), and its development has stalled.

Moreover, although I have written a dirty substitute for gkeys-gpg, it
is not clear how to use gkeys to update signatures and remove stale
ones: it appears that every time you use it you have to fetch all seeds
and keys anew. (And I am not even sure whether the seeds it fetches are
really still maintained.)

So currently, it is impossible to do *any* automatic tree verification,
unless you manually fetch/update all of the developer keys.

The safest bet if you are a git user is to verify manually whether the
"Verified" badge of the latest commit on GitHub really belongs to a
Gentoo developer and not to a fake account. (Though that may be hard
to decide.)
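To make the key problem concrete: git can report the signature status and the signing-key fingerprint of each commit via the `%G?` and `%GF` placeholders of `git log --format`. Deciding which fingerprints to trust still needs an allowlist that somebody has to build and maintain. The following is a minimal sketch that assumes such a (hypothetical) allowlist exists. The function name and the sample values are made up.

```python
# Feed it the output of, e.g.:
#   git log -n 20 --format='%H %G? %GF'
# %G? is "G" for a good signature, "U" for good but of unknown validity,
# "N" for no signature, "E" if the key is missing, and so on.
ACCEPTED_STATUS = {"G", "U"}


def untrusted_commits(log_text: str, trusted_fingerprints: set) -> list:
    """Return (commit, reason) pairs for commits that fail the allowlist check."""
    trusted = {fp.upper() for fp in trusted_fingerprints}
    problems = []
    for line in log_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        commit = parts[0]
        status = parts[1] if len(parts) > 1 else "N"
        fingerprint = parts[2].upper() if len(parts) > 2 else ""
        if status not in ACCEPTED_STATUS:
            problems.append((commit, "signature status " + status))
        elif fingerprint not in trusted:
            problems.append((commit, "signing key not in allowlist"))
    return problems


if __name__ == "__main__":
    # Made-up sample output; real hashes and fingerprints are 40 hex digits.
    sample = "1111aaaa G ABCD1234\n2222bbbb E\n3333cccc G FFFF0000\n"
    print(untrusted_commits(sample, {"abcd1234"}))
```

"U" is accepted on purpose: trust is decided by the pinned fingerprints rather than by gpg's trust database. Expired or revoked statuses are rejected.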

> until the patch makes its way into release (the patch will do a fetch
> and verify before it does a checkout

This does nothing to help you get all the correct keys (and no fake
keys!) that you need to verify the signature.

> unless you stick --force in your pull

Unfortunately, it is not that simple: git pull --force only works if
the checked-out tree is old enough (in which case git pull without
--force would also have worked, BTW).
The correct thing to do if git pull fails is:

git update-index --refresh -q --unmerged # -q is important here!
git fetch
git reset --hard $(git rev-parse --abbrev-ref \
  --symbolic-full-name @{upstream})

(The first command is needed to get rid of problems caused by filesystems
like overlayfs).

(If you are a developer and do not want to risk syncing overwriting
your uncommitted changes, you might want to replace --hard by --merge.)

> not a great idea for scripts and portage doesn't do this).

I think it is a very good idea. In fact, portage previously did this
*always* (with --merge instead of --hard), and the only reason it was
removed is that
  git update-index --refresh -q --unmerged
takes quite some time, which is unnecessary for people who do not
use a special filesystem like overlayfs for the portage tree.
The right thing to do IMHO is for portage to use this anyway as
a fallback if "git pull" fails. I usually patch portage to do this.
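Here is a minimal sketch of that fallback. It assumes a sync step that shells out to git from the repository directory. The sketch tries `git pull` first and, only if that fails, runs the recovery commands given earlier in this message. It uses --merge instead of --hard when local changes should be kept. The names `sync` and `make_runner` are hypothetical, and this is not portage's code. The runner is injectable so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], subprocess.CompletedProcess]


def make_runner(repo_dir: str) -> Runner:
    """Run a git command inside repo_dir and capture its output."""
    def run(cmd: List[str]) -> subprocess.CompletedProcess:
        return subprocess.run(cmd, cwd=repo_dir, capture_output=True, text=True)
    return run


def sync(run: Runner, keep_local_changes: bool = False) -> str:
    """Try 'git pull'; if it fails, fetch and reset to the upstream branch."""
    if run(["git", "pull"]).returncode == 0:
        return "pull"
    # Refresh stat info (needed e.g. on overlayfs); -q lets it continue
    # instead of erroring out, so its exit status is deliberately ignored.
    run(["git", "update-index", "--refresh", "-q", "--unmerged"])
    if run(["git", "fetch"]).returncode != 0:
        raise RuntimeError("git fetch failed")
    upstream = run(["git", "rev-parse", "--abbrev-ref",
                    "--symbolic-full-name", "@{upstream}"])
    if upstream.returncode != 0:
        raise RuntimeError("no upstream branch configured")
    mode = "--merge" if keep_local_changes else "--hard"
    if run(["git", "reset", mode, upstream.stdout.strip()]).returncode != 0:
        raise RuntimeError("git reset " + mode + " failed")
    return "reset " + mode
```

A call would look something like `sync(make_runner("/usr/portage"))`, or `sync(..., keep_local_changes=True)` on a developer's checkout.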

> that was just dumb luck

Exactly. That's why using "git pull" should not be considered a
security measure. It is only a safety measure if you are
a developer and want to avoid losing local changes at any price
if you mistakenly sync before committing (although the mentioned
--merge instead of --hard should be safe here, too).

> Honestly, I think git is a good fit for a lot of Gentoo users.

At least since the ChangeLogs have been removed.
IMHO it was the wrong decision to not keep them in the rsync tree
(The tool to regenerate them from git was/is available).

> it is different, but all the history/etc is the sort of thing I think
> would appeal to many here.

Having the ChangeLogs would certainly be sufficient for the majority
of users. It is very rare that a user really needs to access an
older version of a file, and in that case it is simple enough
to fetch it manually from e.g. GitHub.

> Also, git is something that is becoming increasingly unavoidable

If you learn something about git from using it through portage,
this only indicates a bug in portage (using "git pull" is one example).

> Security is obviously getting a renewed focus across the board

Unfortunately, due to the mentioned keys problem, git is
currently the *unsafest* method for syncing. The "git pull" bug
of portage is not appealing for normal usage, either.
(BTW, due to the number of committers, the portage tree has quite a
strict policy w.r.t. forced pushes. Overlays, especially those of single
users, might have different policies and thus can fail quite often
due to the "git pull" bug.)




Re: [gentoo-user] syncing via via git and signature failure

2018-07-06 Thread Bill Kenworthy
On 07/07/18 09:42, Floyd Anderson wrote:
> Hi Bill,
>
> On Sat, 07 Jul 2018 07:40:00 +0800
> Bill Kenworthy  wrote:
>>
>> I still have this error and I've tried a number of things, including:
>>
>> gemato create -p ebuild -K /usr/share/openpgp-keys/gentoo-release.asc
>> /usr/portage/
>>
>> The next emerge --sync errored on a lot of private Manifest files, but the
>> missing root Manifest error disappeared.  I deleted them and successfully
>> resynced.
>>
>> olympus /usr/portage # gemato verify -s -K
>> /usr/share/openpgp-keys/gentoo-release.asc /usr/portage/
>> INFO:root:Refreshing keys from keyserver...
>> INFO:root:Keys refreshed.
>> ERROR:root:Top-level Manifest /usr/portage/Manifest is not OpenPGP
>> signed
>> olympus /usr/portage #
>>
>> also did a "git reset --hard"
>>
>> still get:
>>
>> olympus /usr/portage # emerge --sync
> Syncing repository 'gentoo' into '/usr/portage'...
>> /usr/bin/git pull
>> Already up to date.
>>  * Using keys from /usr/share/openpgp-keys/gentoo-release.asc
>>  * Refreshing keys from keyserver
>> ...  
>>   
>>
>> [ ok ]
>>  * No valid signature found: unable to verify signature (missing key?)
>> q: Updating ebuild cache in /usr/portage ...
>
> please be aware of the context of my response to Mick. He uses *rsync*
> and so do I. It seems you are using Git and thus a different tree
> verification mechanism. I don't know why you have gemato installed,
> because it usually only comes with sys-apps/portage[rsync-verify] set
> and is therefore only related to *rsync*.
>
> Have a look at:
>
>  - [1] 
>  - [2]
> 
>  - [3] 
>
> for some further information. Maybe:
>
>  $ git status --untracked-files
>
> within your tree location can help to identify and remove any files
> you created with gemato from the tree.
>
>
Brings up all the Manifest files, so I'll clean them out, resync and
see.  I do have rsync-verify set, but I would not have thought that was
the problem.  The system was converted to git syncing (by deleting the
repository and recreating it) soon after git became available, so
something ancient could be the cause.  None of the docs I have examined
seem to cover portage and git problems very well.


BillK





Re: [gentoo-user] syncing via via git and signature failure

2018-07-06 Thread Floyd Anderson

Hi Bill,

On Sat, 07 Jul 2018 07:40:00 +0800
Bill Kenworthy  wrote:


I still have this error and I've tried a number of things, including:

gemato create -p ebuild -K /usr/share/openpgp-keys/gentoo-release.asc
/usr/portage/

The next emerge --sync errored on a lot of private Manifest files, but the
missing root Manifest error disappeared.  I deleted them and successfully
resynced.

olympus /usr/portage # gemato verify -s -K
/usr/share/openpgp-keys/gentoo-release.asc /usr/portage/
INFO:root:Refreshing keys from keyserver...
INFO:root:Keys refreshed.
ERROR:root:Top-level Manifest /usr/portage/Manifest is not OpenPGP signed
olympus /usr/portage #

also did a "git reset --hard"

still get:

olympus /usr/portage # emerge --sync

Syncing repository 'gentoo' into '/usr/portage'...

/usr/bin/git pull
Already up to date.
 * Using keys from /usr/share/openpgp-keys/gentoo-release.asc
 * Refreshing keys from keyserver
... 
   
[ ok ]
 * No valid signature found: unable to verify signature (missing key?)
q: Updating ebuild cache in /usr/portage ...


please be aware of the context of my response to Mick. He uses *rsync*
and so do I. It seems you are using Git and thus a different tree
verification mechanism. I don't know why you have gemato installed,
because it usually only comes with sys-apps/portage[rsync-verify] set
and is therefore only related to *rsync*.


Have a look at:

 - [1] 
 - [2] 

 - [3] 

for some further information. Maybe:

 $ git status --untracked-files

within your tree location can help to identify and remove any files
you created with gemato from the tree.
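If there are many stray files, the same check can be scripted. This is a minimal sketch that makes two assumptions. First, the gemato-created files show up as untracked entries (`?? path`) in `git status --porcelain --untracked-files=all`. Second, their names start with "Manifest". The function names are hypothetical, and the sketch does not handle paths that git quotes because of unusual characters.

```python
import subprocess


def untracked_manifests(porcelain_output: str) -> list:
    """Pick untracked paths whose file name starts with 'Manifest'."""
    found = []
    for line in porcelain_output.splitlines():
        if not line.startswith("?? "):  # "??" marks untracked entries
            continue
        path = line[3:]
        name = path.rstrip("/").rsplit("/", 1)[-1]
        if name.startswith("Manifest"):
            found.append(path)
    return found


def list_untracked_manifests(repo_dir: str) -> list:
    """Run git status in repo_dir and return the untracked Manifest paths."""
    out = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=all"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return untracked_manifests(out)
```

It seems wiser to review the list before deleting anything, or to use `git clean -n` for a dry run, than to remove files blindly.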



--
Regards,
floyd




Re: [gentoo-user] All Gentoo signing key expired and no way to fix it

2018-07-06 Thread Tom H
On Wed, Jul 4, 2018 at 5:43 PM gevisz  wrote:
>
> but it "shot" only after sourcing /etc/profile.

Which is what "su -l" does.



Re: [gentoo-user] All Gentoo signing key expired and no way to fix it

2018-07-06 Thread Tom H
On Wed, Jul 4, 2018 at 5:39 PM gevisz  wrote:
> 2018-07-03 16:22 GMT+03:00 Mart Raudsepp :


>> If you use su, you should be using "su -" (or "su -l" or "su --login"),
>> not "su".
>
> I have used only "su" for already 3 years, since switched to Gentoo
> from Ubuntu and never had any problems with it.
>
> Could you explain a little bit more why "su -" should be used instead.
>
> From the man page I've got the following:
>
> -, -l, --login
> Provide an environment similar to what the user would expect had
> the user logged in directly.
>
> But I cannot see why I need the original root environment,
> especially if I never set it up.

It's more to protect from user envvars leaking into root's
environment. That's why "service(8)" resets the environment (and then
sets some, like PATH) on Linux and {Free,Net}BSD.

I've seen a daemon log in German because a colleague simply used "su"
to restart it (without using "service").
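The difference can be sketched as starting the command from a reset environment instead of inheriting the caller's one. This minimal Python sketch assumes a hypothetical whitelist (TERM only) and a fixed PATH. It is not what su(1) or any particular service(8) does internally. It only illustrates the idea of not letting variables such as LANG leak through.

```python
import os
import subprocess

# Hypothetical choices for illustration only.
KEEP_VARS = ("TERM",)
DEFAULT_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"


def reset_environment(current: dict) -> dict:
    """Keep only whitelisted variables and force a known PATH."""
    env = {name: current[name] for name in KEEP_VARS if name in current}
    env["PATH"] = DEFAULT_PATH
    return env


def run_with_clean_env(cmd: list) -> subprocess.CompletedProcess:
    # Anything else set in the caller's shell (LANG, LC_*, ...) is dropped.
    return subprocess.run(cmd, env=reset_environment(dict(os.environ)), check=True)
```

A daemon started this way would not see the caller's LANG, so a user's German locale would not end up in its logs.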


>> If you use sudo, you might need to pass -i (--login) option to it.
>
> I hate using sudo since I have been forced to use it in Ubuntu.

Ubuntu defaults to "sudo" but doesn't force you to use it! If you
prefer "su", set a root password.



Re: [gentoo-user] syncing via via git and signature failure

2018-07-06 Thread Bill Kenworthy
On 06/07/18 00:06, Floyd Anderson wrote:
> On Wed, 04 Jul 2018 22:57:05 -0400
> John Covici  wrote:
>>
>> I got the following when running your command:
>> gemato verify -K /tmp/gentoo-release.asc.20180703 /usr/portage/
>> INFO:root:Refreshing keys from keyserver...
>> INFO:root:Keys refreshed.
>
> To be more specific, I wasn't interested in verifying the tree. My
> main goal was to get:
>
>  INFO:root:Keys refreshed.
>
> because my sync/update script hung at:
>
>  INFO:root:Refreshing keys from keyserver...
>
> all the time, caused by:
>
>  gpg: Can't check signature: No public key
>
> result, so I wasn't able to update.
>
>> ERROR:root:Top-level Manifest not found in /usr/portage/
>>
>> How can I fix, or do I need to fix?
>
> I've no idea why your portage tree doesn't have a top-level Manifest
> file (assuming "/usr/portage" is the location of your tree), but it
> should be created/updated on next syncing.
>
>

I still have this error and  Ive tried a number of things including:

gemato create -p ebuild -K /usr/share/openpgp-keys/gentoo-release.asc
/usr/portage/

next emerge --sync error-ed on a lot of private manifest files but
missing toot manifest error disappeared.  Deleted them and successfully
resynced.

olympus /usr/portage # gemato verify -s -K
/usr/share/openpgp-keys/gentoo-release.asc /usr/portage/
INFO:root:Refreshing keys from keyserver...
INFO:root:Keys refreshed.
ERROR:root:Top-level Manifest /usr/portage/Manifest is not OpenPGP signed
olympus /usr/portage #

also did a "git reset --hard"

still get:

olympus /usr/portage # emerge --sync
>>> Syncing repository 'gentoo' into '/usr/portage'...
/usr/bin/git pull
Already up to date.
 * Using keys from /usr/share/openpgp-keys/gentoo-release.asc
 * Refreshing keys from keyserver
... 
   
[ ok ]
 * No valid signature found: unable to verify signature (missing key?)
q: Updating ebuild cache in /usr/portage ...


BillK





Re: [gentoo-user] Re: How long does "Verifying /usr/portage" take?

2018-07-06 Thread Dale
Rich Freeman wrote:
> On Fri, Jul 6, 2018 at 4:43 PM Grant Edwards  
> wrote:
>> On 2018-07-06, Grant Edwards  wrote:
>>
>>> Now that the public key stuff is working again (knock on wood), I'm
>>> curious if it's usual for an emerge --sync to take 10-15 minutes
>>> longer than it used due to the "Verifying /usr/portage" step.
>>>
>>> On some systems (with fewer packages installed) it only takes a minute
>>> or less. But, on my "main" desktop system it takes 10-15 minutes every
>>> time.
>> I cleared out /usr/portage/distfiles, and the verify time dropped to
>> about 10 seconds.  I should probably do that more often...
>>
> Assuming it is reproducible it is probably a bug.
>
> That said, I always move distfiles to someplace like /var/cache.  I
> guess /usr/portage should probably be in there as well, though I would
> not mix my distfiles with my repository for a number of reasons.  I
> think it is just inertia preserving the current situation as I can't
> imagine anybody involved in portage/council/etc really would design it
> this way today.
>
> You can tweak this in make.conf with DISTDIR=...
>


I set mine up this way and it seems to work OK.  It's sort of along the
lines of yours. 

root@fireball / # ls /var/cache/portage/ -al
total 160
drwxr-xr-x   5 root    root  4096 Dec 20  2012 .
drwxr-xr-x  13 root    root  4096 Jul  4 03:26 ..
drwxrwxr-x   3 portage portage 143360 Jul  2 20:42 distfiles
drwxr-xr-x 103 portage portage   4096 Jul  3 00:01 packages
drwxr-xr-x 171 portage portage   4096 Jul  2 18:22 tree
root@fireball / # 


If anyone wants to duplicate this, this is the relevant parts of make.conf:


DISTDIR="/var/cache/portage/distfiles/"
PKGDIR="/var/cache/portage/packages"
PORTDIR="/var/cache/portage/tree" 


After I did the move, I think someone came up with a better place but to
be blunt, I just didn't feel like moving it all again.  It's not like
portage really cares where it is anyway. 

Dale

:-)  :-) 



Re: [gentoo-user] Re: How long does "Verifying /usr/portage" take?

2018-07-06 Thread Rich Freeman
On Fri, Jul 6, 2018 at 4:43 PM Grant Edwards  wrote:
>
> On 2018-07-06, Grant Edwards  wrote:
>
> > Now that the public key stuff is working again (knock on wood), I'm
> > curious if it's usual for an emerge --sync to take 10-15 minutes
> > longer than it used due to the "Verifying /usr/portage" step.
> >
> > On some systems (with fewer packages installed) it only takes a minute
> > or less. But, on my "main" desktop system it takes 10-15 minutes every
> > time.
>
> I cleared out /usr/portage/distfiles, and the verify time dropped to
> about 10 seconds.  I should probably do that more often...
>

Assuming it is reproducible it is probably a bug.

That said, I always move distfiles to someplace like /var/cache.  I
guess /usr/portage should probably be in there as well, though I would
not mix my distfiles with my repository for a number of reasons.  I
think it is just inertia preserving the current situation as I can't
imagine anybody involved in portage/council/etc really would design it
this way today.

You can tweak this in make.conf with DISTDIR=...

-- 
Rich



Re: [gentoo-user] Re: How long does "Verifying /usr/portage" take?

2018-07-06 Thread Mick
On Friday, 6 July 2018 21:43:35 BST Grant Edwards wrote:
> On 2018-07-06, Grant Edwards  wrote:
> > Now that the public key stuff is working again (knock on wood), I'm
> > curious if it's usual for an emerge --sync to take 10-15 minutes
> > longer than it used due to the "Verifying /usr/portage" step.
> > 
> > On some systems (with fewer packages installed) it only takes a minute
> > or less. But, on my "main" desktop system it takes 10-15 minutes every
> > time.
> 
> I cleared out /usr/portage/distfiles, and the verify time dropped to
> about 10 seconds.  I should probably do that more often...

This is odd.  Why would a verification of portage include the distfiles, when 
the latter are checked before they are unpacked as a package is being emerged.  
It doesn't make sense to me.  :-/

-- 
Regards,
Mick



[gentoo-user] Re: How long does "Verifying /usr/portage" take?

2018-07-06 Thread Grant Edwards
On 2018-07-06, Grant Edwards  wrote:

> Now that the public key stuff is working again (knock on wood), I'm
> curious if it's usual for an emerge --sync to take 10-15 minutes
> longer than it used due to the "Verifying /usr/portage" step.
>
> On some systems (with fewer packages installed) it only takes a minute
> or less. But, on my "main" desktop system it takes 10-15 minutes every
> time.

I cleared out /usr/portage/distfiles, and the verify time dropped to
about 10 seconds.  I should probably do that more often...

-- 
Grant Edwards   grant.b.edwardsYow! Hello, GORRY-O!!
  at   I'm a GENIUS from HARVARD!!
  gmail.com




Re: [gentoo-user] How long does "Verifying /usr/portage" take?

2018-07-06 Thread Rich Freeman
On Fri, Jul 6, 2018 at 3:02 PM Grant Edwards  wrote:
>
> Now that the public key stuff is working again (knock on wood), I'm
> curious if it's usual for an emerge --sync to take 10-15 minutes
> longer than it used due to the "Verifying /usr/portage" step.
>

Again, the sync mechanisms are different, but I note that the git
verify is nearly instant.

Of course, the only thing being fed to gpg in the git case is the git
commit record itself, which is about 10 lines of text.  That record
contains a content hash of the tree record, which in turn references
content hashes of every directory inside, and so on.  So, with git
most of the hash validation is happening constantly just by virtue of
everything being content-hashed, and the only extra layer with the gpg
signature is to sign the top level of the whole tree.
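Here is a minimal sketch of that chain, assuming git's default SHA-1 object format. Each object id is the SHA-1 of a short header plus the content, and a tree lists the ids of its entries. As a result, the id at the top changes if any file anywhere below changes. Only blobs and trees are modelled (the signed commit would in turn contain the top tree id), and every file is assumed to have the regular-file mode 100644.

```python
import hashlib
from typing import Dict, Union

# A directory is a dict mapping names to file contents (bytes) or subdirectories.
Tree = Dict[str, Union[bytes, "Tree"]]


def object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0' followed by the body, as git hashes objects."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    return object_id("blob", content)


def tree_id(tree: Tree) -> str:
    entries = []
    for name, value in tree.items():
        if isinstance(value, bytes):
            mode, oid, sort_key = b"100644", blob_id(value), name
        else:
            # Subtrees are hashed first; git sorts them as if named "name/".
            mode, oid, sort_key = b"40000", tree_id(value), name + "/"
        entry = mode + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
        entries.append((sort_key, entry))
    body = b"".join(entry for _, entry in sorted(entries))
    return object_id("tree", body)


if __name__ == "__main__":
    before = {"app-misc": {"foo": {"foo-1.0.ebuild": b"EAPI=6\n"}}, "README": b"hi\n"}
    after = {"app-misc": {"foo": {"foo-1.0.ebuild": b"EAPI=6\n# tampered\n"}}, "README": b"hi\n"}
    print(tree_id(before) != tree_id(after))  # True: one changed file changes the top id
```

This is why signing the single top-level record covers everything below it: the signature's input already depends, hash by hash, on every file.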

Now, on the flip side, some of those git operations might take time
since it is stating files for things like git status/etc.  I've never
noticed an issue, but I'm also on an SSD and tend to always have a
warm cache when I'm using it.  emerge shouldn't need to trigger any
git operations except when syncing.

--
Rich






Re: [gentoo-user] How long does "Verifying /usr/portage" take?

2018-07-06 Thread Dale
R0b0t1 wrote:
> On Fri, Jul 6, 2018 at 2:16 PM, Dale  wrote:
>> Grant Edwards wrote:
>>> Now that the public key stuff is working again (knock on wood), I'm
>>> curious if it's usual for an emerge --sync to take 10-15 minutes
>>> longer than it used due to the "Verifying /usr/portage" step.
>>>
>>> On some systems (with fewer packages installed) it only takes a minute
>>> or less. But, on my "main" desktop system it takes 10-15 minutes every
>>> time.  During the verify step, the emerge process is only using about
>>> 5% of the CPU, and my system is running 80% or more idle.
>>>
>>
>> I haven't timed mine yet but that sounds about like mine here.  I'm not
>> sure what the bottleneck is but I have a four core AMD CPU running at
>> 3.2GHz with 16GBs of ram and SATA spinning rust drives.  While I'm glad
>> to have the added security measures, it does add a significant amount of
>> time to the update process, the tree not the compile part.  We all know
>> the compile part can get big.  lol
>>
>> I guess like everything else, we'll just have to get used to it.  People
>> will hack a ham sandwich if they can and can get something from it.
>> That would be mustard on mine.  Some may like Mayo, which is fine too.
>> ;-)
>>
> Run a program with `strace -c` to get statistics on time spent in
> system calls. It will be disk IO.
>
>


I was thinking my drive light was on a lot during that time but wasn't
sure.  I used to go to the kitchen and get something to drink and a
snack and it be ready when I came back.  I guess now I can cook a light
meal and come back and it be ready.  Maybe I will lose a few pounds
because of this, looking for something positive in this besides the
obvious security improvements.  :? 

Either way, it takes longer but given the status of hackers, we really
need this.  It seems github sort of shined a light on what can happen
even if it is noticed pretty quick. 

Dale

:-)  :-) 



Re: [gentoo-user] How long does "Verifying /usr/portage" take?

2018-07-06 Thread R0b0t1
On Fri, Jul 6, 2018 at 2:16 PM, Dale  wrote:
> Grant Edwards wrote:
>> Now that the public key stuff is working again (knock on wood), I'm
>> curious if it's usual for an emerge --sync to take 10-15 minutes
>> longer than it used due to the "Verifying /usr/portage" step.
>>
>> On some systems (with fewer packages installed) it only takes a minute
>> or less. But, on my "main" desktop system it takes 10-15 minutes every
>> time.  During the verify step, the emerge process is only using about
>> 5% of the CPU, and my system is running 80% or more idle.
>>
>
>
> I haven't timed mine yet but that sounds about like mine here.  I'm not
> sure what the bottleneck is but I have a four core AMD CPU running at
> 3.2GHz with 16GBs of ram and SATA spinning rust drives.  While I'm glad
> to have the added security measures, it does add a significant amount of
> time to the update process, the tree not the compile part.  We all know
> the compile part can get big.  lol
>
> I guess like everything else, we'll just have to get used to it.  People
> will hack a ham sandwich if they can and can get something from it.
> That would be mustard on mine.  Some may like Mayo, which is fine too.
> ;-)
>

Run a program with `strace -c` to get statistics on time spent in
system calls. It will be disk IO.
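If strace isn't at hand, a cruder check is to compare the CPU time a process used with the wall-clock time that passed. A low ratio means the process spent most of its time waiting, typically on disk. This minimal Python sketch assumes a verify-like workload of reading and hashing every file under a directory. That only approximates what a Manifest check does, and `hash_tree` and `profile` are hypothetical names.

```python
import hashlib
import os
import time


def hash_tree(root: str) -> int:
    """Read and SHA-512 every regular file under root; return the file count."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                hashlib.sha512(handle.read()).hexdigest()
            count += 1
    return count


def profile(fn, *args):
    """Return (result, wall seconds, CPU seconds used by this process)."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    result = fn(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu
```

Something like `files, wall, cpu = profile(hash_tree, "/usr/portage")` followed by a look at `cpu / wall` should give the idea. The cold-cache number (e.g. right after a reboot) is the interesting one, because a warm page cache hides the disk.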



[gentoo-user] Re: How long does "Verifying /usr/portage" take?

2018-07-06 Thread Grant Edwards
On 2018-07-06, Dale  wrote:

> I haven't timed mine yet but that sounds about like mine here.  I'm not
> sure what the bottleneck is but I have a four core AMD CPU running at
> 3.2GHz with 16GBs of ram and SATA spinning rust drives.  While I'm glad
> to have the added security measures, it does add a significant amount of
> time to the update process, the tree not the compile part.  We all know
> the compile part can get big.  lol 

Yea, it sounds a bit stupid to whine about an extra 15 minutes doing a
"sync" now that the build time for chromium is measured in days on a
not-that-old machine. :)

-- 
Grant Edwards   grant.b.edwardsYow! One FISHWICH coming
  at   up!!
  gmail.com




Re: [gentoo-user] How long does "Verifying /usr/portage" take?

2018-07-06 Thread Dale
Grant Edwards wrote:
> Now that the public key stuff is working again (knock on wood), I'm
> curious if it's usual for an emerge --sync to take 10-15 minutes
> longer than it used due to the "Verifying /usr/portage" step.
>
> On some systems (with fewer packages installed) it only takes a minute
> or less. But, on my "main" desktop system it takes 10-15 minutes every
> time.  During the verify step, the emerge process is only using about
> 5% of the CPU, and my system is running 80% or more idle.
>


I haven't timed mine yet but that sounds about like mine here.  I'm not
sure what the bottleneck is but I have a four core AMD CPU running at
3.2GHz with 16GBs of ram and SATA spinning rust drives.  While I'm glad
to have the added security measures, it does add a significant amount of
time to the update process, the tree not the compile part.  We all know
the compile part can get big.  lol 

I guess like everything else, we'll just have to get used to it.  People
will hack a ham sandwich if they can and can get something from it. 
That would be mustard on mine.  Some may like Mayo, which is fine too. 
;-) 

Dale

:-)  :-) 



[gentoo-user] How long does "Verifying /usr/portage" take?

2018-07-06 Thread Grant Edwards
Now that the public key stuff is working again (knock on wood), I'm
curious if it's usual for an emerge --sync to take 10-15 minutes
longer than it used due to the "Verifying /usr/portage" step.

On some systems (with fewer packages installed) it only takes a minute
or less. But, on my "main" desktop system it takes 10-15 minutes every
time.  During the verify step, the emerge process is only using about
5% of the CPU, and my system is running 80% or more idle.

-- 
Grant Edwards   grant.b.edwardsYow! TONY RANDALL!  Is YOUR
  at   life a PATIO of FUN??
  gmail.com




Re[6]: [gentoo-user] Re: Portage, git and shallow cloning

2018-07-06 Thread Davyd McColl
@Rich thanks for taking the time to formulate that in-depth response. 
Appreciated.


-d

-- Original Message --
From: "Rich Freeman" 
To: gentoo-user@lists.gentoo.org
Sent: 2018-07-06 14:20:54
Subject: Re: Re[4]: [gentoo-user] Re: Portage, git and shallow cloning


On Fri, Jul 6, 2018 at 7:57 AM Davyd McColl  wrote:


@Rich: if I understand the process correctly, the same commits are
pushed to infra and GitHub by the CI bot?



I'm pretty sure the repos are identical (well, aside from whatever
order they're updated in).


I ask because prior to the GitHub incident, I didn't have signature
verification enabled (I hadn't read about it and it didn't even occur to
me). So my plan was to (whilst GitHub was being sorted out) switch to
the gentoo git repo and enable verification and, once I'd seen that that
was working (because I'd also seen intermediate emails on this list from
people having issues getting signing keys working), perhaps switch back
to GitHub to put less strain on the Gentoo servers.


I never had issues with the signing keys, but git syncing works
differently from webrsync (which makes those threads a bit of a mess
as you have people offering advice to people using a different sync
method).  It is probably best to view them as completely different
implementations, though I'm sure they have elements in common.

Biggest issue with git signature verification is that right now it
will still do a full pull/checkout before verifying, which means that
if it fails you still have a bad /usr/portage (you get an error, but
that's it, and subsequent emerge commands will act on the bad repo).
For that reason alone it might be best to stick with infra's version
until the patch makes its way into release (the patch will do a fetch
and verify before it does a checkout, so while you might have bad git
commits in the history the actual contents of /usr/portage will be
known-good unless you go manually running git commands without doing
your own verification).

Now, in the recent attack a git sync would still have been safe
because the attacker was dumb and did a force push, which will make
git complain loudly if you try to pull (unless you stick --force in
your pull, which probably isn't a great idea for scripts and portage
doesn't do this).  But, that was just dumb luck because a smart
attacker would have rebased the nefarious commits so that they'd
seamlessly pull.  Really the attack was more of a defacement than
anything as they made a bunch of mistakes that showed they weren't
very serious, but any wakeup call is worth acting on.


So if the same commits are just pushed to two remotes (gentoo and
GitHub), then I should (in theory) be able to change my repo.conf
settings, fiddle the remote in /usr/portage, and switch seamlessly from
gentoo to GitHub? Alternatively, I could start with a clean /usr/portage
again, once I'm happy that I have signature verification working on my
machine.


As far as I can tell if you edit the repo URL in repos.conf and
probably also .git/config it should just seamlessly work, but I
haven't tried it.  Since it only accepts fast-forward pulls it
shouldn't do anything if the histories don't match.  If you do a sync
immediately before/after the change maybe you'll find that one repo is
behind the other and you just won't get any updates until the new repo
catches up, but I don't think portage will revert anything (that is an
advantage of git - it has a concept of directionality, though it looks
like portage is looking to add support to prevent replay attacks with
rsync as well).


I do sync frequently (I'm a bit of an update enthusiast) -- at least
once a week, though I prefer more often as I find that the longer I
leave between syncs and world-updates, the more effort I have to
overcome issues (few though they are). So git is a better fit for me, I
think.


Honestly, I think git is a good fit for a lot of Gentoo users.  Yes,
it is different, but all the history/etc is the sort of thing I think
would appeal to many here.  Also, git is something that is becoming
increasingly unavoidable, and mostly for reasons that have universal
appeal.  Once you grok it you'll be using it everywhere.

Security is obviously getting a renewed focus across the board, so I
think we'll see improvements no matter how you use Gentoo, ideally
using defaults (for whatever reason git sig checking isn't a default
today).  Besides improving verification on the end-user side there is
also a lot of interest in improving security on the developer side,
and with infra (hardware tokens, maybe E2E signature checking, etc).
As usual this involves a certain amount of debate (authentication
isn't actually all that easy of a problem).


--
Rich






Re: Re[4]: [gentoo-user] Re: Portage, git and shallow cloning

2018-07-06 Thread Rich Freeman
On Fri, Jul 6, 2018 at 7:57 AM Davyd McColl  wrote:
>
> @Rich: if I understand the process correctly, the same commits are
> pushed to infra and GitHub by the CI bot?
>

I'm pretty sure the repos are identical (well, aside from whatever
order they're updated in).

> I ask because prior to the GitHub incident, I didn't have signature
> verification enabled (I hadn't read about it and it didn't even occur to
> me). So my plan was to (whilst GitHub was being sorted out) switch to
> the gentoo git repo and enable verification and, once I'd seen that that
> was working (because I'd also seen intermediate emails on this list from
> people having issues getting signing keys working), perhaps switch back
> to GitHub to put less strain on the Gentoo servers.

I never had issues with the signing keys, but git syncing works
differently from webrsync (which makes those threads a bit of a mess
as you have people offering advice to people using a different sync
method).  It is probably best to view them as completely different
implementations, though I'm sure they have elements in common.

Biggest issue with git signature verification is that right now it
will still do a full pull/checkout before verifying, which means that
if it fails you still have a bad /usr/portage (you get an error, but
that's it, and subsequent emerge commands will act on the bad repo).
For that reason alone it might be best to stick with infra's version
until the patch makes its way into release (the patch will do a fetch
and verify before it does a checkout, so while you might have bad git
commits in the history the actual contents of /usr/portage will be
known-good unless you go manually running git commands without doing
your own verification).
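A minimal sketch of the ordering Rich describes (fetch, verify, then update the checkout), written in Python on top of plain git commands rather than portage's own code. It assumes a remote named `origin` and a branch named `master`, and that checking the fetched tip commit with `git verify-commit` is the check you want. Whether verifying the tip alone is sufficient depends on how the repository you sync from is signed. The `Runner` indirection is a hypothetical helper that lets the logic be exercised without a real repository.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical helper type: runs one git subcommand and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def git_runner(repo: str) -> Runner:
    """Return a runner that executes git commands inside `repo`."""
    def run(args: Sequence[str]) -> int:
        return subprocess.run(["git", "-C", repo, *args]).returncode
    return run


def verified_sync(run: Runner, remote: str = "origin", branch: str = "master") -> bool:
    """Fetch first, verify the fetched tip, and only then move the checkout.

    Returns True if the working tree was updated, False if any step failed
    (in which case the checkout is left exactly where it was).
    """
    # 1. Download new objects; the working tree is not touched yet.
    if run(["fetch", remote, branch]) != 0:
        return False
    # 2. Check the GPG signature on the commit that was just fetched.
    if run(["verify-commit", "FETCH_HEAD"]) != 0:
        return False
    # 3. Only now update the checkout, and refuse anything but a fast-forward.
    return run(["merge", "--ff-only", "FETCH_HEAD"]) == 0
```

For a real tree you would call something like `verified_sync(git_runner("/usr/portage"))`. If verification fails, no merge is attempted, so the files on disk stay at the last known-good state.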

Now, in the recent attack a git sync would still have been safe
because the attacker was dumb and did a force push, which will make
git complain loudly if you try to pull (unless you stick --force in
your pull, which probably isn't a great idea for scripts and portage
doesn't do this).  But, that was just dumb luck because a smart
attacker would have rebased the nefarious commits so that they'd
seamlessly pull.  Really the attack was more of a defacement than
anything as they made a bunch of mistakes that showed they weren't
very serious, but any wakeup call is worth acting on.
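To illustrate why a force push is noisy while a commit stacked on top is not, here is a toy Python check of whether an update is a fast-forward, that is, whether the commit you already have is an ancestor of the new tip. The commit graph is a plain dict standing in for git's object store, which is an assumption for illustration only. On a real repository, `git merge-base --is-ancestor OLD NEW` answers the same question through its exit status.

```python
from typing import Dict, List


def is_fast_forward(old: str, new: str, parents: Dict[str, List[str]]) -> bool:
    """True if `old` is an ancestor of (or equal to) `new`.

    `parents` maps each commit id to its parent ids (toy commit graph).
    """
    stack = [new]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        # Walk backwards through history towards the root.
        stack.extend(parents.get(commit, []))
    return False
```

With history `a <- b <- c` and a local checkout at `b`, the honest tip `c` is a fast-forward. A force-pushed replacement `x` whose parent is `a` is not, so the pull is refused. A malicious commit `d` appended on top of `c`, however, still counts as a fast-forward, which is why this rule is no substitute for signature checking.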

> So if the same commits are just pushed to two remotes (gentoo and
> GitHub), then I should (in theory) be able to change my repos.conf
> settings, fiddle the remote in /usr/portage, and switch seamlessly from
> gentoo to GitHub? Alternatively, I could start with a clean /usr/portage
> again, once I'm happy that I have signature verification working on my
> machine.

As far as I can tell, if you edit the repo URL in repos.conf (and
probably also in .git/config), it should just work seamlessly, but I
haven't tried it.  Since it only accepts fast-forward pulls, it
shouldn't do anything if the histories don't match.  If you do a sync
immediately before/after the change, you may find that one repo is
behind the other and you just won't get any updates until the new repo
catches up, but I don't think portage will revert anything (that is an
advantage of git - it has a concept of directionality, though it looks
like portage is looking to add support to prevent replay attacks with
rsync as well).
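Switching mirrors then comes down to changing one value in repos.conf plus the matching remote in the repository. Below is a hedged Python sketch that rewrites `sync-uri` for one section using the standard `configparser` module. The section name `gentoo` and the URLs are placeholders. Note that `configparser` drops comments when it writes the file back. On the git side, something like `git -C /usr/portage remote set-url origin NEW_URL` should do, assuming the remote is called `origin`.

```python
import configparser
import io


def set_sync_uri(repos_conf: str, repo: str, new_uri: str) -> str:
    """Return the text of `repos_conf` with `sync-uri` of section `repo` replaced.

    Comments in the original text are not preserved by configparser.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(repos_conf)
    parser[repo]["sync-uri"] = new_uri  # KeyError if the section is missing
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

For example, `set_sync_uri(text, "gentoo", "https://mirror.example.org/gentoo.git")` returns the new file contents. Writing them back to disk is left to the caller.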

> I do sync frequently (I'm a bit of an update enthusiast) -- at least
> once a week, though I prefer more often as I find that the longer I
> leave between syncs and world-updates, the more effort I have to
> overcome issues (few though they are). So git is a better fit for me, I
> think.

Honestly, I think git is a good fit for a lot of Gentoo users.  Yes,
it is different, but all the history/etc is the sort of thing I think
would appeal to many here.  Also, git is something that is becoming
increasingly unavoidable, and mostly for reasons that have universal
appeal.  Once you grok it you'll be using it everywhere.

Security is obviously getting a renewed focus across the board, so I
think we'll see improvements no matter how you use Gentoo, ideally
using defaults (for whatever reason git sig checking isn't a default
today).  Besides improving verification on the end-user side there is
also a lot of interest in improving security on the developer side,
and with infra (hardware tokens, maybe E2E signature checking, etc).
As usual this involves a certain amount of debate (authentication
isn't actually all that easy of a problem).


-- 
Rich



Re[4]: [gentoo-user] Re: Portage, git and shallow cloning

2018-07-06 Thread Davyd McColl
@Rich: if I understand the process correctly, the same commits are 
pushed to infra and GitHub by the CI bot?


I ask because prior to the GitHub incident, I didn't have signature 
verification enabled (I hadn't read about it and it didn't even occur to 
me). So my plan was to (whilst GitHub was being sorted out) switch to 
the gentoo git repo and enable verification and, once I'd seen that that 
was working (because I'd also seen intermediate emails on this list from 
people having issues getting signing keys working), perhaps switch back 
to GitHub to put less strain on the Gentoo servers.


So if the same commits are just pushed to two remotes (gentoo and 
GitHub), then I should (in theory) be able to change my repos.conf 
settings, fiddle the remote in /usr/portage, and switch seamlessly from 
gentoo to GitHub? Alternatively, I could start with a clean /usr/portage 
again, once I'm happy that I have signature verification working on my 
machine.


I do sync frequently (I'm a bit of an update enthusiast) -- at least 
once a week, though I prefer more often as I find that the longer I 
leave between syncs and world-updates, the more effort I have to 
overcome issues (few though they are). So git is a better fit for me, I 
think.


-d

-- Original Message --
From: "Rich Freeman" 
To: gentoo-user@lists.gentoo.org
Sent: 2018-07-06 13:47:11
Subject: Re: Re[2]: [gentoo-user] Re: Portage, git and shallow cloning


On Fri, Jul 6, 2018 at 4:34 AM Davyd McColl  wrote:
>
> I understand that git history will build over time -- I'm less concerned
> with (eventual) disk usage than I am with the speed of `emerge --sync`,
> which (and perhaps I'm sorely mistaken) appeared to be faster using git
> than rsync -- hence my choice of git over rsync (the discussion at
> https://forums.gentoo.org/viewtopic-t-1009562.html shows me to not be
> alone in this experience).
>

From what I've generally seen/heard git is much more efficient as long
as you sync frequently.

rsync has the advantage that it only transfers the minimum necessary
to get you from the tree you have now to the tree that is current.  To
do this it has to stat every file (using default settings - you can
make it even slower if you want to), which is a lot of file I/O.

git has the advantage that it can just read the current HEAD and from
that know exactly what commits are missing, so there is way less
effort spent figuring out what changed.  It has the disadvantage that
it sends everything that happened since your last sync, which could
include files that were created and subsequently removed.  If you sync
often there won't be much of that, but if you're syncing monthly or
even less frequently then you probably will spend a lot of time
transmitting churn.

It is possible to trim down a repository, and as long as nobody is
doing force pushes on the main repo you should still be able to sync.
However, that is not something that just involves a git one-liner.
Personally I don't mind the space tradeoff, especially in exchange for
the IO tradeoff.  A sync is always a VERY fast operation.

I'll also note that the stable branch (which is always free of obvious
issues caused by devs not running repoman) is only available via git.
There is no reason that couldn't be replicated via rsync, but right
now we only have one set of mirrors.

I'm still syncing from github after enabling signature checking.
There is a patch that will make that more secure but in the meantime
my scripts keep an eye on exit status when I sync.  IMO signature
checking is more important than where you sync from - as long as gpg
says I'm good it really doesn't matter who has the ability to play
with the data en route.  But, it certainly doesn't hurt to sync from
infra (I do have concerns for whether infra could handle everybody
doing it though - github is MS's problem to worry about).

--
Rich






Re: Re[2]: [gentoo-user] Re: Portage, git and shallow cloning

2018-07-06 Thread Rich Freeman
On Fri, Jul 6, 2018 at 4:34 AM Davyd McColl  wrote:
>
> I understand that git history will build over time -- I'm less concerned
> with (eventual) disk usage than I am with the speed of `emerge --sync`,
> which (and perhaps I'm sorely mistaken) appeared to be faster using git
> than rsync -- hence my choice of git over rsync (the discussion at
> https://forums.gentoo.org/viewtopic-t-1009562.html shows me to not be
> alone in this experience).
>

From what I've generally seen/heard git is much more efficient as long
as you sync frequently.

rsync has the advantage that it only transfers the minimum necessary
to get you from the tree you have now to the tree that is current.  To
do this it has to stat every file (using default settings - you can
make it even slower if you want to), which is a lot of file I/O.
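A rough Python illustration of that per-file cost: building a map of size and modification time needs one `stat()` call per file in the tree, and comparing two such maps is essentially what rsync's default "quick check" (size plus mtime) does. This is a simplification for illustration only. Real rsync walks the tree on both ends, and options such as `--checksum` make it read file contents as well.

```python
import os
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[int, int]]


def snapshot(root: str) -> Snapshot:
    """Map each file's relative path to (size, mtime_ns): one stat() per file."""
    result: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            result[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return result


def changed(local: Snapshot, remote: Snapshot) -> List[str]:
    """Paths a size+mtime quick check would treat as added, removed or modified."""
    keys = local.keys() | remote.keys()
    return sorted(k for k in keys if local.get(k) != remote.get(k))
```

The cost of `snapshot` grows with the number of files in the tree, not with how much actually changed, which is the I/O Rich is referring to.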

git has the advantage that it can just read the current HEAD and from
that know exactly what commits are missing, so there is way less
effort spent figuring out what changed.  It has the disadvantage that
it sends everything that happened since your last sync, which could
include files that were created and subsequently removed.  If you sync
often there won't be much of that, but if you're syncing monthly or
even less frequently then you probably will spend a lot of time
transmitting churn.
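The churn trade-off can be shown with a toy count in Python. Each "commit" is a list of `(op, path)` events. The history-based count charges one transfer for every add or modify since the last sync, while the tree-diff count charges only for files that were touched and still exist at the end. Both functions are hypothetical and ignore compression and delta encoding, which in practice shrink what git sends.

```python
from typing import List, Set, Tuple

# A toy "commit": a list of (op, path) events, op in {"add", "modify", "delete"}.
Commit = List[Tuple[str, str]]


def git_sent(commits: List[Commit]) -> int:
    """History-based sync: one file version per add/modify event in the range."""
    return sum(1 for c in commits for op, _ in c if op in ("add", "modify"))


def rsync_sent(start: Set[str], commits: List[Commit]) -> int:
    """Tree-diff sync: only files touched in the range that still exist at the end."""
    final = set(start)
    touched: Set[str] = set()
    for c in commits:
        for op, path in c:
            if op == "delete":
                final.discard(path)
            else:
                final.add(path)
                touched.add(path)
    return len(touched & final)
```

Given `hist = [[("add", "tmp")], [("modify", "a")], [("delete", "tmp")], [("modify", "a")]]` starting from `{"a"}`, `git_sent(hist)` is 3 and `rsync_sent({"a"}, hist)` is 1. The short-lived file `tmp` and the intermediate version of `a` are the churn. The longer you wait between syncs, the more of it accumulates.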

It is possible to trim down a repository, and as long as nobody is
doing force pushes on the main repo you should still be able to sync.
However, that is not something that just involves a git one-liner.
Personally I don't mind the space tradeoff, especially in exchange for
the IO tradeoff.  A sync is always a VERY fast operation.

I'll also note that the stable branch (which is always free of obvious
issues caused by devs not running repoman) is only available via git.
There is no reason that couldn't be replicated via rsync, but right
now we only have one set of mirrors.

I'm still syncing from github after enabling signature checking.
There is a patch that will make that more secure but in the meantime
my scripts keep an eye on exit status when I sync.  IMO signature
checking is more important than where you sync from - as long as gpg
says I'm good it really doesn't matter who has the ability to play
with the data en route.  But, it certainly doesn't hurt to sync from
infra (I do have concerns for whether infra could handle everybody
doing it though - github is MS's problem to worry about).
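Rich does not post his scripts, so the following is only a guess at the idea: a Python sketch that runs the sync command and reports a failure when its exit status is non-zero, so that nothing else runs against an unverified tree. The command list is an assumption. Substitute whatever you normally run.

```python
import subprocess
import sys
from typing import Sequence


def sync_or_stop(cmd: Sequence[str]) -> int:
    """Run a sync command and return its exit status; callers should stop on non-zero."""
    result = subprocess.run(list(cmd))
    if result.returncode != 0:
        print(f"sync failed with status {result.returncode}; not continuing",
              file=sys.stderr)
    return result.returncode
```

A wrapper might do `if sync_or_stop(["emerge", "--sync"]) != 0: raise SystemExit(1)` before starting a world update.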

-- 
Rich



Re[2]: [gentoo-user] Re: Portage, git and shallow cloning

2018-07-06 Thread Davyd McColl
Part of the original intent of the mail was just to bring to light the 
disparity between the documentation and experience (wrt the default 
value) -- I had no configured value and portage was trying to clone the 
entire history of the repo instead of a shallow start. Since I really 
appreciate the Gentoo documentation and have relied on it for 
installation and any system maintenance, I just wanted to bring this to 
light.


I understand that git history will build over time -- I'm less concerned 
with (eventual) disk usage than I am with the speed of `emerge --sync`, 
which (and perhaps I'm sorely mistaken) appeared to be faster using git 
than rsync -- hence my choice of git over rsync (the discussion at 
https://forums.gentoo.org/viewtopic-t-1009562.html shows me to not be 
alone in this experience).


Having the changelogs available also comes off as a positive for me -- 
I'm just plain curious.


-d

-- Original Message --
From: "Mick" 
To: gentoo-user@lists.gentoo.org
Sent: 2018-07-06 10:01:20
Subject: Re: [gentoo-user] Re: Portage, git and shallow cloning


On Friday, 6 July 2018 08:29:26 BST Martin Vaeth wrote:

Davyd McColl  wrote:
> 1) `sync-depth` has been deprecated (should now use `clone-depth`)

The reason is that sync-depth was meant to be effective for
every sync, i.e. that with sync-depth=1 the clone should stay shallow.
However, it turned out that this caused frequent/occasional errors
with git syncing when earlier chunks are needed.
So they decided to drop this, and the value is only used for the
initial cloning and ignored from then on. Due to this change of
effect, it has been renamed.
> 2) with the option missing, portage was fetching the entire history

Yes, but even with this option, your history will fill up over time.
Only the initial cloning will go faster and need less space.

> 2) I believe that the original intent of defaulting to a shallow clone was
> a good idea

Due to the point mentioned above, this is not very useful anymore.
Moreover, now that full checksumming is supported for rsync, the only
advantage of using git is that you get the history (in particular
ChangeLogs).


The lack of disk space on some of my systems, metered and slow bandwidth and
no need to know what every individual commit and reason for it was, had me
sticking to using rsync, after a short stint with git.

I don't think anyone would recommend git unless good reasons for one's use case
make it an optimal choice.
--
Regards,
Mick





Re: [gentoo-user] Re: Portage, git and shallow cloning

2018-07-06 Thread Mick
On Friday, 6 July 2018 08:29:26 BST Martin Vaeth wrote:
> Davyd McColl  wrote:
> > 1) `sync-depth` has been deprecated (should now use `clone-depth`)
> 
> The reason is that sync-depth was meant to be effective for
> every sync, i.e. that with sync-depth=1 the clone should stay shallow.
> However, it turned out that this caused frequent/occasional errors
> with git syncing when earlier chunks are needed.
> So they decided to drop this, and the value is only used for the
> initial cloning and ignored from then on. Due to this change of
> effect, it has been renamed.
> 
> > 2) with the option missing, portage was fetching the entire history
> 
> Yes, but even with this option, your history will fill up over time.
> Only the initial cloning will go faster and need less space.
> 
> > 2) I believe that the original intent of defaulting to a shallow clone was
> > a good idea
> 
> Due to the point mentioned above, this is not very useful anymore.
> Moreover, now that full checksumming is supported for rsync, the only
> advantage of using git is that you get the history (in particular
> ChangeLogs).

The lack of disk space on some of my systems, metered and slow bandwidth and 
no need to know what every individual commit and reason for it was, had me 
sticking to using rsync, after a short stint with git.

I don't think anyone would recommend git unless good reasons for one's use case 
make it an optimal choice.
-- 
Regards,
Mick



[gentoo-user] Re: Portage, git and shallow cloning

2018-07-06 Thread Martin Vaeth
Davyd McColl  wrote:
>
> 1) `sync-depth` has been deprecated (should now use `clone-depth`)

The reason is that sync-depth was meant to be effective for
every sync, i.e. that with sync-depth=1 the clone should stay shallow.
However, it turned out that this caused frequent/occasional errors
with git syncing when earlier chunks are needed.
So they decided to drop this, and the value is only used for the
initial cloning and ignored from then on. Due to this change of
effect, it has been renamed.
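A toy Python model of the behaviour Martin describes. It is not portage's actual code: the depth setting only shapes the command used for the initial clone, and once a `.git` directory exists every later sync is a plain pull, so history accumulates. The function name and the exact git arguments are assumptions for illustration.

```python
import os
from typing import List, Optional


def git_sync_command(location: str, uri: str, clone_depth: Optional[int]) -> List[str]:
    """Toy model: the depth applies to the first clone only."""
    if not os.path.isdir(os.path.join(location, ".git")):
        # No repository yet: initial clone, optionally shallow.
        cmd = ["git", "clone"]
        if clone_depth is not None:
            cmd += ["--depth", str(clone_depth)]
        return cmd + [uri, location]
    # Existing repository: a plain pull. The depth setting is ignored from
    # now on, so new history is added with every sync.
    return ["git", "-C", location, "pull"]
```

With `clone-depth = 1` in repos.conf, only the very first call would produce `git clone --depth 1 ...`. Every call after that pulls normally.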

> 2) with the option missing, portage was fetching the entire history

Yes, but even with this option, your history will fill up over time.
Only the initial cloning will go faster and need less space.

> 2) I believe that the original intent of defaulting to a shallow clone was
> a good idea

Due to the point mentioned above, this is not very useful anymore.
Moreover, now that full checksumming is supported for rsync, the only
advantage of using git is that you get the history (in particular
ChangeLogs).