[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning
Rich Freeman wrote: > emerge --sync works just fine if > there are uncommitted changes in your repository, whether they are > indexed or otherwise. You are right. It seems to be somewhat "random" when git pull refuses to work and when it does not. I could not detect a common pattern. Maybe this mainly has to do with using overlayfs, which confuses git.
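Elsewhere in this thread, Martin attributes such failures to stale stat data: on overlayfs (or after a resquash and remount) inode numbers change, so git may treat unchanged files as modified. The following is a minimal sketch, assuming a POSIX shell and git on PATH, of checking for *real* uncommitted changes after refreshing that stat data. The helper name repo_has_local_changes is hypothetical and not part of portage.

```shell
# Hypothetical helper (not part of portage).
# Exit status 0: the clone has real uncommitted changes; 1: it is clean.
repo_has_local_changes() {
    repo=$1
    # Re-read stat data so files whose inode numbers or timestamps changed
    # (e.g. on overlayfs) but whose content did not are treated as unchanged.
    # -q makes update-index keep going instead of failing on modified files.
    git -C "$repo" update-index -q --refresh || :
    # Compare work tree vs. index, then index vs. HEAD; both must be clean.
    if git -C "$repo" diff-files --quiet &&
       git -C "$repo" diff-index --cached --quiet HEAD --; then
        return 1
    fi
    return 0
}
```

Usage: `repo_has_local_changes /usr/portage && echo "local edits present"`.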
Re: [gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning
On Sun, Jul 8, 2018 at 4:28 AM Martin Vaeth wrote: > > Rich Freeman wrote: > > It's the *history* of the metadata which matters here: You make a reasonable point here. > > "The council does not require that ChangeLogs be generated or > > distributed through the rsync system. It is at the discretion of our > > infrastructure team whether or not this service continues." > > The formulation already makes it clear that one did not want to > put pressure on infra, and at that time it was expected that > every user would switch to git anyway. The use of git for history, and yes, in general the Council tries not to forbid projects from providing services. The intent was to communicate that it was simply not an expectation that they do so. > At that time also the gkeys project was very active, and git was > (besides webrsync) the only expected way to get checksums for the > full tree. In particular, rsync was inherently insecure. Honestly, I don't think gkeys really played any part in this, but there was definitely an intent for signature checking in the tree to become more robust. As you point out (in a part I trimmed) it ought to be possible to do this. Indeed, git support for signing commits was considered a requirement for git implementation. > >> 4. Even if the user made the mistake of editing a file, portage should > >>not just die on syncing. > > > > emerge --sync won't die in a situation like this in general. > > It does: git pull refuses to start if there are uncommitted changes. > I did a test before I made my post. emerge --sync works just fine if there are uncommitted changes in your repository, whether they are indexed or otherwise. I didn't test merge conflicts but I'd hope it would fail if these exist. -- Rich
[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning
Rich Freeman wrote: >> I was speaking about gentoo's git repository, of course >> (the one which was attacked on github), not about a Frankensteined one >> with metadata history filling megabytes of disk space unnecessarily. >> Who has that much disk space to waste? > > Doesn't portage create that metadata anyway when you run it You had better have it created by egencache in portage-postsyncd; moreover, you should also download some other repositories (news announcements, GLSA, dtd, xml-schema) which are maintained independently (see e.g. https://github.com/vaeth/portage-postsyncd-mv). It is the Gentoo way: Download only the sources and build from there. That's also a question of mentality, and it is why I think most gentoo users who use git would prefer it that way. > negating any space savings at the cost of CPU to regenerate the cache? It's the *history* of the metadata which matters here: Since each changed metadata file takes a fraction of a second to regenerate, one can estimate rather well that several tens of thousands of files are changed hourly/daily/weekly (the frequency depending mainly on eclass changes: One change in some eclass requires a change for practically every version of every package), so the metadata history accumulated this way over time is enormous. This history, of course, is completely useless and stored in vain. One of the weaknesses of git is that it is impossible, by design, to omit such superfluous history selectively (once the files *are* maintained by git). >> For the official git repository your assertions are simply false, >> as you apparently admit: It is currently not possible to use the >> official git repo (or the github clone of it which was attacked) >> in a secure manner. > > Sure, but this also doesn't support signature verification at all > [...] so your points still don't apply. Huh? This actually *was* my point.
BTW, portage might easily support signature verification if only the distribution of the developers' public keys were properly maintained (e.g. via gkeys or, more simply, via some package): After all, gentoo infra should always have an up-to-date list of these keys anyway. (If they don't, it would make it even more important to use the source repo instead of trusting a signature which is given without sufficient verification.) >> Your implicit claim is untrue. rsync - as used by portage - always >> transfers whole files, only. > > rsync is capable of transferring partial files. Yes, and portage explicitly disables this. (It costs a lot of server CPU time and does not save much transfer data if the files are small, because a lot of hashes have to be transferred (and calculated - CPU-time!) instead.) > However, this is based on offsets from the start of the file There are new algorithms which also support detecting insertions and deletions via rolling hashes (e.g. for deduplicating filesystems). Rsync uses quite an advanced algorithm as well, but I would need to recheck its features. Anyway, it plays no role in our discussion, because for such small files it hardly matters, and portage disables said algorithm anyway. > "The council does not require that ChangeLogs be generated or > distributed through the rsync system. It is at the discretion of our > infrastructure team whether or not this service continues." The formulation already makes it clear that one did not want to put pressure on infra, and at that time it was expected that every user would switch to git anyway. At that time also the gkeys project was very active, and git was (besides webrsync) the only expected way to get checksums for the full tree. In particular, rsync was inherently insecure. The situation has changed meanwhile on both sides: gkeys was apparently practically abandoned, and instead gemato was introduced and is actively supported.
That the gentoo-mirror repository is suddenly more secure than the git repository is also a side effect of gemato, because only for gemato are the infra keys now distributed in a package. > If you're using squashfs git pull probably isn't the right solution for you. Exactly. That's why I completely disagree with portage's regression of replacing the previously working solution with the only partially working "git pull". >> 4. Even if the user made the mistake of editing a file, portage should >>not just die on syncing. > > emerge --sync won't die in a situation like this in general. It does: git pull refuses to start if there are uncommitted changes. > but I don't think the correct default in this case should be > to just wipe out the user's changes. I do: As with rsync, a user should not make changes in the distributed tree (unless he makes a PR) but in an overlay; otherwise he will permanently have outdated files which are not correctly updated. *If* a user wants such changes, he should use git correctly and commit. But I am not against making this an opt-in option, to be enabled by a developer (or advanced
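Martin's argument above is that a repository carrying generated metadata in git accumulates a large, useless history, because one eclass change rewrites cache entries for almost every ebuild. A rough way to gauge that churn on a given clone is to count how many file modifications under the cache directory its history contains. This sketch assumes git on PATH; count_path_changes is a hypothetical helper, and metadata/md5-cache is the assumed cache path of such a repository.

```shell
# Hypothetical helper: count file modifications under a path across the
# whole history of a clone.
count_path_changes() {
    repo=$1 path=$2
    # --format= suppresses commit headers so only changed file names remain
    # (plus blank separator lines, which "grep -c ." does not count).
    git -C "$repo" log --name-only --format= -- "$path" | grep -c .
}
```

Usage: `count_path_changes /usr/portage metadata/md5-cache`. On a shallow clone it only counts the history that was actually fetched.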
Re: [gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning
On Sat, Jul 7, 2018 at 5:29 PM Martin Vaeth wrote: > > Rich Freeman wrote: > > On Sat, Jul 7, 2018 at 1:34 AM Martin Vaeth wrote: > >> > >> Biggest issue is that git signature happens by the developer who > >> last committed which means that in practice you need dozens/hundreds > >> of keys. > > > > This is untrue. [...] > > It will, of course, not work on the regular git repo [...] > > You need to use a repo that is signed by infra > > (which typically includes metadata/etc as well). > > I was speaking about gentoo's git repository, of course > (the one which was attacked on github), not about a Frankensteined one > with metadata history filling megabytes of disk space unnecessarily. > Who has that much disk space to waste? Doesn't portage create that metadata anyway when you run it, negating any space savings at the cost of CPU to regenerate the cache? > > For the official git repository your assertions are simply false, > as you apparently admit: It is currently not possible to use the > official git repo (or the github clone of it which was attacked) > in a secure manner. > Sure, but this also doesn't support signature verification at all (at least not by portage - git can of course manually verify any commit), so your points still don't apply. > > and as a bonus they want them prepended to > > instead of appended so that rsync resends the whole thing instead of > > just the tail... > > Your implicit claim is untrue. rsync - as used by portage - always > transfers whole files, only. rsync is capable of transferring partial files. I can't vouch for how portage is using it, but both the rsync command line program and librsync can do partial file transfers. However, this is based on offsets from the start of the file, so appending to a file will result in the first part of the file being identical, but prepending will break rsync's algorithm. > > > But, this was endlessly debated before the decision was made.
> > The decision was about removing the ChangeLogs from the git > repository. This was certainly the correct decision, because - > as you said - the ChangeLogs *can* be regenerated from the > git history and thus it makes no sense to modify/store them > redundantly. There were two decisions: https://projects.gentoo.org/council/meeting-logs/20141014-summary.txt "do we need to continue to create new ChangeLog entries once we're operating in git?" No. https://projects.gentoo.org/council/meeting-logs/20160410-summary.txt "The council does not require that ChangeLogs be generated or distributed through the rsync system. It is at the discretion of our infrastructure team whether or not this service continues." Accepted (4 yes, 1 no, 2 abstention) > > It probably should be a configurable option in repos.conf, but > > honestly, forced pushes are not something that should be considered a > > good practice. > > 1. portage shouldn't decide about practices of overlays. Hence the reason I suggested it should be a repos.conf option. > 2. also in the official gentoo repository force pushes happen >occasionally. Last occurrence was e.g. when undoing the >malevolent forced push ;) Sure, but that was a fast-forward from the last good commit, so it wouldn't require a force pull unless a user had done a force pull on the bad repo. > 3. git pull fails not only for forced pushes but also on several >other occasions; for instance, if your filesystem changed inode >numbers (e.g. squash + overlayfs after a resquash+remount). If you're using squashfs git pull probably isn't the right solution for you. > 4. Even if the user made the mistake of editing a file, portage should >not just die on syncing. emerge --sync won't die in a situation like this in general. Maybe it will if there is a merge conflict, but I don't think the correct default in this case should be to just wipe out the user's changes. I'm all for making that an option, however. -- Rich
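Both sides agree that ChangeLogs can be regenerated from git history, which is exactly what rsync users lack. As a hedged illustration of what that means in practice, a ChangeLog-like summary for one package can be printed from any non-shallow clone with git log. pkg_changelog is a hypothetical helper, and its output is not the format produced by egencache --update-changelogs.

```shell
# Hypothetical helper: ChangeLog-style summary for one package directory
# (e.g. app-editors/vim), newest commit first.
pkg_changelog() {
    repo=$1 pkg=$2
    # One line per commit touching the package: date, author, subject.
    git -C "$repo" log --date=short --format='%ad %an: %s' -- "$pkg"
}
```

Usage: `pkg_changelog /usr/portage app-editors/vim | head`.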
[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning
Rich Freeman wrote: > On Sat, Jul 7, 2018 at 1:51 AM Martin Vaeth wrote: >> Davyd McColl wrote: >> >> > I ask because prior to the GitHub incident, I didn't have signature >> > verification enabled >> >> Currently, it is not practical to change this, see my other posting. > > You clearly don't understand what it actually checks. Davyd and I were obviously speaking about the gentoo repository (the official one and the one on github which got hacked). For these repositories verification is not practically possible. (That there are also *other* repositories - with huge metadata history - which might be easier to verify is a different story.) Perversely, the official comments after the hack suggested that you should have enabled signature verification for the hacked repository, which was simply not practically possible.
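Martin's point is that commits in the official repository are signed by whichever developer committed them, so judging a signature means knowing that developer's key. The raw information is available locally through git's signature placeholders. This is a minimal sketch: head_signature_info is a hypothetical helper, and a good status (G or U) is only reported when the signer's public key is already in your GnuPG keyring.

```shell
# Hypothetical helper: show who signed the tip commit of a clone.
# %H = commit hash, %G? = signature status (G good, U good but unknown
# validity, B bad, E cannot be checked, N no signature, ...),
# %GK = signing key ID, %GS = signer name as reported by GnuPG.
head_signature_info() {
    git -C "$1" log -1 --format='%H %G? %GK %GS'
}
```

This only reports what GnuPG says about the key. Deciding whether the key really belongs to a Gentoo developer, the hard part Martin describes, is still up to you.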
[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning
Rich Freeman wrote: > On Sat, Jul 7, 2018 at 1:34 AM Martin Vaeth wrote: >> >> Biggest issue is that git signature happens by the developer who >> last committed which means that in practice you need dozens/hundreds >> of keys. > > This is untrue. [...] > It will, of course, not work on the regular git repo [...] > You need to use a repo that is signed by infra > (which typically includes metadata/etc as well). I was speaking about gentoo's git repository, of course (the one which was attacked on github), not about a Frankensteined one with metadata history filling megabytes of disk space unnecessarily. Who has that much disk space to waste? For the official git repository your assertions are simply false, as you apparently admit: It is currently not possible to use the official git repo (or the github clone of it which was attacked) in a secure manner. >> > unless you stick --force in your pull >> >> Unfortunately, it is not that simple: git pull --force only works if > [...] > You completely trimmed the context around my quote. [...] > they simply would not be pulled without --force. I was saying that they would not be pulled *with* --force either, because pull --force is not as strong as you think it is (it would have shown you conflicts to resolve manually). You would have to use the commands that I have posted. > You seem to be providing advice for how to do a pull with a shallow > repository No, what I said is not related to a shallow repository. It has to do with pulling a forced push, in general. >> At least since the ChangeLogs have been removed. >> IMHO it was the wrong decision to not keep them in the rsync tree >> (The tool to regenerate them from git was/is available). > > Changelogs are redundant with git, and they take a ton of space (which > of late everybody seems to be super-concerned about) Compared to the git history, they take very little space. If you squash the portage tree, it is hardly measurable.
And with the ChangeLogs, rsync would still be a sane option for most users. Without ChangeLogs many users are unnecessarily forced to switch and to sacrifice the space for git history. > and as a bonus they want them prepended to > instead of appended so that rsync resends the whole thing instead of > just the tail... Your implicit claim is untrue. rsync - as used by portage - always transfers whole files, only. > But, this was endlessly debated before the decision was made. The decision was about removing the ChangeLogs from the git repository. This was certainly the correct decision, because - as you said - the ChangeLogs *can* be regenerated from the git history and thus it makes no sense to modify/store them redundantly. But I was speaking about the distribution of ChangeLogs in rsync: Whenever the infrastructure uses egencache to generate the metadata, it could simply pass --update-changelogs so that rsync users would still have ChangeLogs: They cannot get them from git history. > My > point is that the sorts of people who like Gentoo would probably tend > to like git. "Liking" git does not mean that one has to use it also for things where it brings no benefit. And for most users none of its features is useful for the portage tree. With one exception: ChangeLogs. That's why I am advocating bringing them back to the rsync tree. > The "keys problem" has nothing to do with the security of git > verification, because those keys are not used by git verification on > the end-user side. Whoever is so git/developer-minded that he prefers git despite its higher disk usage will certainly want to use the actual source repository and not a worse rsync-clone repository. > It probably should be a configurable option in repos.conf, but > honestly, forced pushes are not something that should be considered a > good practice. 1. portage shouldn't decide about practices of overlays. 2. also in the official gentoo repository force pushes happen occasionally.
Last occurrence was e.g. when undoing the malevolent forced push ;) 3. git pull fails not only for forced pushes but also on several other occasions; for instance, if your filesystem changed inode numbers (e.g. squash + overlayfs after a resquash+remount). 4. Even if the user made the mistake of editing a file, portage should not just die on syncing.
Re: [gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning
On Sat, Jul 7, 2018 at 1:34 AM Martin Vaeth wrote: > > Rich Freeman wrote: > > > > Biggest issue with git signature verification is that right now it > > will still do a full pull/checkout before verifying > > Biggest issue is that git signature happens by the developer who > last committed which means that in practice you need dozens/hundreds > of keys. This is untrue. The last git signature is made by infra or the CI-bot, and this is the signature that portage checks. Portage will NOT accept a developer key, or any other key in your keychain, as being valid. It will, of course, not work on the regular git repo used for committing for this reason. You need to use a repo that is signed by infra (which typically includes metadata/etc as well). I'll trim most of the rest of your email and only reply to significant bits, because you seem to not understand the point above which invalidates almost everything you wrote. The concerns you raise would be an issue if you were checking individual developer keys. > > So currently, it is impossible to do *any* automatic tree verification, > unless you manually fetch/update all of the developer keys. > As noted, you don't need to fetch any developer keys, and if you do fetch them, portage will ignore them. > > > unless you stick --force in your pull > > Unfortunately, it is not that simple: git pull --force only works if > the checked out tree is old enough (in which case git pull without --force > would have worked also, BTW). You completely trimmed the context around my quote. I was talking about the malicious commits in the recent attack. They were force-pushed, so it doesn't matter how complete your repository is - they simply would not be pulled without --force. You seem to be providing advice for how to do a pull with a shallow repository, which I'm not talking about. > > Honestly, I think git is a good fit for a lot of Gentoo users. > > At least since the ChangeLogs have been removed.
> IMHO it was the wrong decision to not keep them in the rsync tree > (The tool to regenerate them from git was/is available). Changelogs are redundant with git, and they take a ton of space (which of late everybody seems to be super-concerned about). I don't get that on one hand people get twitchy about /usr/portage taking more than 1GB, and on the other hand they want a bazillion text files dumped all over the place, and as a bonus they want them prepended to instead of appended so that rsync resends the whole thing instead of just the tail... But, this was endlessly debated before the decision was made. Trust me, I read every post before voting to have them removed. > > > it is different, but all the history/etc is the sort of thing I think > > would appeal to many here. > > Having the ChangeLogs would certainly be sufficient for the majority > of users. It is very rare that a user really needs to access the > older version of the file, and in that case it is simple enough > to fetch it manually from e.g. github. It is very rare that somebody would want to use Gentoo at all. My point is that the sorts of people who like Gentoo would probably tend to like git. But, to each their own... > > > Security is obviously getting a renewed focus across the board > > Unfortunately, due to the mentioned keys problem, git is > currently the *unsafest* method for syncing. The "keys problem" has nothing to do with the security of git verification, because those keys are not used by git verification on the end-user side. An infra-controlled key is used for verification whether you sync with git or rsync. Either way you're relying on infra checking the developer keys at time of commit. Now, as I already mentioned git syncing is currently less safe due to it doing the checkout before the verification, and they are in the process of fixing this. > (BTW, due to the number of committers the portage tree has a quite > strict policy w.r.t. forced pushes. 
Overlays, especially of single > users, might have different policies and thus can fail quite often > due to the "git pull" bug.) It probably should be a configurable option in repos.conf, but honestly, forced pushes are not something that should be considered a good practice. There are times that it is the best option, but those are rare, IMO. -- Rich
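Rich notes that git syncing was then less safe because portage checked out new commits before verifying them, and that a pending patch would fetch and verify first. Here is a minimal sketch of that ordering in plain git, not portage's actual code. safe_sync is a hypothetical name, and the sketch assumes the clone tracks an upstream branch and that the relevant public key is already in your GnuPG keyring.

```shell
# Hypothetical helper: fetch, verify the fetched tip, and only then
# update the checkout.
safe_sync() {
    repo=$1
    git -C "$repo" fetch -q || return 1
    # Resolve the upstream tip that was just fetched (e.g. origin/master).
    upstream=$(git -C "$repo" rev-parse --verify -q '@{upstream}') || return 1
    # Refuse to touch the work tree unless that commit has a good signature.
    if ! git -C "$repo" verify-commit "$upstream" 2>/dev/null; then
        echo "safe_sync: signature check failed for $upstream" >&2
        return 1
    fi
    # Only now move the checkout forward; --ff-only refuses rewritten history.
    git -C "$repo" merge -q --ff-only "$upstream"
}
```

--ff-only is a deliberate choice here. It refuses history rewritten by a forced push, the case debated in this thread; handling that would require resetting to the verified commit instead of merging.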
Re: [gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning
On Sat, Jul 7, 2018 at 1:51 AM Martin Vaeth wrote: > > Davyd McColl wrote: > > > I ask because prior to the GitHub incident, I didn't have signature > > verification enabled > > Currently, it is not practical to change this, see my other posting. > You clearly don't understand what it actually checks. It is completely practical to enable this today (though not as secure as it could be). I'll elaborate in a reply to the other email. -- Rich
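Rich says enabling signature verification is practical today as long as you sync from the infra-signed repository. Below is a sketch of a repos.conf entry for that, wrapped in a shell function so it can be checked. write_gentoo_git_conf is hypothetical, and the mirror URL is an assumed example. The OpenPGP key path is deliberately left to the defaults shipped with portage, which is itself an assumption; check repos.conf(5) for your portage version.

```shell
# Hypothetical helper: write a repos.conf fragment that syncs the gentoo
# repository over git and asks portage to verify the top commit's signature.
write_gentoo_git_conf() {
    cat > "$1" <<'EOF'
[gentoo]
location = /usr/portage
sync-type = git
sync-uri = https://github.com/gentoo-mirror/gentoo.git
sync-git-verify-commit-signature = yes
EOF
}
```

Usage, as root: `write_gentoo_git_conf /etc/portage/repos.conf/gentoo.conf`, then `emerge --sync`.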
[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning
Davyd McColl wrote: > @Rich: if I understand the process correctly, the same commits are > pushed to infra and GitHub by the CI bot? Yes, the repositories are always identical (up to a few seconds of delay). > I ask because prior to the GitHub incident, I didn't have signature > verification enabled Currently, it is not practical to change this, see my other posting. > then I should (in theory) be able to change my repo.conf > settings, fiddle the remote in /usr/portage, and switch seamlessly from > gentoo to GitHub? If by "fiddle the remote in /usr/portage" you mean editing the .git/config file, you are right. Note that just changing the remote in repos.conf only has an effect if you completely remove /usr/portage so that portage has to clone anew.
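Martin's answer is that an existing clone keeps using the URL stored in its .git/config, while repos.conf only matters for a fresh clone. Here is a hedged sketch of changing the existing clone with git itself instead of hand-editing the file. switch_remote_url is a hypothetical name, and the remote name origin is an assumption about how the clone was created.

```shell
# Hypothetical helper: point an existing clone at a different mirror without
# re-cloning (same effect as editing the url line of [remote "origin"]).
switch_remote_url() {
    repo=$1 new_url=$2
    git -C "$repo" remote set-url origin "$new_url" &&
    # Print the URL now in effect so the change can be checked.
    git -C "$repo" remote get-url origin
}
```

For example, `switch_remote_url /usr/portage https://github.com/gentoo-mirror/gentoo.git`, where the URL is only an example mirror. Set sync-uri in repos.conf to the same value so that a future re-clone agrees.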
[gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning
Rich Freeman wrote: > > Biggest issue with git signature verification is that right now it > will still do a full pull/checkout before verifying Biggest issue is that git signature happens by the developer who last committed, which means that in practice you need dozens/hundreds of keys. No package is available for this, and the only tool I know of which was originally developed to manage these (app-crypt/gkeys) is not ready for use for verification (gkeys-gpg --verify was apparently never run by its developer, since its python code already breaks during argument parsing), and its development has stalled. Moreover, although I have written a dirty substitute for gkeys-gpg, it is not clear how to use gkeys to update signatures and remove stale ones: It appears that for each usage you have to fetch all seeds and keys anew. (And I am not even sure whether the seeds it fetches are really still maintained.) So currently, it is impossible to do *any* automatic tree verification, unless you manually fetch/update all of the developer keys. Safest bet if you are a git user is to verify manually whether the signature shown as "Verified" for the latest commit on github really belongs to a gentoo developer and is not a fake account. (Though that may be hard to decide.) > until the patch makes its way into release (the patch will do a fetch > and verify before it does a checkout This does not help to get all the correct keys (and no fake keys!) you need to verify the signature. > unless you stick --force in your pull Unfortunately, it is not that simple: git pull --force only works if the checked out tree is old enough (in which case git pull without --force would have worked also, BTW). The correct thing to do if git pull failed is:

    git update-index --refresh -q --unmerged # -q is important here!
    git fetch
    git reset --hard $(git rev-parse --abbrev-ref \
        --symbolic-full-name @{upstream})

(The first command is needed to get rid of problems caused by filesystems like overlayfs.)
(If you are a developer and do not want to risk that syncing overwrites your uncommitted changes, you might want to replace --hard by --merge.) > not a great idea for scripts and portage doesn't do this). I think it is a great idea. In fact, portage previously *always* did this (with --merge instead of --hard), and the only reason this was removed is that git update-index --refresh -q --unmerged takes quite some time, which is not necessary for people who do not use a special filesystem like overlayfs for the portage tree. The right thing to do IMHO is for portage to use this anyway as a fallback if "git pull" fails. I usually patch portage to do this. > that was just dumb luck Exactly. That's why using "git pull" should not be considered a security measure. It is only a safety measure if you are a developer and want to avoid losing local changes at any price if you mistakenly sync before committing (although the mentioned --merge instead of --hard should be safe here, too). > Honestly, I think git is a good fit for a lot of Gentoo users. At least since the ChangeLogs have been removed. IMHO it was the wrong decision to not keep them in the rsync tree (The tool to regenerate them from git was/is available). > it is different, but all the history/etc is the sort of thing I think > would appeal to many here. Having the ChangeLogs would certainly be sufficient for the majority of users. It is very rare that a user really needs to access an older version of a file, and in that case it is simple enough to fetch it manually from e.g. github. > Also, git is something that is becoming increasingly unavoidable If you learn something about git from using it through portage, this only indicates a bug in portage (using "git pull" is one such bug). > Security is obviously getting a renewed focus across the board Unfortunately, due to the mentioned keys problem, git is currently the *unsafest* method for syncing.
The "git pull" bug of portage is not appealing for normal usage, either. (BTW, due to the number of committers the portage tree has a quite strict policy w.r.t. forced pushes. Overlays, especially of single users, might have different policies and thus can fail quite often due to the "git pull" bug.)
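Martin's recovery procedure, combined with his suggestion that portage use it only as a fallback when the normal pull fails, can be sketched as one shell function. This is not portage's code. sync_with_fallback is a hypothetical name, and the first attempt uses a fast-forward-only pull, a choice made here so that a sync never creates merge commits in the tree. Resetting to @{upstream} stands in for his `git rev-parse --abbrev-ref --symbolic-full-name @{upstream}` step.

```shell
# Hypothetical helper: plain pull first, fetch + reset only if it fails.
# Second argument: --hard (default, discards local edits) or --merge.
sync_with_fallback() {
    repo=$1 mode=${2:---hard}
    if git -C "$repo" pull -q --ff-only; then
        return 0
    fi
    echo "git pull failed; falling back to fetch + reset $mode" >&2
    # Discard stale stat data (e.g. inode numbers changed by overlayfs).
    git -C "$repo" update-index -q --unmerged --refresh || :
    git -C "$repo" fetch -q || return 1
    # Move the checkout to whatever the upstream branch now points to,
    # including history rewritten by a forced push.
    git -C "$repo" reset -q "$mode" '@{upstream}'
}
```

`sync_with_fallback /usr/portage` discards local edits on fallback. `sync_with_fallback /usr/portage --merge` follows Martin's advice for developers and keeps uncommitted changes where git can (reset --merge aborts rather than overwrite them).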