Rich Freeman <ri...@gentoo.org> wrote:

>> I was speaking about gentoo's git repository, of course (the one
>> which was attacked on github), not about a Frankensteined one with
>> metadata history filling megabytes of disk space unnecessarily.
>> Who has that much disk space to waste?
>
> Doesn't portage create that metadata anyway when you run it
You had better have it created by egencache in portage-postsyncd;
moreover, you should download some other repositories as well (news
announcements, GLSA, dtd, xml-schema) which are maintained
independently; see e.g. https://github.com/vaeth/portage-postsyncd-mv
It is the Gentoo way: download only the sources and build from there.
That is also a question of mentality, and why I think most gentoo
users who use git would prefer that way.

> negating any space savings at the cost of CPU to regenerate the
> cache?

It is the *history* of the metadata which matters here: since every
changed metadata file requires only a fraction of a second to
regenerate, one can estimate rather well that tens of thousands of
files are changed hourly/daily/weekly (the frequency depending mainly
on eclass changes: one change in some eclass requires a change for
practically every version of every package), so the metadata history
produced over time is enormous. This history, of course, is
completely useless and stored completely in vain. One of the
weaknesses of git is that it is impossible, by design, to selectively
omit such superfluous history (once the files *are* maintained by
git).

>> For the official git repository your assertions are simply false,
>> as you apparently admit: It is currently not possible to use the
>> official git repo (or the github clone of it which was attacked)
>> in a secure manner.
>
> Sure, but this also doesn't support signature verification at all
> [...] so your points still don't apply.

Huh? This actually *was* my point. BTW, portage might easily support
signature verification if only the distribution of the developers'
public keys were properly maintained (e.g. via gkeys, or more simply
via some package): after all, gentoo infra should always have an
up-to-date list of these keys anyway.
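To illustrate the "have it created by egencache in portage-postsyncd"
approach, here is a hedged sketch of a hook one could drop into
/etc/portage/repo.postsync.d/ (made executable). Portage passes
postsync hooks the repository name, sync URI, and repository path as
positional arguments; the --jobs value is an arbitrary example, and
the file name and exact layout are assumptions, not the literal
contents of the portage-postsyncd-mv project linked above:

```shell
#!/bin/sh
# Sketch of an /etc/portage/repo.postsync.d/ hook that regenerates
# the metadata cache locally after syncing, instead of carrying the
# metadata history inside the git repository.
# Portage invokes postsync hooks as: hook <repo-name> <sync-uri> <repo-path>
repo_name=$1
repo_path=$3

# Only act on the main gentoo tree; leave other repositories alone.
[ "${repo_name}" = gentoo ] || exit 0

# Regenerate the md5-cache for this repository.
# --jobs is optional; adjust it to your CPU count.
exec egencache --update --repo="${repo_name}" --jobs=4
```

Since the cache is rebuilt from the synced ebuilds and eclasses on
every sync, there is no need to download (or store the history of)
the metadata/md5-cache directory at all.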
(If they don't, it would make it even more important to use the
source repo instead of trusting a signature which is given without
sufficient verification.)

>> Your implicit claim is untrue. rsync - as used by portage - always
>> transfers whole files, only.
>
> rsync is capable of transferring partial files.

Yes, and portage explicitly disables this. (It costs a lot of server
CPU time and does not save much transferred data if the files are
small, because a lot of hashes have to be transferred - and
calculated: CPU time! - instead.)

> However, this is based on offsets from the start of the file

There are newer algorithms which also detect insertions and
deletions via rolling hashes (e.g. for deduplicating filesystems).
Rsync uses quite an advanced algorithm as well, but I would need to
recheck its features. Anyway, it plays no role for our discussion,
because for such small files it hardly matters, and portage disables
said algorithm anyway.

> "The council does not require that ChangeLogs be generated or
> distributed through the rsync system. It is at the discretion of
> our infrastructure team whether or not this service continues."

The formulation already makes it clear that one did not want to put
pressure on infra, and at that time it was expected that every user
would switch to git anyway. At that time the gkeys project was also
very active, and git was (besides webrsync) the only expected way to
get checksums for the full tree. In particular, rsync was inherently
insecure. The situation has meanwhile changed on both sides: gkeys
was apparently practically abandoned, and instead gemato was
introduced and is actively supported. That the gentoo-mirror
repository is now suddenly more secure than the git repository is
also a side effect of gemato, because only for it are the infra keys
now suddenly distributed in a package.

> If you're using squashfs git pull probably isn't the right solution
> for you.

Exactly.
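For the record, the "portage explicitly disables this" claim can be
seen in portage's default rsync options: the stock PORTAGE_RSYNC_OPTS
include --whole-file, which switches off rsync's delta-transfer
algorithm so that every changed file is sent in full. A hedged
make.conf fragment, if one wanted to experiment with re-enabling
delta transfers anyway (the flag value shown is an illustration, not
a recommendation):

```shell
# /etc/portage/make.conf (excerpt-style sketch, not the full defaults)
#
# Portage's built-in PORTAGE_RSYNC_OPTS contain --whole-file, i.e.
# rsync's rolling-checksum delta algorithm is disabled for tree syncs.
# Extra options are appended after the defaults, so a later
# --no-whole-file overrides the earlier --whole-file:
PORTAGE_RSYNC_EXTRA_OPTS="--no-whole-file"
```

As argued above, for the many small files of the tree this would
mostly trade server CPU time for little or no bandwidth savings,
which is presumably exactly why portage disables it by default.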
That is why I completely disagree with portage's regression of
replacing the previously working solution by the only partially
working "git pull".

>> 4. Even if the user made the mistake to edit a file, portage
>> should not just die on syncing.
>
> emerge --sync won't die in a situation like this in general.

It does: "git pull" refuses to start if there are uncommitted
changes.

> but I don't think the correct default in this case should be
> to just wipe out the user's changes.

I do: as with rsync, a user should not make changes to the
distributed tree (unless he makes a PR) but should use an overlay;
otherwise he will permanently have outdated files which are not
correctly updated. *If* a user wants such changes, he should use git
correctly and commit them. But I am not against making this an
opt-in option, enabled by a developer (or advanced user) who is
afraid of eventually losing a change because he forgot to commit
before syncing. Anyway, this has nothing to do with "git pull" vs.
"git fetch + git reset"; it is only a question of whether the option
"--hard" or "--merge" should be used for "git reset". One could
certainly also live with "reset --merge" as the default (or even the
only option), as it was previously in portage, but the change to
"pull" was IMHO a regression.
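The difference between the two sync strategies can be demonstrated in
throwaway repositories (this is a self-contained sketch in a temp
directory; the repo and file names are arbitrary): a dirty working
tree makes "git pull" abort mid-sync, while "git fetch" followed by
"git reset --hard" syncs unconditionally, discarding the uncommitted
edit.

```shell
#!/bin/sh
# Demo: "git pull" aborts on a dirty tree; "git fetch" + "git reset
# --hard" syncs regardless and wipes the local edit.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# "Upstream" repository with a single tracked file.
git init -q upstream
( cd upstream \
  && git config user.email you@example.com \
  && git config user.name you \
  && echo v1 > file && git add file && git commit -qm v1 )

# Local clone with an uncommitted edit (like a user-modified ebuild).
git clone -q upstream tree
echo "local edit" >> tree/file

# Upstream moves on.
( cd upstream && echo v2 > file && git commit -qam v2 )

cd tree
# The merge step of "git pull" refuses to overwrite the dirty file:
if git pull -q 2>/dev/null; then
    echo "pull succeeded"
else
    echo "pull refused"
fi

# fetch + reset --hard syncs unconditionally, discarding the edit:
git fetch -q origin
git reset -q --hard '@{u}'
cat file    # the tree now matches upstream ("v2")
```

With "git reset --merge" instead of "--hard", git would try to carry
the uncommitted change forward and abort only on a real conflict,
which is exactly the behavior portage previously had.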