Rich Freeman <ri...@gentoo.org> wrote:
>> I was speaking about gentoo's git repository, of course
>> (the one which was attacked on github), not about a Frankensteined one
>> with metadata history filling megabytes of disk space unnecessarily.
>> Who has that much disk space to waste?
>
> Doesn't portage create that metadata anyway when you run it

It is better to have it created by egencache in a postsync hook
(portage-postsyncd); moreover, you should also download some other
repositories (news announcements, GLSA, dtd, xml-schema) which are
maintained independently, see e.g.
https://github.com/vaeth/portage-postsyncd-mv
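
For illustration, a minimal sketch of such a hook (the hook argument
convention and the egencache options are written down from memory;
the file name and paths are only examples, check portage(5) and
egencache(1)):

  #!/bin/sh
  # /etc/portage/repo.postsync.d/50-egencache  (example path; must be executable)
  # Portage runs these hooks after syncing each repository and passes,
  # as far as I remember, repo name, sync-uri and repo path as $1 $2 $3.
  [ "$1" = gentoo ] || exit 0
  exec egencache --update --repo="$1" --jobs=4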

It is the Gentoo way: Download only the sources and build everything
from them. It is also a question of mentality, and that is why I think
most Gentoo users who use git would prefer it that way.

> negating any space savings at the cost of CPU to regenerate the cache?

It's the *history* of the metadata which matters here:
Since regenerating each changed metadata file takes a fraction of a
second, one can estimate rather well from the duration of an egencache
run that several tens of thousands of files are changed
hourly/daily/weekly (the frequency depending mainly on eclass changes:
one change in some eclass requires a change for practically every
version of every package inheriting it), so the history of metadata
changes produced by this over time is enormous. This history, of
course, is completely useless and stored entirely in vain.
One of the weaknesses of git is that it is impossible, by design,
to omit such superfluous history selectively (once the files *are*
maintained by git).
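
(If both repositories happen to be checked out locally, one can see
the difference directly; the paths below are only examples:)

  # compare the amount of history stored in the plain tree vs. the
  # metadata-carrying gentoo-mirror checkout (example paths)
  ( cd /usr/portage && git count-objects -v )
  ( cd /usr/local/gentoo-mirror && git count-objects -v )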

>> For the official git repository your assertions are simply false,
>> as you apparently admit: It is currently not possible to use the
>> official git repo (or the github clone of it which was attacked)
>> in a secure manner.
>
> Sure, but this also doesn't support signature verification at all
> [...] so your points still don't apply.

Huh? This actually *was* my point.

BTW, portage could easily support signature verification if only the
distribution of the developers' public keys were properly maintained
(e.g. via gkeys or, more simply, via some package):
After all, Gentoo infra should always have an up-to-date list of
these keys anyway.
(If they don't, that would make it even more important to use the
source repo instead of trusting a signature which is given
without sufficient verification.)
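
For illustration, a hedged sketch of what this could look like in
repos.conf once such a key package exists (option names as I know them
from portage(5); the key path and the key package are hypothetical
examples):

  # /etc/portage/repos.conf/gentoo.conf (sketch)
  [gentoo]
  location = /usr/portage
  sync-type = git
  sync-uri = https://anongit.gentoo.org/git/repo/gentoo.git
  # requires the developer keys to be installed by some package first;
  # the path below is a hypothetical example:
  sync-git-verify-commit-signature = yes
  sync-openpgp-key-path = /usr/share/openpgp-keys/gentoo-developers.asc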

>> Your implicit claim is untrue. rsync - as used by portage - always
>> transfers whole files, only.
>
> rsync is capable of transferring partial files.

Yes, and portage explicitly disables this. (It costs a lot of server
CPU time and does not save much transferred data when the files are
small, because many hashes would have to be transferred and
calculated (CPU time!) instead.)
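
(To be concrete: if I recall the defaults correctly, portage's
PORTAGE_RSYNC_OPTS contain --whole-file; one *could* override that in
make.conf, though for the many small files of the tree it mostly
burns CPU on both ends:)

  # /etc/portage/make.conf (sketch; default option set may differ by version)
  # re-enable rsync's delta-transfer algorithm - not recommended:
  PORTAGE_RSYNC_EXTRA_OPTS="--no-whole-file"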

> However, this is based on offsets from the start of the file

There are newer algorithms which also support detection of insertions
and deletions via rolling hashes (e.g. in deduplicating filesystems).
Rsync uses quite an advanced algorithm as well, but I would have to
recheck its features.

Anyway, it plays no role in our discussion, because for such
small files it hardly matters, and portage disables
said algorithm anyway.

> "The council does not require that ChangeLogs be generated or
>   distributed through the rsync system. It is at the discretion of our
>   infrastructure team whether or not this service continues."

The formulation already makes it clear that nobody wanted to put
pressure on infra, and at that time it was expected that
every user would switch to git anyway.
At that time the gkeys project was also very active, and git was
(besides webrsync) the only expected way to get checksums for the
full tree. In particular, rsync was inherently insecure.

The situation has meanwhile changed on both sides: gkeys has
apparently been practically abandoned, and instead gemato was
introduced and is actively supported. That the gentoo-mirror
repository is suddenly more secure than the git repository is also a
side effect of gemato, because only for its sake are the infra keys
now distributed in a package.

> If you're using squashfs git pull probably isn't the right solution for you.

Exactly. That's why I completely disagree with portage's regression
of replacing the previously working solution with the only partially
working "git pull".

>> 4. Even if the user made the mistake to edit a file, portage should
>>    not just die on syncing.
>
> emerge --sync won't die in a situation like this in general.

It does: git pull refuses to run if there are uncommitted changes
(to files which the merge would touch).

> but I don't think the correct default in this case should be
> to just wipe out the user's changes.

I do: As with rsync, a user should not make changes in the distributed
tree (unless he makes a PR) but in an overlay; otherwise he will
permanently have outdated files which are not correctly updated.
*If* a user wants such changes, he should use git correctly and commit.

But I am not against making this an opt-in option which a developer
(or advanced user) can enable if he is afraid of possibly losing a
change in case he forgot to commit before syncing.

Anyway, this has nothing to do with "git pull" vs. "git fetch + git reset",
but is only a question of whether the option "--hard" or "--merge" should
be used for "git reset".

One could certainly also live with "reset --merge" as the default (or
even the only option), as it was previously in portage, but the change
to "pull" was IMHO a regression.
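
For clarity, a minimal sketch of the two strategies being compared
(repository path and branch name are only examples):

  cd /usr/portage   # example path of the gentoo repository checkout

  # "pull" strategy (what portage does now): fetch + merge into the
  # local branch; refuses or conflicts if local changes are in the way.
  git pull

  # "fetch + reset" strategy (the previously working solution):
  git fetch origin
  git reset --hard  origin/master   # discards local uncommitted changes
  # or, more cautiously:
  git reset --merge origin/master   # keeps unstaged changes, aborts on conflict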

