On Tue, 2020-06-30 at 12:50 -0500, Sid Spry wrote:
> On Tue, Jun 30, 2020, at 2:28 AM, Michał Górny wrote:
> > On June 30, 2020 2:13:43 AM UTC, Sid Spry <s...@aeam.us> wrote:
> > > Hello,
> > > 
> > > I have some runnable pseudocode outlining a faster tree verification
> > > algorithm. Before I create patches I'd like to see if there is any
> > > guidance on making the changes as unobtrusive as possible. If the
> > > radical change in algorithm is acceptable I can work on adding the
> > > changes.
> > > 
> > > Instead of composing any kind of structured data out of the portage
> > > tree my algorithm just lists all files and then optionally batches
> > > them out to threads. There is a noticeable speedup by eliding the
> > > tree traversal operations which can be seen when running the
> > > algorithm with a single thread and comparing it to the current
> > > algorithm in gemato (which should still be discussed here?).
> > 
> > Without reading the code: does your algorithm correctly detect extraneous 
> > files?
> > 
> 
> Yes and no.
> 
> I am not sure why this is necessary. If the file does not appear in a
> manifest it is ignored. It makes the most sense to me to put the burden
> of not including untracked files on the publisher. If the user puts an
> untracked file into the tree it will be ignored to no consequence; the
> authored files don't refer to it, after all.

This is necessary because a malicious third party can MITM your rsync
and serve you a tree with extraneous files (say, a -r1 baselayout
ebuild) that do horrible things on your system.  If you don't reject
files that are not in a Manifest, you open a huge security hole.
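
To make the requirement concrete, here is a minimal sketch of such a
check. It is not gemato's implementation: the function name and the
`tracked` argument (a set of tree-relative paths already collected from
the Manifests) are assumptions made for illustration, and Manifest
parsing itself is left out.

```python
import os


def find_extraneous(root, tracked):
    """Return tree-relative paths of files under root that are not tracked.

    `tracked` is assumed to be a set of '/'-separated paths relative to
    root, gathered beforehand from the Manifest files (hypothetical input).
    """
    extraneous = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Manifest paths use forward slashes regardless of platform.
            rel = rel.replace(os.sep, "/")
            if rel not in tracked:
                extraneous.append(rel)
    return sorted(extraneous)
```

Any non-empty result should fail verification rather than be silently
ignored. A real implementation would also have to honor whatever paths
the Manifest format explicitly exempts; this sketch leaves that to the
caller.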

> But it would be easy enough to build a second list of all files and
> compare it to the list of files built from the manifests. If there are
> extras an error can be generated. This is actually the first test I did
> on my manifest parsing code. I tried to see if my tracked files roughly
> matched the total files in tree. That can be repurposed for this check.
> 
> > > Some simple tests, like counting all objects traversed and verified,
> > > return the same(ish) results. Once it is put into portage it could be
> > > tested in detail.
> > > 
> > > There is also my partial attempt at removing the brittle interface to
> > > GnuPG (it's not as if the current code is badly designed, just that
> > > parsing the output of GnuPG directly is likely not the best idea).
> > 
> > The 'brittle interface' is well-defined machine-readable output.
> > 
> 
> Ok. I was aware there was a machine interface, but the classes that manipulate
> a temporary GPG home seemed like not the best solution. I guess that is all
> due to GPG assuming everything is in ~/.gnupg and keeping its state as a
> directory structure.

A temporary home directory guarantees that user configuration does not
affect the verification result.
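
As a rough illustration of both points (isolated home, machine-readable
output), consider the sketch below. It is not gemato's code; the
function names and the `keyring_path` argument are hypothetical. It
relies only on the GNUPGHOME environment variable, the `--batch`,
`--import`, `--verify` and `--status-fd` options, and the `[GNUPG:] `
prefix that GnuPG puts on status lines. Treating "exit status 0 plus a
VALIDSIG line" as success is a simplification for the example.

```python
import os
import subprocess
import tempfile

STATUS_PREFIX = "[GNUPG:] "


def parse_status(output):
    """Split --status-fd output into (keyword, args) pairs, skipping other lines."""
    events = []
    for line in output.splitlines():
        if line.startswith(STATUS_PREFIX):
            fields = line[len(STATUS_PREFIX):].split()
            if fields:
                events.append((fields[0], fields[1:]))
    return events


def verify_isolated(signed_path, keyring_path):
    """Verify signed_path using only the keys in keyring_path."""
    # A fresh home means no user keyring, trustdb or gpg.conf is consulted.
    with tempfile.TemporaryDirectory() as home:
        env = dict(os.environ, GNUPGHOME=home)
        subprocess.run(
            ["gpg", "--batch", "--import", keyring_path],
            env=env, check=True, capture_output=True,
        )
        result = subprocess.run(
            ["gpg", "--batch", "--status-fd", "1", "--verify", signed_path],
            env=env, capture_output=True, text=True,
        )
    keywords = {keyword for keyword, _args in parse_status(result.stdout)}
    return result.returncode == 0 and "VALIDSIG" in keywords
```

The temporary directory is removed when the `with` block exits, so
nothing from one verification run leaks into the next.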

> 
> > > Needs gemato, dnspython, and requests. Slightly better than random
> > > code because I took inspiration from the existing gemato classes.
> > 
> > The code makes a lot of brittle assumptions about the structure. The 
> > GLEP was specifically designed to avoid that and let us adjust the 
> > structure in the future to meet our needs.
> > 
> 
> These same assumptions are built into the code that operates on the
> tree structure. If the GLEP were changed the existing code would also
> potentially need changing. This code just uses the structure in a different
> way.
> 

The code that predates the GLEP, yes.  It will eventually be changed to
be more flexible, especially once we can start removing backwards
compatibility.

-- 
Best regards,
Michał Górny
