On Tue, 2020-06-30 at 12:50 -0500, Sid Spry wrote:
> On Tue, Jun 30, 2020, at 2:28 AM, Michał Górny wrote:
> > On June 30, 2020 2:13:43 AM UTC, Sid Spry <s...@aeam.us> wrote:
> > > Hello,
> > >
> > > I have some runnable pseudocode outlining a faster tree verification
> > > algorithm. Before I create patches I'd like to see if there is any
> > > guidance on making the changes as unobtrusive as possible. If the
> > > radical change in algorithm is acceptable I can work on adding the
> > > changes.
> > >
> > > Instead of composing any kind of structured data out of the portage
> > > tree my algorithm just lists all files and then optionally batches
> > > them out to threads. There is a noticeable speedup by eliding the tree
> > > traversal operations, which can be seen when running the algorithm
> > > with a single thread and comparing it to the current algorithm in
> > > gemato (which should still be discussed here?).
> >
> > Without reading the code: does your algorithm correctly detect
> > extraneous files?
> >
>
> Yes and no.
>
> I am not sure why this is necessary. If the file does not appear in a
> manifest it is ignored. It makes the most sense to me to put the burden
> of not including untracked files on the publisher. If the user puts an
> untracked file into the tree it will be ignored to no consequence; the
> authored files don't refer to it, after all.
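A minimal Python sketch of the extraneous-file check under discussion: build the set of files reachable from the top-level Manifest, list every file on disk, and report the difference. It is not gemato's implementation; `find_extraneous`, `collect_entries`, `read_manifest` and `is_ignored` are hypothetical names, and only the GLEP 74 tags named in the comments are handled. It assumes the OpenPGP signature and file hashes are verified elsewhere, and it ignores GLEP 74 filename escaping and path sanitization. Sub-Manifests are followed only through MANIFEST entries, so a Manifest planted in the tree cannot vouch for the files next to it.

```python
import gzip
import os
from pathlib import Path, PurePosixPath

# GLEP 74 tags whose path is relative to the directory holding the Manifest.
# DIST entries name distfiles, which live outside the tree, so they are skipped.
FILE_TAGS = {"DATA", "MISC", "EBUILD"}


def read_manifest(path):
    """Read a Manifest, handling gzip-compressed sub-Manifests (*.gz)."""
    if path.suffix == ".gz":
        with gzip.open(path, "rt", encoding="utf-8") as f:
            return f.read()
    return path.read_text(encoding="utf-8")


def collect_entries(root, rel_manifest, tracked, ignored):
    """Record the paths listed in one Manifest and recurse into sub-Manifests.

    Sub-Manifests are reached only through MANIFEST entries, never by scanning
    the disk, so a Manifest planted next to a rogue file is itself extraneous.
    """
    base = rel_manifest.parent
    tracked.add(str(rel_manifest))
    for line in read_manifest(root / rel_manifest).splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or OpenPGP armor, not an entry
        tag, name = fields[0], fields[1]
        if tag in FILE_TAGS:
            tracked.add(str(base / name))
        elif tag == "AUX":
            tracked.add(str(base / "files" / name))  # AUX paths are relative to files/
        elif tag == "IGNORE":
            ignored.add(str(base / name))
        elif tag == "MANIFEST":
            collect_entries(root, base / name, tracked, ignored)


def is_ignored(path, ignored):
    """True if the path itself or any of its parent directories is IGNOREd."""
    parts = PurePosixPath(path).parts
    return any(str(PurePosixPath(*parts[:i])) in ignored
               for i in range(1, len(parts) + 1))


def find_extraneous(root, top="Manifest"):
    """Return sorted relative paths of files that no reachable Manifest lists."""
    root = Path(root)
    tracked, ignored = set(), set()
    collect_entries(root, PurePosixPath(top), tracked, ignored)
    extras = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel_dir = PurePosixPath(Path(dirpath).relative_to(root).as_posix())
        for name in filenames:
            rel = str(rel_dir / name)
            if rel not in tracked and not is_ignored(rel, ignored):
                extras.append(rel)
    return sorted(extras)
```

A caller would refuse to use the tree whenever `find_extraneous(repo_path)` returns a non-empty list. The design choice that matters for security is that the tracked set comes only from Manifests reachable from the signed top-level one; comparing against "all files named Manifest on disk" would let an attacker add a file together with a Manifest listing it.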
This is necessary because a malicious third party can MITM you with an
rsync tree containing extraneous files (say, a -r1 baselayout ebuild)
that do horrible things to your system. If you don't reject files that
are not listed in a Manifest, you open a huge security hole.

> But it would be easy enough to build a second list of all files and
> compare it to the list of files built from the manifests. If there are
> extras an error can be generated. This is actually the first test I did
> on my manifest parsing code. I tried to see if my tracked files roughly
> matched the total files in tree. That can be repurposed for this check.
>
> > > Some simple tests like counting all objects traversed and verified
> > > returns the same(ish). Once it is put into portage it could be tested
> > > in detail.
> > >
> > > There is also my partial attempt at removing the brittle interface to
> > > GnuPG (it's not as if the current code is badly designed, just that
> > > parsing the output of GnuPG directly is likely not the best idea).
> >
> > The 'brittle interface' is well-defined machine-readable output.
> >
>
> Ok. I was aware there was a machine interface, but the classes that
> manipulate a temporary GPG home seemed like not the best solution. I
> guess that is all due to GPG assuming everything is in ~/.gnupg and
> keeping its state as a directory structure.

A temporary home directory guarantees that user configuration does not
affect the verification result.

> > > Needs gemato, dnspython, and requests. Slightly better than random
> > > code because I took inspiration from the existing gemato classes.
> >
> > The code makes a lot of brittle assumptions about the structure. The
> > GLEP was specifically designed to avoid that and let us adjust the
> > structure in the future to meet our needs.
> >
>
> These same assumptions are built into the code that operates on the
> tree structure. If the GLEP were changed the existing code would also
> potentially need changing.
> This code just uses the structure in a different way.
>

The code that predates the GLEP, yes. It will eventually be changed to
be more flexible, especially once we can start removing backwards
compatibility.

--
Best regards,
Michał Górny