On Tue, Jun 30, 2020, at 2:28 AM, Michał Górny wrote:
> On June 30, 2020 2:13:43 AM UTC, Sid Spry <s...@aeam.us> wrote:
> >Hello,
> >
> >I have some runnable pseudocode outlining a faster tree verification
> >algorithm. Before I create patches I'd like to see if there is any
> >guidance on making the changes as unobtrusive as possible. If the
> >radical change in algorithm is acceptable I can work on adding the
> >changes.
> >
> >Instead of composing any kind of structured data out of the portage
> >tree my algorithm just lists all files and then optionally batches
> >them out to threads. There is a noticeable speedup by eliding the tree
> >traversal operations which can be seen when running the algorithm with
> >a single thread and comparing it to the current algorithm in gemato
> >(which should still be discussed here?).
> 
> Without reading the code: does your algorithm correctly detect extraneous 
> files?
> 

Yes and no.

I am not sure why this is necessary. If a file does not appear in a
manifest, it is ignored. It makes the most sense to me to put the burden
of not including untracked files on the publisher. If the user puts an
untracked file into the tree, it is ignored without consequence; the
authored files don't refer to it, after all.

But it would be easy enough to build a second list of all files and
compare it to the list of files built from the manifests. If there are
extras, an error can be generated. This is actually the first test I ran
on my manifest parsing code: I checked whether my tracked files roughly
matched the total files in the tree. That can be repurposed for this
check.
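
A minimal sketch of that comparison, assuming the manifest parser has
already produced a set of tracked paths relative to the tree root. The
names compare_tree and list_tree_files are made up for illustration and
are not part of gemato or portage; handling of IGNORE entries and of the
Manifest files themselves is left to the caller.

```python
import os


def list_tree_files(root):
    """Return the set of all file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def compare_tree(root, tracked):
    """Compare files on disk against the paths listed in the manifests.

    tracked is assumed to be an iterable of root-relative paths collected
    while parsing the manifests (hypothetical input, not a gemato type).
    Returns (extraneous, missing) as two sorted lists.
    """
    on_disk = list_tree_files(root)
    tracked = set(tracked)
    extraneous = sorted(on_disk - tracked)  # on disk, in no manifest
    missing = sorted(tracked - on_disk)     # in a manifest, not on disk
    return extraneous, missing
```

A strict mode could raise an error when the first list is non-empty,
while the default could keep ignoring untracked files as described above.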

> >Some simple tests like counting all objects traversed and verified
> >returns the same(ish). Once it is put into portage it could be tested
> >in detail.
> >
> >There is also my partial attempt at removing the brittle interface to
> >GnuPG (it's not as if the current code is badly designed, just that
> >parsing the output of GnuPG directly is likely not the best idea).
> 
> The 'brittle interface' is well-defined machine-readable output.
>

OK. I was aware there was a machine interface, but the classes that
manipulate a temporary GPG home did not seem like the best solution. I
guess that is all due to GPG assuming everything is in ~/.gnupg and
keeping its state as a directory structure.
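
For reference, a rough sketch of why a temporary home ends up being
needed: GnuPG takes its state directory from the GNUPGHOME environment
variable (or --homedir), so isolating a verification run means pointing
it at a throwaway directory. The helper name gpg_verify_command and the
"Manifest" file name are only illustrative, and this is not how gemato
is structured; the command is built but not executed here.

```python
import os
import tempfile


def gpg_verify_command(signed_file, homedir):
    """Build (argv, env) for a GnuPG verification confined to homedir."""
    env = dict(os.environ)
    env["GNUPGHOME"] = homedir  # GnuPG reads its keyring and state from here
    # --status-fd 1 sends the machine-readable status lines to stdout.
    argv = ["gpg", "--batch", "--status-fd", "1", "--verify", signed_file]
    return argv, env


with tempfile.TemporaryDirectory() as homedir:
    argv, env = gpg_verify_command("Manifest", homedir)
    # The trusted key would be imported into homedir first, then
    # subprocess.run(argv, env=env, capture_output=True) would run gpg.
```

The directory is removed when the with block exits, so no key material
is left behind in ~/.gnupg or anywhere else.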

> >
> >Needs gemato, dnspython, and requests. Slightly better than random code
> >because I took inspiration from the existing gemato classes.
> 
> The code makes a lot of brittle assumptions about the structure. The 
> GLEP was specifically designed to avoid that and let us adjust the 
> structure in the future to meet our needs.
> 

These same assumptions are built into the code that operates on the
tree structure. If the GLEP were changed the existing code would also
potentially need changing. This code just uses the structure in a different
way.

I will admit that I only partially understand the GLEP. I made some
simplifications just to get something demonstrable done. However, please
consider removing some of the checks or moving them elsewhere. I don't
have full suggestions right now, but there is the possibility of saving
an appreciable amount of time.
