On Wed, Jul 1, 2020, at 1:40 AM, Fabian Groffen wrote:
> On 30-06-2020 13:13:29 -0500, Sid Spry wrote:
> > On Tue, Jun 30, 2020, at 1:20 AM, Fabian Groffen wrote:
> > > Hi,
> > > 
> > > On 29-06-2020 21:13:43 -0500, Sid Spry wrote:
> > > > Hello,
> > > > 
> > > > I have some runnable pseudocode outlining a faster tree verification
> > > > algorithm. Before I create patches I'd like to see if there is any
> > > > guidance on making the changes as unobtrusive as possible. If the
> > > > radical change in algorithm is acceptable I can work on adding the
> > > > changes.
> > > > 
> > > > Instead of composing any kind of structured data out of the portage
> > > > tree my algorithm just lists all files and then optionally batches
> > > > them out to threads. There is a noticeable speedup by eliding the
> > > > tree traversal operations which can be seen when running the
> > > > algorithm with a single thread and comparing it to the current
> > > > algorithm in gemato (which should still be discussed here?).
> > > 
> > > I remember something that gemato used to use multiple threads, but
> > > because it totally saturated disk-IO, it was brought back to a single
> > > thread.  People were complaining about unusable systems.
> > > 
> > 
> > I think this is an argument for cgroups limits support on the portage
> > process or account as opposed to an argument against picking a better
> > algorithm. That is something I have been working towards, but I am
> > only one man.
> 
> But this requires a) cgroups support, and b) the privileges to use it.
> Shouldn't be a problem in the normal case, but just saying.
> 
> > > In any case, can you share your performance results?  What speedup did
> > > you see, on warm and hot FS caches?  Which type of disk do you use?
> > > 
> > 
> > I ran all tests multiple times to make them warm off of a Samsung SSD, but
> > nothing very precise yet.
> > 
> > % gemato verify --openpgp-key signkey.asc /var/db/repos/gentoo
> > [...]
> > INFO:root:Verifying /var/db/repos/gentoo...
> > INFO:root:/var/db/repos/gentoo verified in 16.45 seconds
> > 
> > sometimes going higher, closer to 18s, vs.
> > 
> > % ./veriftree.py
> > 4.763171965983929
> > 
> > So roughly a 3.5x speedup without batching to threads.
> 
> That is kind of a change.  Makes one wonder if you really did the same
> work.
> 

That was my initial reaction. I attempted to ensure I was processing all of
the files that gemato processed. The full output of my script is something
closer to:

% ./veriftree.py
x.xxxxxxxxxx
192157
126237

The first number is the time, the second the total number of manifest
directives, and the third the number of real files in the tree. If you
prune the directives that correspond to no file you end up with an exact
match IIRC.
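
To make the two counts concrete, here is a minimal sketch of how they
could be produced. This is not the code from veriftree.py: the function
names are made up for illustration, and it assumes plain, uncompressed,
unsigned Manifest files. DIST entries describe distfiles, which are not
stored in the tree, so they are the obvious candidates for directives
with no matching file.

```python
import os

# Manifest tags that name a file (other lines, e.g. TIMESTAMP, are skipped).
FILE_TAGS = {"DATA", "MANIFEST", "EBUILD", "MISC", "AUX", "DIST"}


def parse_manifest(manifest_path):
    """Yield (tag, absolute_path) for each file directive in one Manifest."""
    base = os.path.dirname(manifest_path)
    with open(manifest_path, encoding="utf-8") as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 2 or fields[0] not in FILE_TAGS:
                continue
            tag, name = fields[0], fields[1]
            if tag == "AUX":
                # AUX paths are relative to the package's files/ directory.
                name = os.path.join("files", name)
            yield tag, os.path.join(base, name)


def count_directives(root):
    """Return (total directives, directives whose file exists on disk)."""
    total = 0
    present = set()
    for dirpath, _dirs, files in os.walk(root):
        if "Manifest" not in files:
            continue
        for _tag, path in parse_manifest(os.path.join(dirpath, "Manifest")):
            total += 1
            if os.path.isfile(path):
                present.add(path)
    return total, len(present)
```

Calling count_directives("/var/db/repos/gentoo") would return the pair
of counts; whether the second one matches my third number exactly
depends on details (compressed Manifests, which files are counted) that
this sketch glosses over.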

However, you are right: I think the numbers above came from old code.
gemato times the manifest file parsing as well as the verification, and
that change does not seem to be in the code I provided. If I time both as
well, I get:

% ./veriftree.py
11.708862617029808
192157
126237

The corresponding times for gemato (same system state, etc.) are ~20s. So
it is close to a halving, with assured n-core speedup for 1/2 of that
time, and I am fairly confident I can speed up the manifest parsing even
more as well.
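
For reference, the overall shape of the approach (flatten every Manifest
into one list, then verify the entries serially or from a thread pool,
timing both phases together) can be sketched as below. This is an
illustrative reconstruction, not veriftree.py itself: the function names
are hypothetical, only SHA512 is checked, and it assumes plain,
uncompressed, unsigned Manifest files.

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor


def load_entries(root):
    """Flatten all Manifests under root into (path, size, sha512) tuples."""
    entries = []
    for dirpath, _dirs, files in os.walk(root):
        if "Manifest" not in files:
            continue
        with open(os.path.join(dirpath, "Manifest"), encoding="utf-8") as fh:
            for line in fh:
                f = line.split()
                # Only entries that name a file stored next to the Manifest.
                if len(f) < 5 or f[0] not in ("DATA", "EBUILD", "MISC"):
                    continue
                hashes = dict(zip(f[3::2], f[4::2]))
                if "SHA512" in hashes:
                    path = os.path.join(dirpath, f[1])
                    entries.append((path, int(f[2]), hashes["SHA512"]))
    return entries


def verify_one(entry):
    """Check size and SHA512 of a single file; a missing file fails."""
    path, size, expected = entry
    try:
        with open(path, "rb") as fh:
            data = fh.read()
    except OSError:
        return False
    return len(data) == size and hashlib.sha512(data).hexdigest() == expected


def verify_tree(root, workers=1):
    """Return (elapsed seconds, files verified OK, total entries).

    The timer covers Manifest parsing as well as hashing.
    """
    start = time.perf_counter()
    entries = load_entries(root)
    if workers > 1:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(verify_one, entries))
    else:
        results = [verify_one(e) for e in entries]
    return time.perf_counter() - start, sum(results), len(entries)
```

Only the verify_one calls are handed to the pool here, which is why the
n-core speedup applies to the verification part of the runtime and not
to the parsing.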

> > > You could compare against qmanifest, which uses OpenMP-based
> > > parallelism while verifying the tree.  On SSDs this does help.
> > > 
> > 
> > I lost my notes -- how do I specify to either gemato or qmanifest the
> > GnuPG directory? My code is partially structured as it is because I
> > had problems doing this. I rediscovered -K/--openpgp-key in gemato but
> > am unsure for qmanifest.
> 
> qmanifest doesn't do much magic out of the standard gnupg practices.
> (It is using gpgme.)  If you want it to use a different gnupg dir, you
> may change HOME, or GNUPGHOME.
> 

Alright, I will attempt to set that. I think I like the interface of
gemato a little more but will look at qmanifest and see how it performs.
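
For my timing scripts, overriding GNUPGHOME for a single child process
would look roughly like the sketch below. The helper name and the
example directory are hypothetical, and the command to run is left to
the caller since I have not checked the exact qmanifest invocation yet.

```python
import os
import subprocess


def run_with_gnupghome(cmd, gnupg_dir):
    """Run cmd with GNUPGHOME pointing at gnupg_dir, leaving our own env alone."""
    env = dict(os.environ, GNUPGHOME=gnupg_dir)
    return subprocess.run(cmd, env=env, capture_output=True, text=True,
                          check=False)
```

E.g. run_with_gnupghome(cmd, "/tmp/portage-gnupg"), with cmd being the
argument list for whichever verifier is under test.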
