On Thursday 28 May 2009 07:46:36 Tiziano Müller wrote:

> And here is why (I'm only looking at the non-degenerate case with valid
> metadata, ignoring overlays which some consider a corner case (I don't
> understand that argument, but that's another thing)):

Overlays tend to come without a metadata cache. Just enabling the KDE overlay 
changed the time for "emerge -upNDv world" from ~30 seconds to ~120 seconds 
(cold cache). Running "emerge --metadata" brings the performance back to 
pretty much the old level.

> When the package manager looks at a package, it first reads the
> package's ebuild directory and gets the mtimes. It does the same for the
> cache entries and validates the caches (there is more stuff in here,
> like checking eclasses and so on).
Eclasses are negligible because you only have to look at them once for the 
whole calculation: you can cache their mtimes for the duration of your operation.
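A minimal sketch of caching eclass mtimes for one run, assuming (as above) that nothing modifies the tree while the operation is in progress. `MtimeCache` is a hypothetical helper, not Portage or Paludis code; the injectable `stat` parameter only exists to make it easy to test.

```python
import os


class MtimeCache:
    """Hypothetical helper: stat() each path at most once per operation.

    Many ebuilds inherit the same eclasses, so remembering each eclass's
    mtime for one resolver run turns many lookups per path into one. This
    assumes the tree is not modified while the run is in progress.
    """

    def __init__(self, stat=os.stat):
        self._stat = stat      # injectable for testing; defaults to os.stat
        self._mtimes = {}
        self.stat_calls = 0    # number of real stat() calls made

    def mtime(self, path):
        if path not in self._mtimes:
            self.stat_calls += 1
            self._mtimes[path] = self._stat(path).st_mtime_ns
        return self._mtimes[path]


# Usage sketch: validating many cache entries that inherit the same eclass.
# cache = MtimeCache()
# for ebuild in ebuilds:
#     for eclass in ebuild_inherits(ebuild):   # hypothetical helper
#         cache.mtime(f"eclass/{eclass}.eclass")
```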

> Then the following happens based on the "solution" we choose:
> eapi-in-filename: the package manager starts from the highest version
> with a supported eapi (the others are nonexistent with the used glob).
> For that ebuild it reads the cache entry and decides whether or not it
> can be used. 
Amusingly, in this case you do NOT want to store the EAPI in the cache, so you 
can even defer sourcing the ebuild until you actually need the metadata. 
(You don't want to cache it because you have to stat the file for its mtime 
anyway, and at that point you already have the filename. No need to look for 
the EAPI in another place :) )
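A sketch of the eapi-in-filename path, assuming a hypothetical naming scheme where the EAPI is a suffix on the extension (`foo-1.2.ebuild-2`; a plain `.ebuild` is taken as EAPI 0). `read_cache` and `usable` stand in for the package manager's cache reader and visibility check, and `version_key` is a deliberately simplified version comparison. The EAPI falls out of the directory listing that the mtime check needs anyway, so unsupported versions are never opened and cache entries are read only until a usable one is found.

```python
import re

# Hypothetical naming scheme (an assumption for this sketch): the EAPI is a
# suffix on the file extension, e.g. "foo-1.2.ebuild-2"; plain ".ebuild"
# means EAPI 0.
_EBUILD_RE = re.compile(
    r"^(?P<pv>.+)\.ebuild(?:-(?P<eapi>[A-Za-z0-9_][A-Za-z0-9+_.-]*))?$"
)


def eapi_from_filename(filename):
    """Return (name-version, EAPI) from the filename alone, or None."""
    m = _EBUILD_RE.match(filename)
    if m is None:
        return None
    return m.group("pv"), m.group("eapi") or "0"


def version_key(pv):
    # Simplification: "name-1.2.3" with dotted integers only. Real version
    # comparison also handles _rc, -r, letter suffixes and more.
    return tuple(int(x) for x in pv.rsplit("-", 1)[1].split("."))


def pick_version(filenames, supported, read_cache, usable):
    """Newest usable version; cache entries are read lazily, newest first.

    Unsupported EAPIs are dropped from the directory listing itself, so
    their cache entries (and ebuilds) are never touched.
    """
    parsed = [p for p in map(eapi_from_filename, filenames) if p]
    candidates = [pv for pv, eapi in parsed if eapi in supported]
    for pv in sorted(candidates, key=version_key, reverse=True):
        if usable(read_cache(pv)):
            return pv
    return None
```

For the unstable case described in the quote, the loop usually stops after the very first cache read.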

> If not, it proceeds to the next version, if yes, it's done.
> eapi-in-ebuild: the package manager reads all cache entries and sorts
> out those with an EAPI it doesn't support. The rest gets ordered and the
> same procedure as above applies.
>
> So, one of the main differences is: "reading one cache file" (if running
> unstable you can assume you support the highest version, thus reading
> only one cache file) vs. "reading all cache files".
That assumes a dumb cache format. 
Why don't we make the cache more efficient, so you read one file per package / 
category / ...?
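A minimal sketch of a per-category cache, assuming one JSON file per category that maps name-version to its metadata; the layout and function names are hypothetical, not an existing Portage or PMS cache format. With eapi-in-ebuild, filtering out unsupported EAPIs then costs one file read per category instead of one per version.

```python
import json
import os


def write_category_cache(path, entries):
    """Write metadata for every version in one category to a single file.

    `entries` maps "name-version" to a metadata dict (EAPI, DEPEND, ...).
    """
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(entries, f, sort_keys=True)
    os.replace(tmp, path)  # atomic rename on POSIX: readers never see half a file


def read_category_cache(path):
    """A single open()/read() returns metadata for the whole category."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def supported_versions(path, supported):
    """Versions whose cached EAPI we support, from one cache read.

    Returned in plain string order; real code would sort by version.
    """
    return sorted(pv for pv, md in read_category_cache(path).items()
                  if md.get("EAPI") in supported)
```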

>
> I did some performance measurements based on that. I have 1507 installed
> packages with 5541 different versions/revisions.
>
> Reading from hot cache:
> 1507 files: ~50ms
> 5541 files: ~170ms
>
> Reading from cold cache:
> 1507 files: ~2.8s
> 5541 files: ~6s
And then you still need to pull in the metadata for the dependency 
calculation. How big is the impact of that?
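A quick check of the quoted figures, splitting each total evenly across files (a naive assumption that ignores the seeks mentioned below). The cold-cache ratio of about 2.1x lines up with the "at least twice as slow" estimate; note that it covers only reading cache entries, not the dependency calculation that follows.

```python
# Figures from the measurements quoted above: files read -> seconds.
MEASURED = {
    "hot":  {1507: 0.050, 5541: 0.170},
    "cold": {1507: 2.8,   5541: 6.0},
}


def summarize(label):
    few, many = MEASURED[label][1507], MEASURED[label][5541]
    return {
        "us_per_file_1507": few / 1507 * 1e6,
        "us_per_file_5541": many / 5541 * 1e6,
        "slowdown": many / few,  # reading all entries vs. one per package
    }


for label in MEASURED:
    s = summarize(label)
    print(f"{label}: {s['us_per_file_1507']:.0f} vs {s['us_per_file_5541']:.0f} "
          f"us/file, {s['slowdown']:.2f}x slower overall")
# hot: 33 vs 31 us/file, 3.40x slower overall
# cold: 1858 vs 1083 us/file, 2.14x slower overall
```

The per-file cost in the cold case drops as more files are read, so the total grows less than linearly with the number of cache entries.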

>
> I made a lot of assumptions here (neglecting seek between ebuild-dir and
> metadata-dir, other processes using the drive, 80 ebuilds from overlays
> where the ebuild would have to be read, etc.). But estimating from the
> numbers above I'd say that an "emerge -uD world"/"paludis -i world" will
> be at least twice as slow, which I think is not acceptable.
I find that quite acceptable. As long as we're using such a bad layout, 
performance is secondary.

To fix the performance you'd "only" have to guarantee that the repo is 
unchanged (read-only); then you can add lots of simple caches and indexes: no 
need to ever source an ebuild for metadata again, one cache file for the EAPI 
if you want ... I bet you'd find lots of small improvements that this would 
yield, much more impressive than managing to avoid a few open() calls here and 
there ...
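A sketch of the "guarantee the repo is unchanged" idea: if a sync step writes a single stamp file, an index built against that stamp can be trusted wholesale, with no per-ebuild stat() or sourcing. The stamp file, the index layout and `RepoIndex` are all assumptions for illustration, not an existing Portage or Paludis mechanism.

```python
import json


class RepoIndex:
    """Hypothetical precomputed index for a repo that is read-only between syncs.

    Assumed layout: the sync step writes a small stamp file; the index file
    records the stamp it was built against. If the two match, every entry is
    trusted without stat()ing or sourcing any ebuild.
    """

    def __init__(self, stamp_path, index_path):
        self.stamp_path = stamp_path
        self.index_path = index_path

    def _stamp(self):
        with open(self.stamp_path, encoding="utf-8") as f:
            return f.read().strip()

    def build(self, eapis):
        """Store the index; `eapis` maps "category/name-version" to its EAPI."""
        with open(self.index_path, "w", encoding="utf-8") as f:
            json.dump({"stamp": self._stamp(), "eapi": eapis}, f)

    def load(self):
        """Return the EAPI map if it matches the current repo state, else None."""
        try:
            with open(self.index_path, encoding="utf-8") as f:
                data = json.load(f)
        except FileNotFoundError:
            return None  # never built: caller falls back to the slow path
        if data.get("stamp") != self._stamp():
            return None  # repo was synced since: rebuild the index
        return data["eapi"]
```

One read of the stamp file then replaces thousands of per-ebuild checks; the trade-off is that anything editing the tree in place must also update the stamp.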


> And I also don't understand your point of stating it's "bad design".
Bad design is like smelly feet. It's hard not to notice ...

> I mean: when coding you should "not optimize prematurely", but with
> eapi-in-ebuild it is against the other principle of "not pessimize
> prematurely" (Sutter/Alexandrescu: C++ Coding Standards).
If you quote that, use the full (Knuth) quote:

"We should forget about small efficiencies, say about 97% of the time: 
premature optimization is the root of all evil."

In other words, we should not try to make that path faster when we can avoid 
hitting it at all with a small design revision.
