Simon Stelling posted <[EMAIL PROTECTED]>, excerpted below,  on
Wed, 08 Feb 2006 21:37:33 +0100:

> Duncan wrote:

>> I should really create a page listing all the little Gentoo admin scripts
>> I've come up with and how I use them.  I'm sure a few folks anyway would
>> likely find them useful.
>> 
>> The idea behind most of them is to create shortcuts to having to type in
>> long emerge lines, with all sorts of arbitrary command line parameters.
>> The majority of these fall into two categories, ea* and ep*, short for
>> emerge --ask <additional parameters> and emerge --pretend ... .  Thus, I
>> have epworld and eaworld, the pretend and ask versions of emerge -NuDv
>> world, epsys and easys, the same for system, eplog <package>, emerge
>> --pretend --log --verbose (package name to be added to the command line so
>> eplog gcc, for instance, to see the changes between my current and the new
>> version of gcc), eptree <package>, to use the tree output, etc.
> 
> Interesting. But why do you use scripts and not simple aliases? Every time you
> launch your script the HD performs a seek (which is very expensive in time),
> copies the script into memory and then forks a whole bash process to execute a
> one-liner. Using alias, which is a bash built-in, wouldn't fork a process and
> therefore be much faster.

My thinking, which is possibly incorrect (your input appreciated), is that
file-based scripts get pulled into cache the first time they are executed,
and will remain there (with a gig of memory) pretty much until I'm done
doing my upgrades.  At the same time, they are simply in cache, not
something in bash's memory, so if the memory is needed, it will be
reclaimed.  As well, after I'm done and on to other tasks, the cached
commands will eventually be replaced by other data, if need be.

Aliases (and bash-functions) are held in memory.  That's not as flexible
as cache in terms of being knocked out of memory if the memory is needed
by other things.  Sure, that memory may be flushed to disk-based swap, but
that's disk-based just like the actual script files I'm using, so
reading it back into main memory after a page fault will take roughly as
long as reading the script file again anyway.  That's little gain, set
against the additional overhead, and therefore loss, of having to manage
the temp copy in swapped memory, if it comes to that.
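
To make the comparison concrete, here's roughly what the forms look like (a sketch; the option spellings follow the ep*/ea* descriptions above rather than being copies of my actual files):

```shell
# Sketch of the forms under discussion. Option spellings follow the
# descriptions above; treat them as illustrative, not my actual files.

# Alias form (held in bash's memory, e.g. defined in ~/.bashrc):
alias epworld='emerge -NuDv --pretend world'
alias eaworld='emerge -NuDv --ask world'

# Function form, handling a package argument the way eplog does:
eplog() {
    emerge --pretend --log --verbose "$@"
}

# Script form: the same one-liner in a file on disk, e.g.
# /usr/local/bin/epworld containing:
#   #!/bin/bash
#   exec emerge -NuDv --pretend world "$@"
```

The practical difference is simply where the text lives: the alias and function versions sit in the shell's own memory, the script version in the page cache.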

Actually, there are some details here that may affect things.  I don't
know enough about the following factors to be able to evaluate how they
balance out, but the real reason I chose individual scripts is below.

One, here anyway, tho not on most systems, I'm running four SATA disks in
RAID.  The swap is actually not on the RAID, as the kernel manages it like
RAID on its own, provided all four swap areas are set to the same priority
(they are), which means swap is running on the equivalent of
four-way-striped RAID-0.  Meanwhile, the scripts, as part of my main
system, are on RAID-6 for redundancy; with the same four disks backing
the RAID-6 as the swap, that's effectively only two-way-striped storage
there, the other two disk stripes being parity.  Thus, retrieval from the
4-way-striped swap should in theory be more efficient than from the
2-way-striped regular storage.  OTOH, the granularity of the stripe
in either case, against the size of the one or two-line script, likely
means that it'll be pulled from a single stripe (at the speed of
reading from a single disk, tho there are parallelizing opportunities
not available on a single disk).  It's also likely that the swap will be
more optimally managed for fast retrieval than the location on the regular
filesystem is.  Balanced against that we have the overhead of maintaining
the swap tracking.
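
For what it's worth, the equal-priority swap setup the kernel stripes across looks like this (a sketch with illustrative device names, not my actual fstab):

```shell
# Sketch: four swap partitions at equal priority, which the kernel
# interleaves like a 4-way stripe (the "equivalent of four-way-striped
# RAID-0" above). Device names are illustrative; the lines belong in
# /etc/fstab.
fstab_swap_lines='
/dev/sda2  none  swap  sw,pri=1  0 0
/dev/sdb2  none  swap  sw,pri=1  0 0
/dev/sdc2  none  swap  sw,pri=1  0 0
/dev/sdd2  none  swap  sw,pri=1  0 0
'
printf '%s' "$fstab_swap_lines"
# swapon -p 1 /dev/sdX2 would set the same priority at activation time.
```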

That's assuming it would swap that out to the dedicated swap in the first
place.  I'm not familiar with Linux's VM, but given that the aliases and
functions would be file-based in either case, it's possible it would
simply drop the data from main memory, relying on the fact that the
data is clean file-backed data and could be read-in directly from the
files again, if necessary, rather than bothering with actually creating a
temporary copy of the /same/ data in swap, taking time to do so when it
could just read it back in from the file.

Another aspect is the effect of data vs metadata caching.  Again, I'm not
familiar with how Linux manages this, and indeed, it may differ between
filesystems, but the idea is that if the file's metadata is still cached,
even if the file data itself isn't, re-reading the data takes a single
disk seek and read.  Without that metadata (on the initial read, or if
the location metadata has been flushed as well), the kernel must first
walk the logical directory structure, seeking and reading each directory
table in the hierarchy until it reaches the entry that actually has the
file location, before it can read the file itself.  (Back several years ago on
MSWormOS, one of the first things I always did after a reinstall was set
the system to server profile, which kept a far larger metadata cache, on
the theory that the metadata was usually smaller than the data, and for
dirs, sharable among many data files, so I'd rather spend cache memory on
metadata than data.  The other choices were the default desktop profile,
and laptop, a much smaller metadata cache.  I originally learned about
these as a result of reading about a bug in the original 95 as shipped,
that swapped some entries in the registry, and therefore cached FAR less
metadata than it should have. I don't know where these tweaks are located
on Linux, or how to go about adjusting them safely.)
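
A hedged guess at where that tweak might live on Linux: the vm.vfs_cache_pressure sysctl biases reclaim between the dentry/inode (metadata) caches and data pages. I haven't experimented with it myself; a read-only peek, assuming a kernel with /proc/sys mounted:

```shell
# Guess at the Linux-side knob: vm.vfs_cache_pressure biases page
# reclaim between the dentry/inode (metadata) caches and file data pages;
# values below the default of 100 make the kernel hold on to metadata
# longer. Read-only here; changing it needs root.
cat /proc/sys/vm/vfs_cache_pressure
# As root one could experiment with, e.g.:
#   sysctl -w vm.vfs_cache_pressure=50
```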

Basically, therefore, I don't believe aliases to be a big positive, and
possibly somewhat of a negative, as opposed to scripts, because the
scripts will be cached in most cases after initial use anyway, yet they
have the advantage of not having to be maintained or tracked in memory
when I'm doing other tasks and the system needs that cache.

Given that I don't believe it's a big positive, I prefer the
administrative convenience and maintainability of separate scripts.

There /is/ a third alternative, that I came across recently, that I think
is a good idea.  If you'd comment, perhaps it would help me sort out the
implications.

The idea, simply put, is "bash command theming", single scripts that can
be invoked that will "theme" a command prompt for the tasks at hand.  I
didn't read the entire article I saw covering this, but skimmed it enough
to get the gist.  A single invokable script for each set of tasks, say
perl programming, bash programming, working with portage, etc, that would
set up a specific set of aliases and functions for that task.  Invoking
the script with the "off" parameter would erase that set of aliases and
bash functions, thereby recovering the memory, and do any related cleanup
like resetting the path if necessary to exclude any task-specific
commands.  Taking this a step further, a variable could be set up that
would list the theme or themes that were active, that the theme-setup
script could read and automatically deactivate the previous theme while
switching to the new one.  One could even share functionality between
themes, sourcing common files, which would check the active theme and
adjust their behavior based on the active theme.

This alias and function theming wouldn't be quite as modular (tho with
sourcing it could be) as the individual scripts, but would maintain the
performance advantages (if any) of the alias/function idea, while at the
same time allowing the memory reclamation of the cached-script option.  It
sounds really good, but I'm not yet convinced the benefits would be worth
the additional effort of setting up those themes, since the solution I
have works.
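
A minimal sketch of how such a theme might look, assuming it's sourced into the shell (all names here, portage_theme and ACTIVE_THEME included, are invented for illustration, not from any real package):

```shell
# Invented sketch of a sourceable theme; portage_theme and ACTIVE_THEME
# are illustrative names only.
portage_theme() {
    case "$1" in
    on)
        # Deactivate whatever theme is current before switching.
        if [ -n "${ACTIVE_THEME:-}" ]; then
            "${ACTIVE_THEME}_theme" off
        fi
        alias epworld='emerge -NuDv --pretend world'
        alias eaworld='emerge -NuDv --ask world'
        ACTIVE_THEME=portage
        ;;
    off)
        # Erase the theme's aliases, recovering their memory.
        unalias epworld eaworld 2>/dev/null || true
        unset ACTIVE_THEME
        ;;
    esac
}
```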

One VERY NICE benefit of the themes idea is that it would directly
address any namespace pollution concerns.  It has a direct appeal to
programmers and anyone else that's ever had to deal with such issues, for
that reason alone.  One single command on the path to invoke the theme,
possibly even an eselect-like command shared among themes, with
everything else off-path and out of the namespace unless that theme is
invoked!  /VERY/ appealing indeed.  OTOH, there are those who'll never
remember the theme they have active at the moment, and be constantly
confused.  For these folks, it'd be a nightmare!

> man emerge:
>        --oneshot (-1)
> 
> IIRC --oneshot has a short form since 2.0.52 was released.

Learn new things every day.  Thanks!  I remember how pleased I was to have
--newuse, and even more so when I discovered -N, so very nice!

>> ...  Deep breath... <g>
>> 
>> All that as a preliminary explanation to this:  Along with the above, I
>> have a set of efetch functions, that invoke the -f form, so just do the
>> fetch, not the actual compile and merge, and esyn (there's already an
>> esync function in something or other I have merged so I just call it
>> esyn), which does emerge sync, then updates the esearch db, then
>> automatically fetches all the packages that an eaworld would want to
>> update, so they are ready for me to merge at my leisure.
> 
> I'm a bit confused now. You use *functions* to do that? Or do you mean
> scripts? By the way: with alias you could name your custom "script"
> esync because it doesn't place a file on the harddisk.

Scripts.  I was using "functions" in the generic sense here.  I did
realize before I sent that it had a dual meaning, but figured it wasn't
important enough a distinction to go back and correct, or explain. 
Unfortunately, every time I decide to skip something like that, I get
called on it, which doesn't help my posts get any shorter! =8^)
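
For clarity, the esyn script described above amounts to roughly this (shown as a function only so it reads as one unit; eupdatedb is esearch's index updater):

```shell
# Rough reconstruction of the esyn script: sync the tree, rebuild the
# esearch index, then pre-fetch everything an eaworld would want to
# merge, so it's ready to merge at leisure.
esyn() {
    emerge --sync &&
    eupdatedb &&
    emerge --newuse --update --deep --fetchonly world
}
```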

>> I choose -Os, optimize for size, because a modern CPU and the various
>> cache levels are FAR faster than main memory. 
> 
> Given the fact that two CPUs, only differing in L2 Cache size, have
> nearly the same performance, I doubt that the performance increase is
> very big. Some interesting figures:
> 
> Athlon64 something (forgot what, but shouldn't matter anyway) with 1 MB
> L2-cache is 4% faster than an Athlon64 of the same frequency but with only 
> 512kB
> L2-cache. The bigger the cache sizes you compare get, the smaller the
> performance increase. Since you run a dual Opteron system with 1 MB L2
> cache per CPU I tend to say that the actual performance increase you
> experience is about 3%. But then I didn't take into account that -Os
> leaves out a few optimizations which would be included by -O2, the
> default optimization level, which actually makes the code a bit slower
> when compared to -O2. So, the performance increase you really experience
> shrinks to about 0-2%. I'd tend to proclaim that -O2 is even faster for
> most of the code, but that's only my feeling.

Interesting, indeed.  I'd counter that it likely has to do with how many
tasks are being juggled as well, plus the number of kernel/user context
switches, of course.  I wonder under what load, and with what task-type,
the above 4% difference was measured.

Of course, the definitive way to end the argument would be to do some
profiling and get some hard numbers, but I don't think either you or I
consider it an important enough factor in our lives to go to /that/ sort
of trouble. <g>

> Beside that I should mention that -Os sometimes still has problems with
> huge packages like glibc.

Interestingly enough, while Gentoo's glibc ebuilds stripflags to -O2, I
did try it with all that stripflags logic disabled.  For glibc, it /does/
seem to slow things down, or did back with gcc-3.3 (IIRC) anyway.  I tried
the same glibc both ways.  I would have tried tinkering further, but
decided it wasn't worth complicating debugging and the like, since glibc
is loaded by virtually everything, and I'd never be able to tell if it was
my funny tweaks to glibc, or some actual issue with whatever package. 
Besides, that's an awfully costly package, in terms of recompile time, not
to mention system stability, to be experimenting with.  I /can/ say,
however, that it didn't crash or cause any other issues I could see or
attribute to it.

OTOH, I haven't tried it with xorg-modular yet, but the monolithic xorg
builds seemed to perform better with -Os.  I tried one of them (6.8??)
both ways too.  I ended up routinely killing the stripflags logic, but I
was modifying other portions of the ebuild as well (so it compiled only
the ATI video driver, and only installed the 100-dpi fonts, not 75-dpi,
among other things), so that was just one of several modifications I was
making, tho the only real performance-affecting one. Performance in X was
better, but it DID take longer to switch to a VT, when I tried that.  In
fact, at one point, the switch to VT functionality broke, but someone
mentioned it was broken in general at that point for certain drivers,
anyway, so I'm not sure my optimizations had anything to do with it.

>> Of course, this is theory, and the practical case can and will differ
>> depending on the instructions actually being compiled.  In particular,
>> streaming media apps and media encoding/decoding are likely to still
>> benefit from the traditional loop elimination style optimizations,
>> because they run thru so much data already, that cache is routinely
>> trashed anyway, regardless of the size of your instructions.  As well,
>> that type of application tends to have a LOT of looping instructions to
>> optimize!
>> 
>> By contrast, something like the kernel will benefit more than usual
>> from size optimization.  First, it's always memory locked and as such
>> can't be swapped, and even "slow" main memory is still **MANY**
>> **MANY** times faster than swap, so a smaller kernel means more other
>> stuff fits into main memory with it, and isn't swapped as much. Second,
>> parts of the
> 
> Funny to hear this from somebody with 4 GB RAM in his system. I don't
> know how bloated your kernel is, but even if -Os would reduce the size
> of my kernel to **the half**, which is totally impossible, it wouldn't
> be enough to load the mail I am just answering into RAM. So, basically,
> this reasoning is just ridiculous.

I won't argue with that.  BTW, still at a gig, much to my frustration!  I
put off upgrading memory when I decided my disk was in danger of going bad
and I ended up deciding to go 4-disk SATA based RAID.  Then I upgraded my
stereo near Christmas...  Now the CC is almost paid off again, so I'm
looking at that memory upgrade again.

Much to my frustration, memory prices don't seem to be dropping much
lately!

> You are referring a lot to the gcc manpage, but obviously you missed
> this part:
> 
>        -fomit-frame-pointer
>            Don't keep the frame pointer in a register for functions that
>            don't need one.  This avoids the instructions to save, set up
>            and restore frame pointers; it also makes an extra register
>            available in many functions.  It also makes debugging
>            impossible on some machines.
> 
>            On some machines, such as the VAX, this flag has no effect,
>            because the standard calling sequence automatically handles
>            the frame pointer and nothing is saved by pretending it
>            doesn't exist.  The machine-description macro
>            "FRAME_POINTER_REQUIRED" controls whether a target machine
>            supports this flag.
> 
>            Enabled at levels -O, -O2, -O3, -Os.
> 
> I have to say that I am a bit disappointed now. You seemed to be one of
> those people who actually inform themselves before sticking new flags
> into their CFLAGS.

??

I'm not sure which way you mean this.  It was in my CFLAGS list, but I
didn't discuss it as it's fairly common (from my observation, nearly as
common as -pipe) and seems fairly non-controversial on Gentoo.  Did you
miss it in my CFLAGS and are saying I should be using it, or did you see
it and are saying it's unnecessary and redundant because it's enabled by
the -Os?

If the latter, yes, but as mentioned above in the context of glibc, -Os is
sometimes stripped.  In that case, the redundancy of having the basic
-fomit-frame-pointer is useful, unless it's also stripped, but as I said,
it seems much less controversial than some flags and is often
specifically allowed where most are stripped.

Or, are you saying I should avoid it due to the debugging implications?  I
don't quite get it.
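
For reference, the make.conf fragment under discussion amounts to this (a sketch; the point being that spelling -fomit-frame-pointer out keeps it in effect even when an ebuild strips -Os back to -O2, unless it strips that flag too):

```shell
# Sketch of the make.conf lines under discussion. With -Os in effect,
# -fomit-frame-pointer is redundant (the manpage enables it at -O, -O2,
# -O3, and -Os), but listing it keeps it active when -Os gets stripped.
CFLAGS="-Os -fomit-frame-pointer -pipe"
CXXFLAGS="${CFLAGS}"
```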

>> !!! Relying on the shell to locate gcc, this may break !!! DISTCC,
>> installing gcc-config and setting your current gcc !!! profile will fix
>> this
>> 
>> Another warning, likewise to stderr and thus not in the eis output.
>> This one is due to the fact that eselect, the eventual systemwide
>> replacement for gcc-config and a number of other commands, uses a
>> different method to set the compiler than gcc-config did, and portage
>> hasn't been adjusted to full compatibility just yet.  Portage finds the
>> proper gcc just fine for itself, but there'd be problems if distcc was
>> involved, thus the warning.
> 
> Didn't know about this. Have you filed a bug yet on the topic? Or is
> there already one?

There is one.  I don't recall if I filed it or if it was already there,
but both JH and the portage folks know about the issue.  IIRC, the portage
folks decided it was their side that needed changing, but that required
changes to the distcc package, and I don't know how that has gone since I
don't use distcc, except that I was slightly surprised to see the warning
in portage 2.1 still.

>> MAKEOPTS="-j4"
>> 
>> The four jobs is nice for a dual-CPU system -- when it works.
>> Unfortunately, the unpack and configure steps are serialized, so the
>> jobs option does little good, there.  To make most efficient use of the
>> available cycles when I have a lot to merge, therefore, I'll run as
>> many as five merges in parallel.  I do this quite regularly with KDE
>> upgrades like the one to 3.5.1, where I use the split KDE ebuilds and
>> have something north of 100 packages to merge before KDE is fully
>> upgraded.
> 
> I really wonder how you would parallelize unpacking and configuring a
> package.

That's what was nice about configcache, which was supposed to be in the
next portage, but I haven't seen or heard anything about it for a while,
and the next portage, 2.1, is what I'm using.  configcache seriously
shortened that stage of the build, leaving more of it parallelized, but...

I was using it for a while, patching successive versions of portage, but
it broke about the time sandbox split out.  The dev said he wasn't
maintaining the old version since it was going into the new portage, and
tho I tried updating the patch, I eventually ran into what I think were
unrelated issues, dropped it as one of my troubleshooting steps, and
never picked it up again.
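
The parallel-merge habit I mentioned looks roughly like this (a sketch; in practice I start each merge by hand and pick package sets that don't depend on each other, and merge_parallel is an invented name, not a real tool):

```shell
# Invented sketch of running several independent merges at once as
# background jobs. A real session picks the package sets by hand so
# they don't depend on each other.
merge_parallel() {
    for pkg in "$@"; do
        emerge --oneshot "$pkg" &
    done
    wait   # block until every background merge finishes
}
```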

I'd certainly like to have it back again, tho.  If it's working in 2.1,
I've not seen it documented or seen any hints in the emerge output, as
were there before.  You seen or heard anything?

BTW, what is your opinion on -ftracer?  Several devs I've noticed use it,
but the manpage says it's not that useful without active profiling, which
means compiling, profiling, and recompiling, AFAIK.  It's possible the
devs running it do that, but I doubt it; otherwise, I don't see how
it would be that useful.  I don't know if you run it, but since I've got
your attention, I thought I'd ask what you think about it.  Is there
something of significance I'm missing, or are they, or are they actually
doing that compile/profile/recompile thing?  It just doesn't make sense to
me.  I've seen it in several user posted CFLAGS as well, but I'll bet a
good portion of them are simply because they saw it in a dev's CFLAGS and
decided it looked useful, not because they understand any implications
stated in the manpage.  (Not that I always do either, but... <g>)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html

