May I put my oar into your optimisation discussion?

It's funny, Duncan. On the one hand you are saving every byte of CPU cache; on 
the other, you are happy to have forked bashes sitting in your main memory. 
But how do you control that? I mean, how do you get the code of your forked 
bashes out of your CPU cache so that it's free for kernel code?

A long time ago, I tested some CFLAGS on my own programs. I had written a fast 
Fourier transform myself, only to see the "impressive" difference between 
-Os, -O3, and some other optimisation flags, and I fed it a large amount of 
input. But no matter how hard I tried to make it faster by changing the 
flags, it didn't work. The differences were marginal, and not every flag 
brings an improvement for every program. The only thing that changed a lot 
was the time gcc needed to perform those optimisations.

Bernhard
On Thursday, 09 February 2006 at 01:17, Duncan wrote:
> Simon Stelling posted <[EMAIL PROTECTED]>, excerpted below,  on
>
> Wed, 08 Feb 2006 21:37:33 +0100:
> > Duncan wrote:
> >> I should really create a page listing all the little Gentoo admin
> >> scripts I've come up with and how I use them.  I'm sure a few folks
> >> anyway would likely find them useful.
> >>
> >> The idea behind most of them is to create shortcuts to having to type in
> >> long emerge lines, with all sorts of arbitrary command line parameters.
> >> The majority of these fall into two categories, ea* and ep*, short for
> >> emerge --ask <additional parameters> and emerge --pretend ... .  Thus, I
> >> have epworld and eaworld, the pretend and ask versions of emerge -NuDv
> >> world, epsys and easys, the same for system, eplog <package>, emerge
> >> --pretend --log --verbose (package name to be added to the command line
> >> so eplog gcc, for instance, to see the changes between my current and
> >> the new version of gcc), eptree <package>, to use the tree output, etc.
> >
> > Interesting. But why do you use scripts and not simple aliases? Every
> > time you launch your script the HD performs a seek (which is very
> > expensive in time), copies the script into memory and then forks a whole
> > bash process to execute a one-liner. Using alias, which is a bash
> > built-in, wouldn't fork a process and therefore be much faster.
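For reference, the two styles being compared, sketched (the command bodies follow Duncan's description of emerge -NuDv; the alias lines would live in ~/.bashrc):

```shell
# Script style: one small executable file per shortcut, e.g. a file
# named epworld somewhere on $PATH containing:
#   #!/bin/bash
#   exec emerge --pretend --newuse --update --deep --verbose world "$@"

# Alias style: defined once per shell; no fork, no disk seek to run.
alias epworld='emerge --pretend --newuse --update --deep --verbose world'
alias eaworld='emerge --ask --newuse --update --deep --verbose world'
alias epsys='emerge --pretend --newuse --update --deep --verbose system'
alias | grep world
```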
>
> My thinking, which is possibly incorrect (your input appreciated), is that
> file-based scripts get pulled into cache the first time they are executed,
> and will remain there (with a gig of memory) pretty much until I'm done
> doing my upgrades.  At the same time, they are simply in cache, not
> something in bash's memory, so if the memory is needed, it will be
> reclaimed.  As well, after I'm done and on to other tasks, the cached
> commands will eventually be replaced by other data, if need be.
>
> Aliases (and bash-functions) are held in memory.  That's not as flexible
> as cache in terms of being knocked out of memory if the memory is needed
> by other things.  Sure, that memory may be flushed to disk-based swap, but
> that's disk-based the same as the actual script files I'm using, so
> reading it back into main memory if it's faulted out will take something
> comparable to the time it'd take to read in the script file again anyway.
> That's little gain, with the additional overhead and therefore loss of
> having to manage the temp-copy in swapped memory, if it comes to that.
>
> Actually, there are some details here that may affect things.  I don't
> know enough about the following factors to be able to evaluate how they
> balance out, but the real reason I chose individual scripts is below.
>
> One, here anyway, tho not on most systems, I'm running four SATA disks in
> RAID.  The swap is actually not on the RAID, as the kernel manages it like
> RAID on its own, provided all four swap areas are set to the same priority
> (they are), which means swap is running on the equivalent of
> four-way-striped RAID-0.  Meanwhile, the scripts, as part of my main
> system, are on RAID-6 for redundancy, so with the same four disks backing
> the RAID-6 as the swap, I've only effectively two-way-striped storage
> there, the other two disk stripes being parity.  Thus, retrieval from the
> 4-way-striped swap should in theory be more efficient than from the
> 2-way-striped regular storage.  OTOH, the granularity of the stripe
> in either case, against the size of the one or two-line script, likely
> means that it'll be pulled from a single stripe (at the speed of
> reading from a single disk, tho there are parallelizing opportunities
> not available on a single disk).  It's also likely that the swap will be
> more optimally managed for fast retrieval than the location on the regular
> filesystem is.  Balanced against that we have the overhead of maintaining
> the swap tracking.
>
> That's assuming it would swap that out to the dedicated swap in the first
> place.  I'm not familiar with Linux's VM, but given that the aliases and
> functions would be file-based in either case, it's possible it would
> simply drop the data from main memory, relying on the fact that the
> data is clean file-backed data and could be read-in directly from the
> files again, if necessary, rather than bothering with actually creating a
> temporary copy of the /same/ data in swap, taking time to do so when it
> could just read it back in from the file.
>
> Another aspect is the effect of data vs metadata caching.  Again, I'm not
> familiar with how Linux manages this, and indeed, it may differ between
> filesystems, but the idea is that if the file metadata is still cached,
> even if the file itself isn't, it's a single disk seek and read to read
> the data back in, as opposed to multiple seeks and reads, following the
> logical directory structure to fetch each directory table in the
> hierarchy until it reaches the entry that actually has the file location,
> before it can read the file itself, to read the file initially, or if the
> location metadata has been flushed as well.  (Back several years ago on
> MSWormOS, one of the first things I always did after a reinstall was set
> the system to server profile, which kept a far larger metadata cache, on
> the theory that the metadata was usually smaller than the data, and for
> dirs, sharable among many data files, so I'd rather spend cache memory on
> metadata than data.  The other choices were the default desktop profile,
> and laptop, a much smaller metadata cache.  I originally learned about
> these as a result of reading about a bug in the original 95 as shipped,
> that swapped some entries in the registry, and therefore cached FAR less
> metadata than it should have. I don't know where these tweaks are located
> on Linux, or how to go about adjusting them safely.)
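On Linux, the closest knob I know of is a VM sysctl that biases reclaim between the metadata (dentry/inode) caches and the page cache — my assumption of what that old profile setting mapped to, not a documented equivalence:

```shell
# vm.vfs_cache_pressure: 100 is the default; lower values make the
# kernel prefer keeping dentry/inode (metadata) cache over data pages.
cat /proc/sys/vm/vfs_cache_pressure
# To favour metadata cache (as root; persist via /etc/sysctl.conf):
#   sysctl vm.vfs_cache_pressure=50
```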
>
> Basically, therefore, I don't believe aliases to be a big positive, and
> possibly somewhat of a negative, as opposed to scripts, because the
> scripts will be cached in most cases after initial use anyway, yet they
> have the advantage of not having to be maintained or tracked in memory
> when I'm doing other tasks and the system needs that cache.
>
> Given that I don't believe it's a big positive, I prefer the
> administrative convenience and maintainability of separate scripts.
>
> There /is/ a third alternative, that I came across recently, that I think
> is a good idea.  If you'd comment, perhaps it would help me sort out the
> implications.
>
> The idea, simply put, is "bash command theming", single scripts that can
> be invoked that will "theme" a command prompt for the tasks at hand.  I
> didn't read the entire article I saw covering this, but skimmed it enough
> to get the gist.  A single invokable script for each set of tasks, say
> perl programming, bash programming, working with portage, etc, that would
> set up a specific set of aliases and functions for that task.  Invoking
> the script with the "off" parameter would erase that set of aliases and
> bash functions, thereby recovering the memory, and do any related cleanup
> like resetting the path if necessary to exclude any task specific
> commands.  Taking this a step further, a variable could be set up that
> would list the theme or themes that were active, which the theme-setup
> script could read and automatically deactivate the previous theme while
> switching to the new one.  One could even share functionality between
> themes, sourcing common files, which would check the active theme and
> adjust their behavior based on the active theme.
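A minimal sketch of what I understand one such theme script to look like (the names are invented here, since I didn't read the full article either); it would be sourced rather than executed, so the aliases land in the current shell:

```shell
# portage "theme": invoked as `portage_theme on` / `portage_theme off`
# after sourcing this file into the current shell.
portage_theme() {
    case "$1" in
    on)
        alias epworld='emerge --pretend --newuse --update --deep --verbose world'
        alias eaworld='emerge --ask --newuse --update --deep --verbose world'
        ACTIVE_THEME=portage    # lets the next theme script deactivate us
        ;;
    off)
        unalias epworld eaworld 2>/dev/null
        unset ACTIVE_THEME
        ;;
    *)
        echo "usage: portage_theme on|off" >&2
        return 1
        ;;
    esac
}
portage_theme on
echo "active theme: ${ACTIVE_THEME:-none}"
```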
>
> This alias and function theming wouldn't be quite as modular (tho with
> sourcing it could be) as the individual scripts, but would maintain the
> performance advantages (if any) of the alias/function idea, while at the
> same time allowing the memory reclamation of the cached-script option.  It
> sounds really good, but I'm not yet convinced the benefits would be worth
> the additional effort of setting up those themes, since the solution I
> have works.
>
> One VERY NICE benefit of the themes idea is that it would directly
> address any namespace pollution concerns.  It has a direct appeal to
> programmers and anyone else that's ever had to deal with such issues, for
> that reason alone.  One single command on the path to invoke the theme,
> possibly even an eselect-like command shared among themes, with
> everything else off-path and out of the namespace unless that theme is
> invoked!  /VERY/ appealing indeed.  OTOH, there are those who'll never
> remember the theme they have active at the moment, and be constantly
> confused.  For these folks, it'd be a nightmare!
>
> > man emerge:
> >        --oneshot (-1)
> >
> > IIRC --oneshot has a short form since 2.0.52 was released.
>
> Learn new things every day.  Thanks!  I remember how pleased I was to have
> --newuse, and even more so when I discovered -N, so very nice!
>
> >> ...  Deep breath... <g>
> >>
> >> All that as a preliminary explanation to this:  Along with the above, I
> >> have a set of efetch functions, that invoke the -f form, so just do the
> >> fetch, not the actual compile and merge, and esyn (there's already an
> >> esync function in something or other I have merged so I just call it
> >> esyn), which does emerge sync, then updates the esearch db, then
> >> automatically fetches all the packages that an eaworld would want to
> >> update, so they are ready for me to merge at my leisure.
> >
> > I'm a bit confused now. You use *functions* to do that? Or do you mean
> > scripts? By the way: with alias you could name your custom "script"
> > esync because it doesn't place a file on the harddisk.
>
> Scripts.  I was using "functions" in the generic sense here.  I did
> realize before I sent that it had a dual meaning, but figured it wasn't
> important enough a distinction to go back and correct, or explain.
> Unfortunately, every time I decide to skip something like that, I get
> called on it, which doesn't help my posts get any shorter! =8^)
>
> >> I choose -Os, optimize for size, because a modern CPU and the various
> >> cache levels are FAR faster than main memory.
> >
> > Given the fact that two CPUs, only differing in L2 Cache size, have
> > nearly the same performance, I doubt that the performance increase is
> > very big. Some interesting figures:
> >
> > Athlon64 something (forgot what, but shouldn't matter anyway) with 1 MB
> > L2-cache is 4% faster than an Athlon64 of the same frequency but with
> > only 512kB L2-cache. The bigger the cache sizes you compare get, the
> > smaller the performance increase. Since you run a dual Opteron system
> > with 1 MB L2 cache per CPU I tend to say that the actual performance
> > increase you experience is about 3%. But then I didn't take into account
> > that -Os leaves out a few optimizations which would be included by -O2,
> > the default optimization level, which actually makes the code a bit
> > slower when compared to -O2. So, the performance increase you really
> > experience shrinks to about 0-2%. I'd tend to proclaim that -O2 is even
> > faster for most of the code, but that's only my feeling.
>
> Interesting, indeed.  I'd counter that it likely has to do with how many
> tasks are being juggled as well, plus the number of kernel/user context
> switches, of course.  I wonder under what load, and with what task-type,
> the above 4% difference was measured.
>
> Of course, the definitive way to end the argument would be to do some
> profiling and get some hard numbers, but I don't think either you or I
> consider it an important enough factor in our lives to go to /that/ sort
> of trouble. <g>
>
> > Beside that I should mention that -Os sometimes still has problems with
> > huge packages like glibc.
>
> Interestingly enough, while Gentoo's glibc ebuilds stripflags to -O2, I
> did try it with all that stripflags logic disabled.  For glibc, it /does/
> seem to slow things down, or did back with gcc-3.3 (IIRC) anyway.  I tried
> the same glibc both ways.  I would have tried tinkering further, but
> decided it wasn't worth complicating debugging and the like, since glibc
> is loaded by virtually everything, and I'd never be able to tell if it was
> my funny tweaks to glibc, or some actual issue with whatever package.
> Besides, that's an awfully costly package, in terms of recompile time, not
> to mention system stability, to be experimenting with.  I /can/ say,
> however, that it didn't crash or cause any other issues I could see or
> attribute to it.
>
> OTOH, I haven't tried it with xorg-modular yet, but the monolithic xorg
> builds seemed to perform better with -Os.  I tried one of them (6.8??)
> both ways too.  I ended up routinely killing the stripflags logic, but I
> was modifying other portions of the ebuild as well (so it compiled only
> the ATI video driver, and only installed the 100-dpi fonts, not 75-dpi,
> among other things), so that was just one of several modifications I was
> making, tho the only real performance affecting one. Performance in X was
> better, but it DID take longer to switch to a VT, when I tried that.  In
> fact, at one point, the switch to VT functionality broke, but someone
> mentioned it was broken in general at that point for certain drivers,
> anyway, so I'm not sure my optimizations had anything to do with it.
>
> >> Of course, this is theory, and the practical case can and will differ
> >> depending on the instructions actually being compiled.  In particular,
> >> streaming media apps and media encoding/decoding are likely to still
> >> benefit from the traditional loop elimination style optimizations,
> >> because they run thru so much data already, that cache is routinely
> >> trashed anyway, regardless of the size of your instructions.  As well,
> >> that type of application tends to have a LOT of looping instructions to
> >> optimize!
> >>
> >> By contrast, something like the kernel will benefit more than usual
> >> from size optimization.  First, it's always memory locked and as such
> >> can't be swapped, and even "slow" main memory is still **MANY**
> >> **MANY** times faster than swap, so a smaller kernel means more other
> >> stuff fits into main memory with it, and isn't swapped as much. Second,
> >> parts of the
> >
> > Funny to hear this from somebody with 4 GB RAM in his system. I don't
> > know how bloated your kernel is, but even if -Os would reduce the size
> > of my kernel to **half**, which is totally impossible, it wouldn't
> > be enough to load the mail I am just answering into RAM. So, basically,
> > this reasoning is just ridiculous.
>
> I won't argue with that.  BTW, still at a gig, much to my frustration!  I
> put off upgrading memory when I decided my disk was in danger of going bad
> and I ended up deciding to go 4-disk SATA based RAID.  Then I upgraded my
> stereo near Christmas...  Now the CC is almost paid off again, so I'm
> looking at that memory upgrade again.
>
> Much to my frustration, memory prices don't seem to be dropping much
> lately!
>
> > You are referring a lot to the gcc manpage, but obviously you missed
> > this part:
> >
> >        -fomit-frame-pointer
> >            Don't keep the frame pointer in a register for functions that
> >            don't need one.  This avoids the instructions to save, set up
> >            and restore frame pointers; it also makes an extra register
> >            available in many functions.  It also makes debugging
> >            impossible on some machines.
> >
> >            On some machines, such as the VAX, this flag has no effect,
> >            because the standard calling sequence automatically handles
> >            the frame pointer and nothing is saved by pretending it
> >            doesn't exist.  The machine-description macro
> >            "FRAME_POINTER_REQUIRED" controls whether a target machine
> >            supports this flag.
> >
> >            Enabled at levels -O, -O2, -O3, -Os.
> >
> > I have to say that I am a bit disappointed now. You seemed to be one of
> > those people who actually inform themselves before sticking new flags
> > into their CFLAGS.
>
> ??
>
> I'm not sure which way you mean this.  It was in my CFLAGS list, but I
> didn't discuss it as it's fairly common (from my observation, nearly as
> common as -pipe) and seems fairly non-controversial on Gentoo.  Did you
> miss it in my CFLAGS and are saying I should be using it, or did you see
> it and are saying it's unnecessary and redundant because it's enabled by
> the -Os?
>
> If the latter, yes, but as mentioned above in the context of glibc, -Os is
> sometimes stripped.  In that case, the redundancy of having the basic
> -fomit-frame-pointer is useful, unless it's also stripped, but as I said,
> it seems much less controversial than some flags and is often
> specifically allowed where most are stripped.
>
> Or, are you saying I should avoid it due to the debugging implications?  I
> don't quite get it.
>
> >> !!! Relying on the shell to locate gcc, this may break !!! DISTCC,
> >> installing gcc-config and setting your current gcc !!! profile will fix
> >> this
> >>
> >> Another warning, likewise to stderr and thus not in the eis output.
> >> This one is due to the fact that eselect, the eventual systemwide
> >> replacement for gcc-config and a number of other commands, uses a
> >> different method to set the compiler than gcc-config did, and portage
> >> hasn't been adjusted to full compatibility just yet.  Portage finds the
> >> proper gcc just fine for itself, but there'd be problems if distcc was
> >> involved, thus the warning.
> >
> > Didn't know about this. Have you filed a bug yet on the topic? Or is
> > there already one?
>
> There is one.  I don't recall if I filed it or if it was already there,
> but both JH and the portage folks know about the issue.  IIRC, the portage
> folks decided it was their side that needed changing, but that required
> changes to the distcc package, and I don't know how that has gone since I
> don't use distcc, except that I was slightly surprised to see the warning
> in portage 2.1 still.
>
> >> MAKEOPTS="-j4"
> >>
> >> The four jobs is nice for a dual-CPU system -- when it works.
> >> Unfortunately, the unpack and configure steps are serialized, so the
> >> jobs option does little good, there.  To make most efficient use of the
> >> available cycles when I have a lot to merge, therefore, I'll run as
> >> many as five merges in parallel.  I do this quite regularly with KDE
> >> upgrades like the one to 3.5.1, where I use the split KDE ebuilds and
> >> have something north of 100 packages to merge before KDE is fully
> >> upgraded.
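What I do is plain shell job control; the emerge lines are illustrative, and the stand-in function below just simulates them so the pattern is visible:

```shell
# In practice:  emerge --oneshot kde-base/kdelibs &   (etc.), then wait.
# Each backgrounded merge runs its own serialized unpack/configure
# phase, so those phases overlap across jobs even though MAKEOPTS -j
# can't parallelize them within a single job.
merge() { sleep 1; echo "merged $1" >> merged.log; }
rm -f merged.log
for pkg in kdelibs kdebase kdepim; do
    merge "$pkg" &
done
wait
cat merged.log
```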
> >
> > I really wonder how you would parallelize unpacking and configuring a
> > package.
>
> That's what was nice about configcache, which was supposed to be in the
> next portage, but I haven't seen or heard anything about it for awhile,
> and the next portage, 2.1, is what I'm using.  configcache seriously
> shortened that stage of the build, leaving more of it parallelized, but...
>
> I was using it for awhile, patching successive versions of portage, but it
> broke about the time sandbox split, the dev said he wasn't maintaining the
> old version since it was going in the new portage, and I tried updating
> the patch but eventually ran into what I think were unrelated issues but
> decided to drop that in one of my troubleshooting steps and never picked
> it up again.
>
> I'd certainly like to have it back again, tho.  If it's working in 2.1,
> I've not seen it documented or seen any hints in the emerge output, as
> were there before.  You seen or heard anything?
>
> BTW, what is your opinion on -ftracer?  Several devs I've noticed use it,
> but the manpage says it's not that useful without active profiling, which
> means compiling, profiling, and recompiling, AFAIK.  It's possible the
> devs running it do that, but I doubt it, and otherwise, I don't see that
> it should be that useful?  I don't know if you run it, but since I've got
> your attention, I thought I'd ask what you think about it.  Is there
> something of significance I'm missing, or are they, or are they actually
> doing that compile/profile/recompile thing?  It just doesn't make sense to
> me.  I've seen it in several user posted CFLAGS as well, but I'll bet a
> good portion of them are simply because they saw it in a dev's CFLAGS and
> decided it looked useful, not because they understand any implications
> stated in the manpage.  (Not that I always do either, but... <g>)
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman in
> http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
-- 
[email protected] mailing list
