On Wed, 2007-05-16 at 09:50 -0700, Andrew Morton wrote:
> On Thu, 17 May 2007 00:34:33 +0800 David Woodhouse <[EMAIL PROTECTED]> wrote:
> 
> > A while ago, I played with using '-fwhole-program --combine' for
> > building kernel objects -- http://lwn.net/Articles/197097/
> > 
> > I found a few compiler bugs which I think should mostly be fixed now, so
> > I'm revisiting the kernel bits. The original patches I had looked
> > something like this:
> > 
> > http://david.woodhou.se/combine/combine-diff-1-fixes.patch
> > http://david.woodhou.se/combine/combine-diff-2-core.patch
> > http://david.woodhou.se/combine/combine-diff-3-global.patch
> > http://david.woodhou.se/combine/combine-diff-4-hacks.patch
> > 
> > Essentially, it added a CONFIG_COMBINED_COMPILE option which would build
> > every multi-part object (including built-in.o) with the -fwhole-program
> > and --combine flags. We saw savings of up to 14% in size in a few
> > places, and a more realistic 5-6% in many more.
> 
> So..  is 5% a reasonable estimate of the overall gain which we're likely to
> see here?

That kind of ball-park, I imagine; maybe a little less. I'll have to try
it again and see precisely what the numbers were overall. My old numbers
are at http://david.woodhou.se/combine/sizes-sorted.csv -- tuples of
(old,new,Δ,%,object).

The reason I say 'maybe a little less' is because I concentrated on
modules at the time because the Kbuild stuff was relatively easy for
those -- and I have a vague recollection that the built-in stuff didn't
show as big a difference. Searching for 'built-in.o' in the above-linked
CSV file seems to confirm that recollection.

When I do turn my attention to the kernel proper, one thing I want to
try is combining more than one directory at a time. I wouldn't do many,
but I suspect it would make sense to combine mm/ and arch/$ARCH/mm -- to
build all files in both of those at the same time. Likewise for kernel/
and arch/$ARCH/kernel/ -- or maybe even all four together. Or maybe
combine fs/ and mm/. I'll do some investigation as to precisely where
the best benefits are to be found.

There's a trade-off, obviously. We can't just build _every_ C file all
at the same time because the amount of memory used for that would be
insane. So we pick the bits which actually make _sense_ to show to the
optimiser at the same time, while keeping it broken down into manageable
chunks.

> And do we know specifically where that gain is coming from?  How the
> compiler/linker is achieving this?  If it's because we're all slackers,
> perhaps similar gains could come from manual fixes.

It's not always as simple as that. Sometimes it's cleaner to split stuff
up according to some _other_ criterion than what functions get exposed
to the optimiser together.

File systems are an obvious example of this -- perhaps they'd be
_optimal_ if we stuck them all into a single file, but I don't think
anyone would seriously advocate that.

The split between mm and arch/$ARCH/mm is another example.
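
To make the mm versus arch/$ARCH/mm case concrete, here is a minimal
sketch of the kind of call the optimiser only gets to see when both
directories are compiled together. The function names and the page
shift of 12 are hypothetical, not taken from the kernel, and the two
"files" are shown in one block purely for brevity.

```c
/* ---- would live in a hypothetical arch/$ARCH/mm/example.c ---- */

/* Global only because generic code in another directory calls it. */
unsigned long arch_page_shift(void)
{
	return 12;	/* assumed value, for illustration only */
}

/* ---- would live in a hypothetical mm/example.c ---- */

/* Normally this prototype would come from a shared header. */
unsigned long arch_page_shift(void);

unsigned long example_page_size(void)
{
	/* Across separate objects this must remain a real call. */
	return 1UL << arch_page_shift();
}
```

Built as separate objects, the call in example_page_size() has to stay
an out-of-line call and arch_page_shift() has to be kept as a global
symbol. Built as one unit with -fwhole-program, GCC may treat
arch_page_shift() as if it were static, inline it, and drop the
standalone copy.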

> Would it be true to say that most such symbols are already marked with
> EXPORT_SYMBOL()?  If so, then I'd have thought that EXPORT_SYMBOL_INTERNAL
> would be a better approach, as less additional markups would be needed.

Er, it's exactly the same number of additional markups either way.

Yes, although I didn't actually count, I do think that 'most' are
already marked with EXPORT_SYMBOL, which is why I've already hacked
EXPORT_SYMBOL() to also mark the symbol in question with the appropriate
attributes -- so we don't need to add __global to anything which is
marked with EXPORT_SYMBOL().

It's only those symbols which are global within vmlinux but _not_
exported to modules which would need an additional marking. That marking
could be either EXPORT_SYMBOL_INTERNAL() or __global -- I'm mostly
ambivalent. I think I chose __global last time just because it was
easier to do in a semi-automatic fashion.
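
As a rough sketch of the two markings, assuming that __global expands
to GCC's externally_visible attribute (the attribute is real; the
macro bodies below are simplified guesses, not copied from the
patches), it could look something like this. The example_* functions
are hypothetical.

```c
/*
 * Assumption: __global maps to GCC's externally_visible attribute, which
 * tells -fwhole-program to keep the symbol visible outside the combined
 * unit instead of treating it as static.
 */
#define __global __attribute__((externally_visible))

/*
 * Hypothetical, heavily simplified EXPORT_SYMBOL(). The real kernel macro
 * also emits symbol-table entries for the module loader; only the extra
 * attribute marking is sketched here.
 */
#define EXPORT_SYMBOL(sym) extern __typeof__(sym) sym __global

/* Exported to modules: EXPORT_SYMBOL() already applies the marking. */
int example_exported(int x)
{
	return x + 1;
}
EXPORT_SYMBOL(example_exported);

/* Global within vmlinux but not exported: needs its own marking. */
__global int example_internal(int x)
{
	return x * 2;
}
```

An EXPORT_SYMBOL_INTERNAL() variant would simply be another macro
applying the same attribute after the definition, which is why the
number of markups comes out the same either way.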

> I'd be concerned about ongoing maintainability.  If someone makes a change
> which breaks CONFIG_COMBINED_COMPILE then how would we be notified about
> it?

The most likely failure mode -- other than the WeirdShit™ which
accompanies a compiler bug -- is an unresolved symbol in the final link.
That isn't particularly hard to deal with, and in fact is precisely how
most of my 'combine-diff-3-global.patch' came about. You find the symbol
it's bitching about, and you mark it __global. I'm sure we have an
_army_ of people capable of doing test builds and catching missing tags
like that.

> And I assume that notification would only be visible to
> CONFIG_COMBINED_COMPILE users, so...  will this feature be available on
> x86_64 and i386, and what toolchain version is required?

Er, there may be tricks we can play to make notification available to
others. Maybe not, but I'll take a look.

Regarding the toolchain version -- last time I looked at this, about a
year ago, I filed a stream of GCC bugs (again, listed under the lwn link
above¹). I will test gcc 4.2 and I expect it should be OK. I believe
Fedora's toolchain should also cope -- I'll make sure it does, because
we want this for OLPC.

Obviously I posted these patches just as an FYI last time, since I was
the only person in the world with a compiler that could cope. The reason
I'm revisiting it now is because it should actually be of use to someone
else too, this time :)

I'd normally do ppc32, ppc64 and i386 for myself -- I can't test booting
x86_64 but I can at least make sure it compiles. Would you believe
x86_64 is one of the few Linux architectures which I _don't_ have a
sample of?

-- 
dwmw2

¹ although the list there has a typo -- s/27889/27899/
