On Tue, Aug 25, 2009 at 7:07 PM, Steve Reinhardt <[email protected]> wrote:
> On Tue, Aug 25, 2009 at 6:56 PM, Gabriel Michael
> Black<[email protected]> wrote:
>>
>>> Probably the easiest way would just be to take a current decoder.cc,
>>> hack it up manually to match one of the things we're proposing, then
>>> invoke gcc manually on the result and time it.  (Not necessarily easy
>>> in an absolute sense, but it's just a one-off try so there's no point
>>> in doing anything more automated IMO.)
>>
>> So are you volunteering to split up the, according to wc -l, ~110,000 line
>> file? :-) That'll be quite a task.
>
> I see no mention of specific individuals in my comment!  A lot depends
> on whether you can hack out a few contiguous 30,000-line chunks or if
> you'd have to do a lot of interleaving a few lines at a time to get it
> to work.  Even in the latter case, some emacs macro creativity could
> possibly go a long way.  I don't object to giving it a shot myself,
> but it won't be soon.

So I finally got around to this, and it wasn't hard at all.  Basically
the decoder.cc file has four parts, with the rough percentage by line
as indicated:

#includes (negligible)
microop definitions (12%)
macroop definitions (68%)
decode function (20%)

Since all of the declarations are in decoder.hh, you can compile each
of these separately as long as you replicate the #includes in each
file. Because the macroop functions are all independent, you can split
these any way you like too.  Here are the results I got from splitting
things various ways:

monolithic: 4:15
monolithic no inlines: 3:09
definitions | decode function: 2:31 + 0:30 = 3:01
microops | macroops | decode function: 0:16 + 2:20 + 0:30 = 3:06
microops | 1/2 macroops | 2/2 macroops | decode function:
    0:16 + 1:24 + 1:22 + 0:30 = 3:32
microops | {1,2,3,4}/4 macroops | decode function:
    0:16 + 0:50 + 0:33 + 0:24 + 0:53 + 0:30 = 3:26

So we can see there's some noticeable non-linearity in including the
decode function along with the instruction definitions, I assume
because in the monolithic case it's inlining all the macroop
constructors in the decode function which takes some time.  (I pretty
much verified that by getting rid of all the 'inline' declarations, as
I had to do anyway for the non-monolithic cases, and seeing that the
monolithic compile time went down to 3:18 as shown.)  However, once we
split that out, splitting the code further doesn't decrease the total
compilation time, and in fact it starts to go back up again at the
end.  But overall, the fact that we're sucking in decoder.hh (all 31K
lines) multiple times instead of once doesn't seem to be a big
problem, which is also a good thing.
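
For anyone who wants to repeat the experiment, the manual split can be
approximated with a few lines of Python.  This is only a rough sketch:
it assumes the #include lines and the individual macroop definitions
have already been pulled out of decoder.cc as strings, and the function
name split_definitions is made up for illustration.

```python
def split_definitions(includes, definitions, parts):
    """Spread independent definition blocks over `parts` source files.

    includes:    list of '#include ...' lines, replicated into every file
    definitions: list of strings, one per self-contained definition
    parts:       number of output files to produce
    Returns a list of file contents (strings), one per output file.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    header = "\n".join(includes) + "\n\n"
    # Hand out contiguous runs of definitions; the first `extra` files
    # get one more definition than the rest.
    base, extra = divmod(len(definitions), parts)
    files, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        files.append(header + "\n\n".join(definitions[start:end]) + "\n")
        start = end
    return files
```

Splitting by definition count rather than by line count is a
simplification; the uneven per-piece times above suggest that equal
sizes do not give equal compile times anyway.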

As I noted before, even a linear decrease in compilation time is not
bad since it potentially speeds things up on a multicore.  However, it
looks like once we stop inlining in the decode function, that's all
we're going to get.
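
To put rough numbers on the multicore point, the per-piece times above
can be reduced to a total and a longest piece.  The sketch below
assumes one core per piece and ignores link time and make scheduling,
so the longest piece is only a lower bound on wall-clock time; the
helper names are made up for illustration.

```python
def seconds(mmss):
    """Convert an 'm:ss' string to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)

def fmt(total):
    """Convert seconds back to 'm:ss'."""
    return "%d:%02d" % divmod(total, 60)

def summarize(times):
    """Return (sum of all pieces, longest single piece) as 'm:ss'."""
    secs = [seconds(t) for t in times]
    return fmt(sum(secs)), fmt(max(secs))

# Per-piece compile times copied from the measurements above.
splits = {
    "definitions | decode": ["2:31", "0:30"],
    "microops | macroops | decode": ["0:16", "2:20", "0:30"],
    "microops | 2 x macroops | decode": ["0:16", "1:24", "1:22", "0:30"],
    "microops | 4 x macroops | decode":
        ["0:16", "0:50", "0:33", "0:24", "0:53", "0:30"],
}

for name, times in splits.items():
    total, longest = summarize(times)
    print("%-34s total %s, longest piece %s" % (name, total, longest))
```

Under those assumptions the four-way macroop split has a longest piece
of 0:53 even though its total is 3:26.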

Also, since compiling the decode function by itself with no inlining
is pretty quick, there's no compilation-time motivation to split it up
as I had proposed (though it still may or may not turn out to be
useful from an ISA description modularity perspective).

I'll take a look at trying to do something semi-automatic to split
things up.  In addition to putting the decode function in a separate
file, maybe we could add a directive like

  set decoder_output "<filename>";

that would cause all other decoder output (that would normally go to
decoder.cc) to use this alternate filename until the next similar
directive (if any).  Then we could just sprinkle a few of those in the
isa definition at strategic points.
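
A minimal sketch of how such a directive might behave, in Python.  This
is not the existing parser code: it pretends the directives and the
generated output arrive as one flat stream of lines, and the function
name and the example file names are hypothetical.

```python
import re
from collections import OrderedDict

# Matches the proposed directive:  set decoder_output "<filename>";
DIRECTIVE = re.compile(r'^\s*set\s+decoder_output\s+"([^"]+)"\s*;\s*$')

def route_decoder_output(lines, default="decoder.cc"):
    """Group output lines by the file named in the latest directive.

    Lines seen before any directive go to `default`.  Returns an
    ordered mapping of filename -> list of lines for that file.
    """
    outputs = OrderedDict()
    current = default
    for line in lines:
        match = DIRECTIVE.match(line)
        if match:
            # Switch targets; the directive itself is not emitted.
            current = match.group(1)
            continue
        outputs.setdefault(current, []).append(line)
    return outputs
```

For example, a stream with 'set decoder_output "microops.cc";' near the
top and 'set decoder_output "macroops.cc";' further down would yield
three groups: whatever came before the first directive under
decoder.cc, then one group for each named file.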

Steve
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
