Steve Reinhardt wrote:
> On Tue, Aug 25, 2009 at 7:07 PM, Steve Reinhardt <[email protected]> wrote:
>
>> On Tue, Aug 25, 2009 at 6:56 PM, Gabriel Michael
>> Black <[email protected]> wrote:
>>
>>>> Probably the easiest way would just be to take a current decoder.cc,
>>>> hack it up manually to match one of the things we're proposing, then
>>>> invoke gcc manually on the result and time it. (Not necessarily easy
>>>> in an absolute sense, but it's just a one-off try so there's no point
>>>> in doing anything more automated IMO.)
>>>>
>>> So are you volunteering to split up the, according to wc -l, ~110,000 line
>>> file? :-) That'll be quite a task.
>>>
>> I see no mention of specific individuals in my comment! A lot depends
>> on whether you can hack out a few contiguous 30,000-line chunks or if
>> you'd have to do a lot of interleaving a few lines at a time to get it
>> to work. Even in the latter case, some emacs macro creativity could
>> possibly go a long way. I don't object to giving it a shot myself,
>> but it won't be soon.
>>
>
> So I finally got around to this, and it wasn't hard at all. Basically
> the decoder.cc file has four parts, with the rough percentage by line
> as indicated:
>
> #includes (negligible)
> microop definitions (12%)
> macroop definitions (68%)
> decode function (20%)
>
> Since all of the declarations are in decoder.hh, you can compile each
> of these separately as long as you replicate the #includes in each
> file. Because the macroop functions are all independent, you can split
> these any way you like too.
> Here are the results I got from splitting
> things various ways:
>
> monolithic: 4:15
> monolithic no inlines: 3:09
> definitions | decode function: 2:31 + 0:30 = 3:01
> microops | macroops | decode function: 0:16 + 2:20 + 0:30 = 3:06
> microops | 1/2 macroops | 2/2 macroops | decode function: 0:16 + 1:24
> + 1:22 + 0:30 = 3:32
> microops | {1,2,3,4}/4 macroops | decode function: 0:16 + 0:50 + 0:33
> + 0:24 + 0:53 + 0:30 = 3:26
>
> So we can see there's some noticeable non-linearity in including the
> decode function along with the instruction definitions, I assume
> because in the monolithic case it's inlining all the macroop
> constructors in the decode function which takes some time. (I pretty
> much verified that by getting rid of all the 'inline' declarations, as
> I had to do anyway for the non-monolithic cases, and seeing that the
> monolithic compile time went down to 3:18 as shown.) However, once we
> split that out, splitting the code further doesn't decrease the total
> compilation time, and in fact it starts to go back up again at the
> end. But overall, the fact that we're sucking in decoder.hh (all 31K
> lines) multiple times instead of once doesn't seem to be a big
> problem, which is also a good thing.
>
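As a quick check on the timings quoted above, the Python sketch below re-adds each split and also reports the longest single piece. The longest piece is roughly the best wall-clock time a parallel build could reach if every piece compiled on its own core. The helper names (`to_seconds`, `fmt`, `summarize`) are made up for this illustration, and the parallel estimate is an assumption that ignores link time and scheduling overhead.

```python
def to_seconds(mmss):
    """Convert an 'M:SS' string such as '2:31' to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def fmt(total_seconds):
    """Format whole seconds back into 'M:SS'."""
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


def summarize(parts):
    """Return (serial total, longest single piece) for a list of 'M:SS' times."""
    secs = [to_seconds(p) for p in parts]
    return fmt(sum(secs)), fmt(max(secs))


# Per-file compile times copied from the quoted measurements.
splits = {
    "definitions | decode": ["2:31", "0:30"],
    "microops | macroops | decode": ["0:16", "2:20", "0:30"],
    "microops | 2 x macroops | decode": ["0:16", "1:24", "1:22", "0:30"],
    "microops | 4 x macroops | decode":
        ["0:16", "0:50", "0:33", "0:24", "0:53", "0:30"],
}

for name, parts in splits.items():
    total, longest = summarize(parts)
    print(f"{name}: serial total {total}, longest piece {longest}")
```

For the four-way macroop split this gives a serial total of 3:26 but a longest piece of only 0:53, which is why a split that does not reduce total compile time can still help on a multicore machine.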
This sounds sort of like what I was expecting, where code massaging was
to blame and header files were negligible. Unfortunately it also sounds
like there isn't any obvious way to decrease the load on the compiler
end to end beyond not inlining and slowing the code down.

> As I noted before, even a linear decrease in compilation time is not
> bad since it potentially speeds things up on a multicore. However, it
> looks like once we stop inlining in the decode function, that's all
> we're going to get.
>
> Also, since compiling the decode function by itself with no inlining
> is pretty quick, there's no compilation-time motivation to split it up
> as I had proposed (though it still may or may not turn out to be
> useful from an ISA description modularity perspective).
>
> I'll take a look at trying to do something semi-automatic to split
> things up. In addition to putting the decode function in a separate
> file, maybe we could add a directive like
>
> set decoder_output "<filename>";
>
> that would cause all other decoder output (that would normally go to
> decoder.cc) to use this alternate filename until the next similar
> directive (if any). Then we could just sprinkle a few of those in the
> isa definition at strategic points.
>

Yeah, being able to parallelize the build and being able to rebuild
smaller parts of the decoder would make my life easier. The directive
you proposed is along the lines of one of the possible solutions I'd
been kicking around. I was worried that it might cause confusion, where
output ends up in a .cc you didn't expect because of a set
decoder_output in some other .isa file, given that separate .isa files
weren't designed in in all cases, but that might not be a big deal.

Gabe
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
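To make the proposed directive concrete, here is a minimal Python sketch of the routing behaviour described in the thread: output goes to decoder.cc until a `set decoder_output "<filename>";` line appears, then to that file until the next such line. This is not the actual ISA parser. It treats the input as a flat list of lines, and the names `DIRECTIVE`, `route_output`, `macroops_1.cc`, and `macroops_2.cc` are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical line-level match for the proposed directive syntax.
DIRECTIVE = re.compile(r'^\s*set\s+decoder_output\s+"([^"]+)"\s*;\s*$')


def route_output(lines, default="decoder.cc"):
    """Group lines by the output file in effect when each line is seen."""
    outputs = defaultdict(list)
    current = default
    for line in lines:
        match = DIRECTIVE.match(line)
        if match:
            # Switch targets; the directive line itself is not emitted.
            current = match.group(1)
        else:
            outputs[current].append(line)
    return dict(outputs)


# Stand-in content; a real ISA description is not a flat list of lines.
example = [
    "// microop definitions",
    'set decoder_output "macroops_1.cc";',
    "// first batch of macroops",
    'set decoder_output "macroops_2.cc";',
    "// second batch of macroops",
]

routed = route_output(example)
for filename, body in routed.items():
    print(filename, body)
```

Because `current` stays in effect until the next directive, anything that follows a directive, including content pulled in from another .isa file, lands in the same output file. That is the source of the possible confusion mentioned above.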
