No rush... we've lived with this for quite a while, at least we know why now.
On Tue, Nov 18, 2008 at 3:10 PM, nathan binkert <[EMAIL PROTECTED]> wrote:
> I added support for this kind of file mapping stuff for the m5 command
> so I could load multiple files into the simulator in full system mode.
> The python portion of this work could easily be ported to SE mode.
> Unless you want diffs now, I'll work on getting this stuff in the tree
> after ISCA.
>
> It basically allowed me to clean up all of the boot scripts and such.
>
> Nate
>
> On Tue, Nov 18, 2008 at 2:25 PM, Steve Reinhardt <[EMAIL PROTECTED]> wrote:
>> Took me a lot longer than it should have in retrospect, but here's the
>> problem (from --trace-flags=SyscallVerbose):
>>
>> 594199893000: global: opening file /proc/meminfo
>> 594199893000: system.cpu: syscall open returns 4
>> 594200152000: system.cpu: syscall fstat called w/arguments 4,140737488339680,140737488339680,0
>> 594200152000: system.cpu: syscall fstat returns 0
>> [...]
>> 594200272000: system.cpu: syscall read called w/arguments 4,46912559464448,8192,34
>> 594200272000: system.cpu: syscall read returns 630
>>
>> I don't know *why* parser opens, fstats, and reads /proc/meminfo, but
>> that's clearly where the system dependence is coming from. As far as
>> fixing the problem, the easiest thing would be to hack parser to not
>> do that, or just not use parser in the regressions.
>>
>> If we wanted to get really fancy we could recognize /proc/meminfo as
>> special and redirect it to some canned input. It might be worth
>> checking in open() and warning anytime anything under /proc gets
>> opened. Or maybe we should implement something like chroot inside of
>> SE mode, so you could get rid of all the path-based issues by forcing
>> everything to be relative to the working dir, and then use symlinks to
>> set up the structure you want... powerful, but overkill for our uses
>> IMO.
>>
>> Steve
>>
>> On Mon, Nov 17, 2008 at 7:37 PM, <[EMAIL PROTECTED]> wrote:
>>> Yes, I'm sure it's not a timing mode thing.
>>> The timing mode regressions didn't exist for x86 until very recently,
>>> and parser has been unstable for maybe as long as a year.
>>>
>>> Gabe
>>>
>>> Quoting Steve Reinhardt <[EMAIL PROTECTED]>:
>>>
>>>> Interestingly, I just ran on my desktop here and on zizzer and both
>>>> failed, but when I looked more closely, I see that my desktop is
>>>> failing because it's running 5 fewer instructions than the reference
>>>> output, while zizzer is failing because it's running 5 extra
>>>> instructions. (And yes, I double-checked and they both have the same
>>>> reference instruction count.) Both of these seem pretty consistent.
>>>>
>>>> I also checked the poolfs regression outputs and they get yet a third
>>>> value, and amazingly the simple-atomic runs fail there too. All of
>>>> the instruction counts vary only in the last couple of digits, so I'll
>>>> just use those to summarize:
>>>>
>>>>                 ref   zizzer   poolfs   home
>>>> simple-atomic   702   702      786      692
>>>> simple-timing   697   702      786      692
>>>>
>>>> So it doesn't appear to be a timing-mode thing; that's just a side
>>>> effect of us having inconsistent reference outputs for the two runs.
>>>>
>>>> Steve
>>>>
>>>> On Mon, Nov 17, 2008 at 2:53 PM, <[EMAIL PROTECTED]> wrote:
>>>> > Exactly. Or one machine will be in Ann Arbor and the other in
>>>> > California. Maybe it has something to do with the test checking the
>>>> > actual clock time/date on the host somehow? It could behave slightly
>>>> > differently depending on some little part of that like converting it
>>>> > to seconds changing the path the microcode takes for the division
>>>> > instruction or something.
>>>> >
>>>> > Speaking of which, I think it would be really handy to distinguish
>>>> > between the number of actual instructions that commit vs. the number
>>>> > of microops.
>>>> > If I have to change microcode for some reason I'd expect the latter
>>>> > to change, but the former probably means I broke something.
>>>> >
>>>> > Gabe
>>>> >
>>>> > Quoting nathan binkert <[EMAIL PROTECTED]>:
>>>> >
>>>> >> The biggest problem is that I've never been able to find two machines
>>>> >> that behave differently. When things change, I can't find something
>>>> >> that did it the "old" way.
>>>> >>
>>>> >> Nate
>>>> >>
>>>> >>
>>>> >> > If somebody can and wants to get a tracediff between two differently
>>>> >> > behaving versions of parser, that would go a long way to figuring
>>>> >> > out what the problem is.
>>>> >> >
>>>> >> > Gabe
>>>> >> >
>>>> >> > Quoting nathan binkert <[EMAIL PROTECTED]>:
>>>> >> >
>>>> >> >> I more meant that it seems like an infrequently used syscall that
>>>> >> >> uses an uninitialized variable that affects the return value could
>>>> >> >> easily be the result. The stats differences in both simulations are
>>>> >> >> minimal and similar.
>>>> >> >>
>>>> >> >> Nate
>>>> >> >>
>>>> >> >> On Mon, Nov 17, 2008 at 12:07 PM, Steve Reinhardt <[EMAIL PROTECTED]> wrote:
>>>> >> >> > I sort of doubt it... parser has always been a bit
>>>> >> >> > nondeterministic, where this is just a subtle and unforeseen but
>>>> >> >> > deterministic side effect of a bug fix.
>>>> >> >> >
>>>> >> >> > Steve
>>>> >> >> >
>>>> >> >> > On Mon, Nov 17, 2008 at 11:57 AM, nathan binkert <[EMAIL PROTECTED]> wrote:
>>>> >> >> >> Ah, so that was you. That makes sense. I seriously wonder if
>>>> >> >> >> this or something like it is the problem with 20.parser.
>>>> >> >> >>
>>>> >> >> >> Nate
>>>> >> >> >>
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
