I did some digging and I figured it out. There is a libc function
(POSIX.1): int sysconf(int name). The system parameters you can query
range from the maximum number of arguments to the number of free
physical pages in the system.
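For concreteness, here's a minimal standalone sketch (plain POSIX/glibc
code, nothing M5-specific) of the queries involved; on glibc,
_SC_PHYS_PAGES and _SC_AVPHYS_PAGES are the values that end up being
read out of /proc/meminfo:

    // sysconf_query.cc -- standalone sketch (not M5 code) showing the
    // memory queries glibc's qsort() relies on.
    #include <unistd.h>
    #include <cstdio>

    int main()
    {
        long page_size  = sysconf(_SC_PAGESIZE);      // bytes per page
        long phys_pages = sysconf(_SC_PHYS_PAGES);    // total physical pages
        long free_pages = sysconf(_SC_AVPHYS_PAGES);  // currently free pages

        // On glibc the last two are backed by /proc/meminfo; if that
        // file can't be read, sysconf() returns -1 and the caller has
        // to cope with it.
        std::printf("page size:       %ld\n", page_size);
        std::printf("physical pages:  %ld\n", phys_pages);
        std::printf("available pages: %ld\n", free_pages);
        return 0;
    }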
Parser doesn't call this function directly, but it does call qsort(),
which checks how much physical memory is available by calling
sysconf(). glibc uses the amount of physical memory to decide which
algorithm to use, i.e. whether enough memory can be allocated to hold
an additional copy of the items being sorted. The glibc function falls
back to returning -1 if it can't open /proc/meminfo, and qsort()
handles this by assuming there is enough memory. So there are a couple
of options:

1) Just return -1 and warn any time an application attempts to read
   /proc/*; glibc will fall back or return an error, but this can
   leave it making a non-optimal decision.

2) "Create" a /proc/meminfo based on the simulator's count of free and
   allocated pages so that glibc makes the correct decision (sketched
   below).

I don't think anything else is really reasonable, since it would still
leave glibc making a non-optimal decision.
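To make option 2 concrete, here is a rough sketch of the kind of helper
the syscall emulation layer could use. The names fakeMeminfo(),
physMemBytes, and freeMemBytes are made up for illustration; they are
not existing M5 interfaces. glibc derives _SC_PHYS_PAGES and
_SC_AVPHYS_PAGES from the MemTotal and MemFree lines, so two lines are
enough:

    // Hypothetical helper: build /proc/meminfo contents from the
    // simulated system's memory configuration instead of exposing the
    // host's values.
    #include <cstdint>
    #include <cstdio>
    #include <string>

    std::string
    fakeMeminfo(uint64_t physMemBytes, uint64_t freeMemBytes)
    {
        // /proc/meminfo reports sizes in kB.
        char line[64];
        std::string contents;

        std::snprintf(line, sizeof(line), "MemTotal: %12llu kB\n",
                      (unsigned long long)(physMemBytes / 1024));
        contents += line;
        std::snprintf(line, sizeof(line), "MemFree:  %12llu kB\n",
                      (unsigned long long)(freeMemBytes / 1024));
        contents += line;

        return contents;
    }

The open() emulation would then recognize the "/proc/meminfo" path,
dump this string into a temporary file (or a memory-backed descriptor),
and return that to the target, so the result is deterministic across
hosts.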
As an aside, it seems that obreak() should fail gracefully when the
amount of memory requested exceeds the physical memory provided, rather
than calling fatal(). That would simply involve returning the old brk.

As a second aside, it's possible that the person who was allocating
16GB of RAM and still running into problems with some of the benchmarks
was hitting something like this.

Ali

On Nov 18, 2008, at 6:45 PM, Steve Reinhardt wrote:

> No rush... we've lived with this for quite a while; at least we know
> why now.
>
> On Tue, Nov 18, 2008 at 3:10 PM, nathan binkert <[EMAIL PROTECTED]> wrote:
>> I added support for this kind of file mapping stuff for the m5
>> command so I could load multiple files into the simulator in full
>> system mode. The python portion of this work could easily be ported
>> to SE mode. Unless you want diffs now, I'll work on getting this
>> stuff into the tree after ISCA.
>>
>> It basically allowed me to clean up all of the boot scripts and such.
>>
>> Nate
>>
>> On Tue, Nov 18, 2008 at 2:25 PM, Steve Reinhardt <[EMAIL PROTECTED]> wrote:
>>> Took me a lot longer than it should have in retrospect, but here's
>>> the problem (from --trace-flags=SyscallVerbose):
>>>
>>> 594199893000: global: opening file /proc/meminfo
>>> 594199893000: system.cpu: syscall open returns 4
>>> 594200152000: system.cpu: syscall fstat called w/arguments
>>>   4,140737488339680,140737488339680,0
>>> 594200152000: system.cpu: syscall fstat returns 0
>>> [...]
>>> 594200272000: system.cpu: syscall read called w/arguments
>>>   4,46912559464448,8192,34
>>> 594200272000: system.cpu: syscall read returns 630
>>>
>>> I don't know *why* parser opens, fstats, and reads /proc/meminfo,
>>> but that's clearly where the system dependence is coming from. As
>>> far as fixing the problem, the easiest thing would be to hack parser
>>> not to do that, or just not use parser in the regressions.
>>>
>>> If we wanted to get really fancy we could recognize /proc/meminfo as
>>> special and redirect it to some canned input. It might be worth
>>> checking in open() and warning any time anything under /proc gets
>>> opened. Or maybe we should implement something like chroot inside of
>>> SE mode, so you could get rid of all the path-based issues by
>>> forcing everything to be relative to the working dir, and then use
>>> symlinks to set up the structure you want... powerful, but overkill
>>> for our uses IMO.
>>>
>>> Steve
>>>
>>> On Mon, Nov 17, 2008 at 7:37 PM, <[EMAIL PROTECTED]> wrote:
>>>> Yes, I'm sure it's not a timing mode thing. The timing mode
>>>> regressions didn't exist for x86 until very recently, and parser
>>>> has been unstable for maybe as long as a year.
>>>>
>>>> Gabe
>>>>
>>>> Quoting Steve Reinhardt <[EMAIL PROTECTED]>:
>>>>
>>>>> Interestingly, I just ran on my desktop here and on zizzer, and
>>>>> both failed, but when I looked more closely I saw that my desktop
>>>>> is failing because it's running 5 fewer instructions than the
>>>>> reference output, while zizzer is failing because it's running 5
>>>>> extra instructions. (And yes, I double-checked that they both have
>>>>> the same reference instruction count.) Both of these seem pretty
>>>>> consistent.
>>>>>
>>>>> I also checked the poolfs regression outputs and they get yet a
>>>>> third value, and amazingly the simple-atomic runs fail there too.
>>>>> All of the instruction counts vary only in the last couple of
>>>>> digits, so I'll just use those to summarize:
>>>>>
>>>>>                  ref   zizzer   poolfs   home
>>>>> simple-atomic    702      702      786    692
>>>>> simple-timing    697      702      786    692
>>>>>
>>>>> So it doesn't appear to be a timing-mode thing; that's just a side
>>>>> effect of us having inconsistent reference outputs for the two
>>>>> runs.
>>>>>
>>>>> Steve
>>>>>
>>>>> On Mon, Nov 17, 2008 at 2:53 PM, <[EMAIL PROTECTED]> wrote:
>>>>>> Exactly. Or one machine will be in Ann Arbor and the other in
>>>>>> California. Maybe it has something to do with the test checking
>>>>>> the actual clock time/date on the host somehow? It could behave
>>>>>> slightly differently depending on some little part of that, like
>>>>>> converting it to seconds changing the path the microcode takes
>>>>>> for the division instruction or something.
>>>>>>
>>>>>> Speaking of which, I think it would be really handy to
>>>>>> distinguish between the number of actual instructions that commit
>>>>>> vs. the number of microops. If I have to change microcode for
>>>>>> some reason I'd expect the latter to change, but the former
>>>>>> probably means I broke something.
>>>>>>
>>>>>> Gabe
>>>>>>
>>>>>> Quoting nathan binkert <[EMAIL PROTECTED]>:
>>>>>>
>>>>>>> The biggest problem is that I've never been able to find two
>>>>>>> machines that behave differently. When things change, I can't
>>>>>>> find something that did it the "old" way.
>>>>>>>
>>>>>>> Nate
>>>>>>>
>>>>>>>> If somebody can and wants to get a tracediff between two
>>>>>>>> differently behaving versions of parser, that would go a long
>>>>>>>> way toward figuring out what the problem is.
>>>>>>>>
>>>>>>>> Gabe
>>>>>>>>
>>>>>>>> Quoting nathan binkert <[EMAIL PROTECTED]>:
>>>>>>>>
>>>>>>>>> I meant more that it seems like an infrequently used syscall
>>>>>>>>> that uses an uninitialized variable affecting the return value
>>>>>>>>> could easily be the cause. The stats differences in both
>>>>>>>>> simulations are minimal and similar.
>>>>>>>>>
>>>>>>>>> Nate
>>>>>>>>>
>>>>>>>>> On Mon, Nov 17, 2008 at 12:07 PM, Steve Reinhardt <[EMAIL PROTECTED]> wrote:
>>>>>>>>>> I sort of doubt it... parser has always been a bit
>>>>>>>>>> nondeterministic, whereas this is just a subtle and unforeseen
>>>>>>>>>> but deterministic side effect of a bug fix.
>>>>>>>>>>
>>>>>>>>>> Steve
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 17, 2008 at 11:57 AM, nathan binkert <[EMAIL PROTECTED]> wrote:
>>>>>>>>>>> Ah, so that was you. That makes sense. I seriously wonder if
>>>>>>>>>>> this or something like it is the problem with 20.parser.
>>>>>>>>>>>
>>>>>>>>>>> Nate
>>>>>>>>>>>

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
