I did some digging and I figured it out. There is a libc function
(POSIX.1): int sysconf(int name). The system parameters you can query
range from the maximum number of arguments to the number of free
physical pages in the system.
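For concreteness, here's a minimal standalone sketch (plain POSIX/glibc
code, nothing M5-specific) of the queries involved; on glibc,
_SC_PHYS_PAGES and _SC_AVPHYS_PAGES are the values that end up being
read out of /proc/meminfo:

    // sysconf_query.cc -- standalone sketch (not M5 code) showing the
    // memory queries glibc's qsort() relies on.
    #include <unistd.h>
    #include <cstdio>

    int main()
    {
        long page_size  = sysconf(_SC_PAGESIZE);      // bytes per page
        long phys_pages = sysconf(_SC_PHYS_PAGES);    // total physical pages
        long free_pages = sysconf(_SC_AVPHYS_PAGES);  // currently free pages

        // On glibc the last two are backed by /proc/meminfo; if that
        // file can't be read, sysconf() returns -1 and the caller has
        // to cope with it.
        std::printf("page size:       %ld\n", page_size);
        std::printf("physical pages:  %ld\n", phys_pages);
        std::printf("available pages: %ld\n", free_pages);
        return 0;
    }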
Parser doesn't call this function directly, but it does call qsort(),
which checks how much physical memory is available by calling
sysconf(). glibc uses the amount of physical memory to decide which
algorithm to use, i.e. whether enough memory can be allocated to hold
an additional copy of the items being sorted. The glibc function falls
back to returning -1 if it can't open /proc/meminfo, and qsort()
handles this by assuming there is enough memory. So there are a couple
of options:

1) Just return -1 and warn any time an application attempts to read
   /proc/*; glibc will fall back or return an error, but this can
   leave it making a non-optimal decision.

2) "Create" a /proc/meminfo based on the simulator's count of free and
   allocated pages so that glibc makes the correct decision (sketched
   below).

I don't think anything else is really reasonable, since it would still
leave glibc making a non-optimal decision.
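To make option 2 concrete, here is a rough sketch of the kind of helper
the syscall emulation layer could use. The names fakeMeminfo(),
physMemBytes, and freeMemBytes are made up for illustration; they are
not existing M5 interfaces. glibc derives _SC_PHYS_PAGES and
_SC_AVPHYS_PAGES from the MemTotal and MemFree lines, so two lines are
enough:

    // Hypothetical helper: build /proc/meminfo contents from the
    // simulated system's memory configuration instead of exposing the
    // host's values.
    #include <cstdint>
    #include <cstdio>
    #include <string>

    std::string
    fakeMeminfo(uint64_t physMemBytes, uint64_t freeMemBytes)
    {
        // /proc/meminfo reports sizes in kB.
        char line[64];
        std::string contents;

        std::snprintf(line, sizeof(line), "MemTotal: %12llu kB\n",
                      (unsigned long long)(physMemBytes / 1024));
        contents += line;
        std::snprintf(line, sizeof(line), "MemFree:  %12llu kB\n",
                      (unsigned long long)(freeMemBytes / 1024));
        contents += line;

        return contents;
    }

The open() emulation would then recognize the "/proc/meminfo" path,
dump this string into a temporary file (or a memory-backed descriptor),
and return that to the target, so the result is deterministic across
hosts.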
As an aside, it seems that obreak() should fail gracefully when the
amount of memory requested exceeds the physical memory provided, rather
than calling fatal(). That would simply involve returning the old brk.

As a second aside, it's possible that the person who was allocating
16GB of RAM and still running into problems with some of the benchmarks
was hitting something like this.

Ali

On Nov 18, 2008, at 6:45 PM, Steve Reinhardt wrote:

> No rush... we've lived with this for quite a while; at least we know
> why now.
>
> On Tue, Nov 18, 2008 at 3:10 PM, nathan binkert <[EMAIL PROTECTED]> wrote:
>> I added support for this kind of file mapping stuff for the m5
>> command so I could load multiple files into the simulator in full
>> system mode. The python portion of this work could easily be ported
>> to SE mode. Unless you want diffs now, I'll work on getting this
>> stuff into the tree after ISCA.
>>
>> It basically allowed me to clean up all of the boot scripts and such.
>>
>> Nate
>>
>> On Tue, Nov 18, 2008 at 2:25 PM, Steve Reinhardt <[EMAIL PROTECTED]> wrote:
>>> Took me a lot longer than it should have in retrospect, but here's
>>> the problem (from --trace-flags=SyscallVerbose):
>>>
>>> 594199893000: global: opening file /proc/meminfo
>>> 594199893000: system.cpu: syscall open returns 4
>>> 594200152000: system.cpu: syscall fstat called w/arguments
>>>   4,140737488339680,140737488339680,0
>>> 594200152000: system.cpu: syscall fstat returns 0
>>> [...]
>>> 594200272000: system.cpu: syscall read called w/arguments
>>>   4,46912559464448,8192,34
>>> 594200272000: system.cpu: syscall read returns 630
>>>
>>> I don't know *why* parser opens, fstats, and reads /proc/meminfo,
>>> but that's clearly where the system dependence is coming from. As
>>> far as fixing the problem, the easiest thing would be to hack parser
>>> not to do that, or just not use parser in the regressions.
>>>
>>> If we wanted to get really fancy we could recognize /proc/meminfo as
>>> special and redirect it to some canned input. It might be worth
>>> checking in open() and warning any time anything under /proc gets
>>> opened. Or maybe we should implement something like chroot inside of
>>> SE mode, so you could get rid of all the path-based issues by
>>> forcing everything to be relative to the working dir, and then use
>>> symlinks to set up the structure you want... powerful, but overkill
>>> for our uses IMO.
>>>
>>> Steve
>>>
>>> On Mon, Nov 17, 2008 at 7:37 PM, <[EMAIL PROTECTED]> wrote:
>>>> Yes, I'm sure it's not a timing mode thing. The timing mode
>>>> regressions didn't exist for x86 until very recently, and parser
>>>> has been unstable for maybe as long as a year.
>>>>
>>>> Gabe
>>>>
>>>> Quoting Steve Reinhardt <[EMAIL PROTECTED]>:
>>>>
>>>>> Interestingly, I just ran on my desktop here and on zizzer, and
>>>>> both failed, but when I looked more closely I saw that my desktop
>>>>> is failing because it's running 5 fewer instructions than the
>>>>> reference output, while zizzer is failing because it's running 5
>>>>> extra instructions. (And yes, I double-checked that they both have
>>>>> the same reference instruction count.) Both of these seem pretty
>>>>> consistent.
>>>>>
>>>>> I also checked the poolfs regression outputs and they get yet a
>>>>> third value, and amazingly the simple-atomic runs fail there too.
>>>>> All of the instruction counts vary only in the last couple of
>>>>> digits, so I'll just use those to summarize:
>>>>>
>>>>>                  ref   zizzer   poolfs   home
>>>>> simple-atomic    702      702      786    692
>>>>> simple-timing    697      702      786    692
>>>>>
>>>>> So it doesn't appear to be a timing-mode thing; that's just a side
>>>>> effect of us having inconsistent reference outputs for the two
>>>>> runs.
>>>>>
>>>>> Steve
>>>>>
>>>>> On Mon, Nov 17, 2008 at 2:53 PM, <[EMAIL PROTECTED]> wrote:
>>>>>> Exactly. Or one machine will be in Ann Arbor and the other in
>>>>>> California. Maybe it has something to do with the test checking
>>>>>> the actual clock time/date on the host somehow? It could behave
>>>>>> slightly differently depending on some little part of that, like
>>>>>> converting it to seconds changing the path the microcode takes
>>>>>> for the division instruction or something.
>>>>>>
>>>>>> Speaking of which, I think it would be really handy to
>>>>>> distinguish between the number of actual instructions that commit
>>>>>> vs. the number of microops. If I have to change microcode for
>>>>>> some reason I'd expect the latter to change, but the former
>>>>>> probably means I broke something.
>>>>>>
>>>>>> Gabe
>>>>>>
>>>>>> Quoting nathan binkert <[EMAIL PROTECTED]>:
>>>>>>
>>>>>>> The biggest problem is that I've never been able to find two
>>>>>>> machines that behave differently. When things change, I can't
>>>>>>> find something that did it the "old" way.
>>>>>>>
>>>>>>> Nate
>>>>>>>
>>>>>>>> If somebody can and wants to get a tracediff between two
>>>>>>>> differently behaving versions of parser, that would go a long
>>>>>>>> way toward figuring out what the problem is.
>>>>>>>>
>>>>>>>> Gabe
>>>>>>>>
>>>>>>>> Quoting nathan binkert <[EMAIL PROTECTED]>:
>>>>>>>>
>>>>>>>>> I meant more that it seems like an infrequently used syscall
>>>>>>>>> that uses an uninitialized variable affecting the return value
>>>>>>>>> could easily be the cause. The stats differences in both
>>>>>>>>> simulations are minimal and similar.
>>>>>>>>>
>>>>>>>>> Nate
>>>>>>>>>
>>>>>>>>> On Mon, Nov 17, 2008 at 12:07 PM, Steve Reinhardt <[EMAIL PROTECTED]> wrote:
>>>>>>>>>> I sort of doubt it... parser has always been a bit
>>>>>>>>>> nondeterministic, whereas this is just a subtle and unforeseen
>>>>>>>>>> but deterministic side effect of a bug fix.
>>>>>>>>>>
>>>>>>>>>> Steve
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 17, 2008 at 11:57 AM, nathan binkert <[EMAIL PROTECTED]> wrote:
>>>>>>>>>>> Ah, so that was you. That makes sense. I seriously wonder if
>>>>>>>>>>> this or something like it is the problem with 20.parser.
>>>>>>>>>>>
>>>>>>>>>>> Nate
>>>>>>>>>>>

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
