Hi Andreas, (moving back to gem5-dev since I suspect others will be interested)
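For list readers who have not seen the off-list trace quoted below, here is a rough Python model of the two things it shows: the write that starts at 0x89560 is split at the 64-byte line boundary, and each piece is then accepted or rejected based on the permissions the controllers report (the num_busy / num_ro / num_rw counts). This is only a sketch, not the RubyPort code. The names Perm, split_by_line, and functional_write_allowed are made up for illustration, the total write size of 64 bytes is a guess (the trace only shows the first 32-byte piece), and the acceptance rule is an assumption chosen so that it agrees with the two outcomes in the trace.

```python
from enum import Enum


class Perm(Enum):
    """Hypothetical stand-in for the access permission a controller reports."""
    INVALID = 0
    READ_ONLY = 1
    READ_WRITE = 2
    BUSY = 3


def split_by_line(addr, size, line_bytes=64):
    """Cut [addr, addr + size) into pieces that each stay inside one cache line."""
    chunks = []
    while size > 0:
        line_end = (addr // line_bytes + 1) * line_bytes
        n = min(size, line_end - addr)
        chunks.append((addr, n))
        addr += n
        size -= n
    return chunks


def functional_write_allowed(perms):
    """Assumed rule: allow the write if exactly one controller holds the line
    Read_Write, or if nothing is Busy and at least one Read_Only copy exists."""
    num_busy = sum(p is Perm.BUSY for p in perms)
    num_ro = sum(p is Perm.READ_ONLY for p in perms)
    num_rw = sum(p is Perm.READ_WRITE for p in perms)
    return num_rw == 1 or (num_busy == 0 and num_ro > 0)


# Hypothetical snapshot of the two lines at cycle 136455, matching the counts
# printed in the trace.
perms_by_line = {
    0x89540: [Perm.READ_WRITE],  # num_busy = 0, num_ro = 0, num_rw = 1
    0x89580: [Perm.BUSY],        # num_busy = 1, num_ro = 0, num_rw = 0
}

# The total size of 64 bytes is an assumption; the trace shows only the first piece.
for addr, n in split_by_line(0x89560, 64):
    line = addr - addr % 64
    ok = functional_write_allowed(perms_by_line[line])
    print(hex(addr), n, "ok" if ok else "FAILED")
```

With the line at 0x89580 reported as Busy, the second piece is rejected, as in the trace. If the directory reported Read_Write for that line instead (the Dir:WB_E_W change discussed below), the same check would accept both pieces.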
I've dug myself out of my email hole and I think I can help out here. I read through your trace and I know what is going on. As Nilay already mentioned, we know that functional accesses, especially functional writes, will not be successful if they race with timing requests. Even though the hello world test uses a single in-order SimpleTiming CPU, a timing request is racing with the functional write. Specifically, the writeback of block 0x89580, with the directory waiting for the data to be written to DRAM, is racing with the fstat syscall's functional write to the same block.

I know it is a little hard to figure all that out from staring at the current trace with all Ruby flags turned on. In the future, I would recommend just turning on the ProtocolTrace flag. It will be much easier to understand what is going on.

Though I think I understand the problem, I'm not quite sure how to fix it. When Nilay added functional access support to Ruby, Nilay and I were hoping this situation would not occur. However, since this is just the simple 1-cpu hello world test, I think it is pretty obvious that we are going to have to deal with this situation somehow. We could just deal with these situations one-by-one by modifying the AccessPermissions for particular states. Specifically, here we can solve this problem by setting the permission of Dir:WB_E_W to Read_Write. However, is that how we want to try to solve all these issues? There are certain races that we simply can't get around by better specifying AccessPermissions.

Nilay, what do you think?

Brad

> -----Original Message-----
> From: Andreas Hansson [mailto:[email protected]]
> Sent: Friday, January 06, 2012 7:49 AM
> To: Nilay Vaish
> Cc: Beckmann, Brad; Ali Saidi
> Subject: RE: [gem5-dev] One failing Ruby regression after memory-system patches
>
> Hi Nilay,
>
> I have done some more digging and these are the final events before the functional write request to 0x89580:
>
> 1. 
> doSyscall in syscall_emul.cc gets called:
> SyscallDesc::doSyscall (this=0x1240f08, callnum=91, process=0x1c8aa80, tc=0x1c932a0)
>
> 2. this in turn calls SyscallReturn fstatFunc<AlphaLinux>(SyscallDesc*, int, LiveProcess*, ThreadContext*) ()
>
> 3. this in turn calls writeBlob on the SETranslatingPortProxy
> SETranslatingPortProxy::writeBlob (this=<value optimized out>, addr=4831384928, p=<value optimized out>, size=<value optimized out>)
> at build/ALPHA_SE_MOESI_hammer/mem/se_translating_port_proxy.cc:125
>
> This is done to address 0x89560 and the blobHelper chops it up in two pieces. The first write goes fine and the second one causes the problem:
>
> 136455: system_port: system_port blobHelper to address 0x89560
> 136455: system.sys_port_proxy-port0: Functional access caught for address 0x89560
> 136455: system.sys_port_proxy-port0: Request found in 0 - 0x7ffffff range
> 136455: system.sys_port_proxy-port0: Functional Write request for [0x89560, line 0x89540]
> 136455: system.sys_port_proxy-port0: num_busy = 0, num_ro = 0, num_rw = 1
> 136455: system.sys_port_proxy-port0: [ 0xb0 0xab 0x3 0x20 0x1 0x0 0x0 0x0
> 0x40 0x45 0x8 0x20 0x1 0x0 0x0 0x0 0x0 0x20 0x0 0x0 0x0 0x0 0x0 0x0 0xd 0x0
> 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
> 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 ]
> 136455: system.sys_port_proxy-port0: [ 0xb0 0xab 0x3 0x20 0x1 0x0 0x0 0x0
> 0x40 0x45 0x8 0x20 0x1 0x0 0x0 0x0 0x0 0x20 0x0 0x0 0x0 0x0 0x0 0x0 0xd 0x0
> 0x0 0x0 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0xe 0x9d 0xc1 0x2a 0xe8 0x21 0x0 0x0
> 0x1 0x0 0x0 0x0 0x17 0x87 0x0 0x0 0xbb 0x2 0x0 0x0 0xd 0x88 0x0 0x0 0x0 0x0
> 0x0 0x0 ]
> 136455: system.sys_port_proxy-port0: [ 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
> 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
> 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
> 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 ]
> 136455: system.sys_port_proxy-port0: [ 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
> 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
> 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0xe 0x9d 0xc1 0x2a 0xe8 0x21 0x0 0x0 0x1 0x0
> 0x0 0x0 0x17 0x87 0x0 0x0 0xbb 0x2 0x0 0x0 0xd 0x88 0x0 0x0 0x0 0x0 0x0 0x0 ]
> 136455: system.physmem: Write of size 32 on address 0x89560
> 136455: system.physmem: 00000000 0a 00 00 00 0e 9d c1 2a e8 21 00 00 01 00 00 00 A*h!
> 136455: system.physmem: 00000010 17 87 00 00 bb 02 00 00 0d 88 00 00 00 00 00 00 ;
> 136455: system.sys_port_proxy-port0: Functional access successful!
> 136455: system_port: system_port blobHelper to address 0x89580
> 136455: system.sys_port_proxy-port0: Functional access caught for address 0x89580
> 136455: system.sys_port_proxy-port0: Request found in 0 - 0x7ffffff range
> 136455: system.sys_port_proxy-port0: Functional Write request for [0x89580, line 0x89580]
> 136455: system.sys_port_proxy-port0: num_busy = 1, num_ro = 0, num_rw = 0
>
> This is where the panic kicks in and kills the simulation. I am still puzzled how all other regressions work and this one fails. Any ideas what could be going wrong?
>
> Andreas
>
>
> -----Original Message-----
> From: Nilay Vaish [mailto:[email protected]]
> Sent: 04 January 2012 10:21
> To: Andreas Hansson
> Cc: Beckmann, Brad; Ali Saidi
> Subject: RE: [gem5-dev] One failing Ruby regression after memory-system patches
>
> On Wed, 4 Jan 2012, Andreas Hansson wrote:
>
> > Hi Nilay,
> >
> > Thanks for the swift response.
> >
> > I would think the functional access is being made either to: 1) load a binary, or 2) "fake" an access from some thread/process. Patch 949 essentially forces all functional accesses to go through a real structural port, so the path through the interconnect may now be different (and it could have been bypassed altogether in the past as some functional ports connected straight to memory and ignored any data buffered in the interconnect). There should be no timing changes due to the patch as it only affects untimed functional accesses.
> >
>
> In case of Ruby, when a functional access is received at the Ruby Port, all the controllers are checked for whether or not they have the cache line for this address and in what state. When this particular functional access is made, one controller is already trying to access (in timing mode) the cache line, but the data is still in the interconnect some where. Given your explanation, are you trying to imply that earlier this particular functional access was not going through Ruby?
>
> > Would you think changing the panic to a warn is the way to go?
>
> Well, as you said this access might be needed for loading a binary, would not an error in loading the binary result in something bad happening sooner or later? Unless that functional access is retried at a different instant of time, this is a situation for panicing.
>
> --
> Nilay
>
> >
> >
> > Andreas
> >
> > -----Original Message-----
> > From: Nilay Vaish [mailto:[email protected]]
> > Sent: 03 January 2012 19:05
> > To: Andreas Hansson
> > Cc: Beckmann, Brad; Ali Saidi
> > Subject: RE: [gem5-dev] One failing Ruby regression after memory-system patches
> >
> > Andreas, reading the trace, it does not seem anything is going on wrong. We know that functional accesses in Ruby can fail and in this case it fails because the data to be accessed functionally is some where in the inter connection network.
> >
> > But this is a regression test which has been in existence before the functional access support was added to Ruby. I have no idea as to why a functional access is being made. Also patch 949 does not provide any clues. To me it seems like, it would not affect Ruby at all.
> >
> > Can you track the source of these functional requests? Also, do your patches change the way things are timed currently?
> >
> > --
> > Nilay
> >
> >
> > On Tue, 3 Jan 2012, Andreas Hansson wrote:
> >
> >> The output with Ruby as the only flag is attached. Sorry for the large file (still 2 MB but a good reduction from the 76 MB unpacked). I have kept this off the mailing list intentionally.
> >>
> >> Let me know if I can provide any further information.
> >>
> >> Thanks!
> >>
> >> Andreas
> >>
> >>
> >> -----Original Message-----
> >> From: [email protected] [mailto:gem5-dev-[email protected]] On Behalf Of Nilay Vaish
> >> Sent: 03 January 2012 17:49
> >> To: gem5 Developer List
> >> Subject: Re: [gem5-dev] One failing Ruby regression after memory-system patches
> >>
> >> Can you provide the trace obtained with debug flag Ruby? If that is too long, may be with RubyPort only.
> >>
> >> --
> >> Nilay
> >>
> >> On Tue, 3 Jan 2012, Andreas Hansson wrote:
> >>
> >>> Dear all (and Brad in particular),
> >>>
> >>> With the memory-system patch http://reviews.m5sim.org/r/949/ applied, all regressions work, besides one:
> >>> build/ALPHA_SE_MOESI_hammer/tests/opt/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer
> >>>
> >>> simerr contains:
> >>> warn: Sockets disabled, not accepting gdb connections
> >>> fatal: Ruby functional write failed for address 0x89580 @ cycle 136455
> >>> [recvFunctional:build/ALPHA_SE_MOESI_hammer/mem/ruby/system/RubyPort.cc, line 449] Memory Usage: 242204 Kbytes
> >>>
> >>> Changing the fatal to a warn allows the regression to succeed, and with the simerr containing:
> >>> warn: Sockets disabled, not accepting gdb connections
> >>> warn: Ruby functional write failed for address 0x89580
> >>> hack: be nice to actually delete the event here
> >>>
> >>> It seems very strange that all other regressions are successful and this one not. Could it be a bug in the Ruby code? I strongly doubt it is related to patch 949, but do not know Ruby well enough to say for sure.
> >>>
> >>> Ideas and suggestions are welcome.
> >>>
> >>> Thanks.
> >>>
> >>> Andreas
> >>>
> >>>
> >>> -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
