I'd say go ahead and commit it. I'm 98% sure that once we properly fail SCs when they get invalidated it won't be necessary, but I can't promise I'll get that in any time soon, and in the meantime it's a pretty good stopgap.
If you can put a comment in the flyspray task I mentioned earlier ( http://www.m5sim.org/flyspray/task/197) with the hg cset number then that'll remind us to revisit this patch once we get that fix in. Thanks, Steve On Jan 5, 2008 6:23 PM, Ali Saidi <[EMAIL PROTECTED]> wrote: > Steve, > Should we commit Geoff's fix or is there a better way to fix it? > > Ali > > > On Dec 29, 2007, at 1:39 PM, Steve Reinhardt wrote: > > > OK, great, glad you tracked that bug down. Your fix is a pretty good > > one, but I think the right answer is that CPU3 in your example should > > not issue a ReadEx... if it knows that it's requesting the block for a > > store conditional, and it sees that the block has been invalidated, it > > should fail the store conditional without getting an exclusive copy. > > In fact the current behavior is broken in that it can lead to > > livelock; if there are a lot of CPUs doing what CPU3 is doing at the > > same time, then they could prevent any cache from successfully > > completeing an ll/sc sequence. Could this be what you're seeing at 16 > > CPUs? > > > > As far as the "allocating bonus target for snoop" messages, you > > shouldn't worry about those; the best thing is probably just to up the > > number of targets per MSHR and that should go away. The issue is that > > we use up an MSHR target when we save a request for a deferred snoop, > > but since there's no way to nack a snoop, we really have no choice > > once the MSHR's targets are full but to keep allocating them anyway. > > So until/unless we come up with a way to nack snoops, which we > > probably never will, then this really should be a warning that the > > number of targets per MSHR is set too low. There is an upper bound on > > the number of targets that would be needed, basically the sum of the > > max number of outstanding accesses from above (which is a function of > > the CPU model for an L1 or the number of caches above for an L2+), > > plus the max number of outstanding snoops for a single block (which > > would be a function of the number of other caches in the system). > > > > Let me know if there's anything else I can help with. > > > > Steve > > > > On Dec 29, 2007 6:41 AM, Geoffrey Blake <[EMAIL PROTECTED]> wrote: > >> Steve, > >> > >> What you described below is exactly what was happening when I was > >> going > >> through the bus and cache traces. With more than 2 CPUs, you would > >> get into > >> a condition where CPU1 would release its spin-lock, then CPU2 and > >> CPU3 would > >> both read the line and try to do a store-conditional. At this > >> point there > >> are 2 UpgradeReq's trying to get the bus. Say CPU2 gets the bus > >> first, so > >> it invalidates CPU1 and CPU3's cache lines. CPU3 gets the bus next > >> and > >> issues a ReadExReq because its line was invalidated, this then > >> invalidates > >> CPU2's pending cache fill. CPU3 will fail the store-conditional > >> and mark > >> the line as exclusive only. If another CPU tries to read the same > >> line to > >> get the spin lock, it will get a stale value from a lower level of > >> cache, > >> ignoring the up to date value. This makes the kernel do some > >> bizarre things > >> as you would imagine. For 16+ CPUs there is something else wrong, > >> but that > >> one gets many of the "warn: bonus snoop allocated" messages, so I'm > >> wondering what could be happening there. > >> > >> Geoff > >> > >> > >> -----Original Message----- > >> From: [EMAIL PROTECTED] [mailto:m5-users- > >> [EMAIL PROTECTED] On > >> Behalf Of Steve Reinhardt > >> Sent: Saturday, December 29, 2007 12:26 AM > >> To: M5 users mailing list > >> Subject: Re: [m5-users] full-system issue in m5 beta 4 > >> > >> Geoff, > >> > >> Do you have any more information on what the problem was that this > >> patch fixes? On the face of it, the patch doesn't make sense... the > >> original code only marks the block dirty if the block was written to, > >> while with your patch it will get marked as dirty even in the case of > >> a failed store conditional that doesn't actually modify the block. > >> So > >> locally it seems wrong. > >> > >> However I can imagine that there might be some global situation where > >> a block is dirty (owned) in cache A, and then cache B requests an > >> exclusive copy for a store conditional, but in the meantime something > >> else happens that causes the store conditional to fail. So then > >> cache > >> B gets A's dirty copy, but fails to mark it as dirty, so then later > >> it > >> doesn't get written back, and A's modification is lost. Does this > >> sound like what's happening? If so, then this may well be the right > >> fix, but I'd have to think about it a little more... the key issue is > >> that right now when a cache receives an exclusive copy of a block it > >> doesn't really pay attention to whether it's getting it from memory > >> (in which case it's OK not to mark it dirty) or from another cache > >> (in > >> which case it must be marked dirty). > >> > >> Though at this point I can't think of a reason it would be incorrect > >> to always mark the block dirty even if the store conditional fails... > >> you might suffer an extra writeback in some very rare circumstances, > >> but it should still be functionally correct. So perhaps your patch > >> is > >> the right solution. > >> > >> Steve > >> > >> On Dec 28, 2007 9:16 PM, Nathan Binkert <[EMAIL PROTECTED]> wrote: > >>> What are the implications of this diff? I'm not clear on the > >>> funcitons in > >>> question, but if this was wrong for 4 and 8 cpus, it seems like > >>> it's just > >>> fundamentally wrong. Steve? > >>> > >>> Nate > >>> > >>> > >>>> To those looking for a fix to booting M5 in FS mode with more > >>>> than 2 > >>>> CPUs, I've attached a diff that fixes some of the problems. I > >>>> have M5 > >>>> booting with 4 and 8 CPUs using timing simple CPU and caches and > >>>> the > >>>> l2cache. 16 CPUs and above, its still getting stuck. > >>>> > >>>> Geoff > >>>> > >>>> Quoting Ali Saidi <[EMAIL PROTECTED]>: > >>>> > >>>>> Normally I add a -s to that command line because I want to create > >>>>> checkpoints with the atomic cpu, I restore from the checkpoints > >>>>> immediately into the timing cpu where the caches are warmed up and > >>>>> then I switch to a detailed cpu model. The -w flag has no meaning > >>>>> unless the -s (standard switch) flag is used. > >>>>> > >>>>> You'll need to modify the scripts a little bit if you want to do > >>>>> anything else. If you want to just transition into a timing cpu > >>>>> and > >>>>> not into a detailed cpu you'll need to change line 64 in > >>>>> Simulation.py from root.switch_cpus = switch_cpus to > >>>>> testsys.switch_cpus = switch_cpus and then add some code to > >>>>> alter the > >>>>> atomic warm up period. Alternatively you could use the standard > >>>>> switch code and change the O3 cpu to another timing cpu if you > >>>>> wanted to end up with a simple cpu model that would allow > >>>>> statistics > >>>>> to be collected on the other cpu after the switch over. > >>>>> > >>>>> Ali > >>>>> > >>>>> > >>>>> On Dec 17, 2007, at 6:49 AM, abc def wrote: > >>>>> > >>>>>> I tried using following command line: > >>>>>> build/ALPHA_FS/m5.opt configs/example/fs.py -n 4 -r 1 > >>>>>> --timing --caches -w 50000000000, so that it switches > >>>>>> to timing simple cpu only after warming up caches with > >>>>>> atomic simple cpu. > >>>>>> But nothing is happening in console. It is not getting > >>>>>> restored from checkpoint. > >>>>>> > >>>>>> I am using system files from version b3. > >>>>>> > >>>>>> Can you please forward me the command line you use > >>>>>> for booting up timing simple cpu. > >>>>>> > >>>>>> > >>>>>> --- Ali Saidi <[EMAIL PROTECTED]> escribió: > >>>>>> > >>>>>>> It's another bug, but since we never really boot > >>>>>>> with timing and > >>>>>>> caches it's not surprising that we haven't seen it > >>>>>>> before. > >>>>>>> > >>>>>>> Ali > >>>>>>> > >>>>>>> On Dec 16, 2007, at 11:43 PM, Nathan Binkert wrote: > >>>>>>> > >>>>>>>> This could honestly be just because it takes a > >>>>>>> long time. With > >>>>>>>> timing and caches, the simulator is pretty slow. > >>>>>>>> > >>>>>>>>> This is working if caches option is not used. > >>>>>>>>> > >>>>>>>>> But with L1,L2 cache present and with multiple > >>>>>>> cpus it > >>>>>>>>> is still getting stuck while booting. > >>>>>>>>> > >>>>>>>>> command line used: > >>>>>>>>> build/ALPHA_FS/m5.opt configs/example/fs.py -n 4 > >>>>>>>>> --timing --caches --l2cache > >>>>>>>>> > >>>>>>>>> --- Ali Saidi <[EMAIL PROTECTED]> escribió: > >>>>>>>>> > >>>>>>>>>> There is an issue in b4 with when the CPU ids > >>>>>>> get > >>>>>>>>>> assigned to CPUs > >>>>>>>>>> that can cause some weird behavior in all > >>>>>>>>>> multi-processor > >>>>>>>>>> configurations (2,3,4, xxx cpus). The attach > >>>>>>> patch > >>>>>>>>>> fixes those problems. > >>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Ali > >>>>>>>>>> > >>>>>>>>>> On Dec 16, 2007, at 2:53 AM, Ali Saidi wrote: > >>>>>>>>>> > >>>>>>>>>>> Yea, you found a bug. I found the changeset > >>>>>>> that > >>>>>>>>>> caused the problem, > >>>>>>>>>>> and I'll try to figure out what is going on > >>>>>>>>>> tomorrow and post a patch. > >>>>>>>>>>> > >>>>>>>>>>> In the future please create a new topic on the > >>>>>>>>>> mailing list by > >>>>>>>>>>> sending a new message to m5-users@m5sim.org > >>>>>>>>>> instead of replying to a > >>>>>>>>>>> current topic and changing the subject. > >>>>>>> Replying > >>>>>>>>>> to the same topic > >>>>>>>>>>> and just changing the subject preserves the > >>>>>>>>>> In-Reply-To mail header > >>>>>>>>>>> and makes it more difficult to reconstruct > >>>>>>> threads > >>>>>>>>>> of conversation > >>>>>>>>>>> on the mailing list. > >>>>>>>>>>> > >>>>>>>>>>> Ali > >>>>>>>>>>> > >>>>>>>>>>> On Dec 15, 2007, at 7:57 PM, abc def wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Timing simple cpu in full system mode in m5 > >>>>>>> beta > >>>>>>>>>> 4 is > >>>>>>>>>>>> not booting up. In the console it is getting > >>>>>>>>>> stuck > >>>>>>>>>>>> into "NET: Registered protocol family 2" and > >>>>>>> is > >>>>>>>>>> not > >>>>>>>>>>>> proceeding forward. > >>>>>>>>>>>> > >>>>>>>>>>>> System files are from: > >>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>>> http://www.m5sim.org/dist/current/m5_system_2.0b3.tar.bz2 > >>>>>>>>>>>> > >>>>>>>>>>>> This is happening if 4 cpus are used for > >>>>>>> booting. > >>>>>>>>>> For > >>>>>>>>>>>> 1 cpu it is ok. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> ______________________________________________ > >>>>>>>>>>>> ¿Chef por primera vez? > >>>>>>>>>>>> Sé un mejor Cocinillas. > >>>>>>>>>>>> http://es.answers.yahoo.com/info/welcome > >>>>>>>>>>>> > >>>>>>> _______________________________________________ > >>>>>>>>>>>> m5-users mailing list > >>>>>>>>>>>> m5-users@m5sim.org > >>>>>>>>>>>> > >>>>>>>>>> > >>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> _______________________________________________ > >>>>>>>>>>> m5-users mailing list > >>>>>>>>>>> m5-users@m5sim.org > >>>>>>>>>>> > >>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> _______________________________________________ > >>>>>>>>>> m5-users mailing list > >>>>>>>>>> m5-users@m5sim.org > >>>>>>>>>> > >>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> ______________________________________________ > >>>>>>>>> ¿Chef por primera vez? > >>>>>>>>> Sé un mejor Cocinillas. > >>>>>>>>> http://es.answers.yahoo.com/info/welcome > >>>>>>>>> _______________________________________________ > >>>>>>>>> m5-users mailing list > >>>>>>>>> m5-users@m5sim.org > >>>>>>>>> > >>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users > >>>>>>>> _______________________________________________ > >>>>>>>> m5-users mailing list > >>>>>>>> m5-users@m5sim.org > >>>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> m5-users mailing list > >>>>>>> m5-users@m5sim.org > >>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users > >>>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> ______________________________________________ > >>>>>> ¿Chef por primera vez? > >>>>>> Sé un mejor Cocinillas. > >>>>>> http://es.answers.yahoo.com/info/welcome > >>>>>> _______________________________________________ > >>>>>> m5-users mailing list > >>>>>> m5-users@m5sim.org > >>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users > >>>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> m5-users mailing list > >>>>> m5-users@m5sim.org > >>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users > >>>>> > >>>>> > >>>>> > >>>> > >>>> > >>>> > >>>> > >>>> ----- End forwarded message ----- > >>>> > >>>> > >>> _______________________________________________ > >>> m5-users mailing list > >>> m5-users@m5sim.org > >>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users > >>> > >> _______________________________________________ > >> m5-users mailing list > >> m5-users@m5sim.org > >> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users > >> > >> > >> > >> No virus found in this incoming message. > >> Checked by AVG Free Edition. > >> Version: 7.5.516 / Virus Database: 269.17.11/1201 - Release Date: > >> 12/28/2007 > >> 11:51 AM > >> > >> > >> No virus found in this outgoing message. > >> Checked by AVG Free Edition. > >> Version: 7.5.516 / Virus Database: 269.17.11/1201 - Release Date: > >> 12/28/2007 > >> 11:51 AM > >> > >> > >> > >> _______________________________________________ > >> m5-users mailing list > >> m5-users@m5sim.org > >> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users > >> > > _______________________________________________ > > m5-users mailing list > > m5-users@m5sim.org > > http://m5sim.org/cgi-bin/mailman/listinfo/m5-users > > > > _______________________________________________ > m5-users mailing list > m5-users@m5sim.org > http://m5sim.org/cgi-bin/mailman/listinfo/m5-users >
_______________________________________________ m5-users mailing list m5-users@m5sim.org http://m5sim.org/cgi-bin/mailman/listinfo/m5-users