> On Aug. 14, 2015, 8:03 p.m., Joel Hestness wrote: > > src/mem/slicc/ast/PeekStatementAST.py, line 74 > > <http://reviews.gem5.org/r/2983/diff/1/?file=48301#file48301line74> > > > > This is a very subtle change to already complex code, so I'm not sure I > > understand how it works. The patch description indicates that bad things > > could happen if controller activity occurred between the read and write of > > a RMW, but it doesn't really say what the bad things are. This code appears > > to only block intermediate activity when the write part of the RMW gets to > > the head of a controller queue. > > > > Can you clarify what was causing the deadlock, and why this works to > > fix it? > > Nilay Vaish wrote: > Joel, consider the following operations happen on some address A: > > RMW Read Address A > Some other mem op on Address A > RMW Write Address A > > > Currently, on seeing a RMW Read, the sequencer asks the controller to > block > any operation on address A other than those coming from the mandatory > queue. > This is to prevent losing coherence permissions on address A. My guess > is that > RMW operations in x86 have to always succeed, unlink LL/SC which I > believe can > fail. I think the expectation here is that no operation other than RMW > Write > on the same address would be issued from the manadatory queue. This > implies > the sequencer expects that the processing core would not issue any > operation > other than RMW Write. But Lena must have observed an instruction fetch > request > to the same address between RMW Read and RMW write. > > > Now, I think the proposed fix is a character away from correctness. The > condition should be: > > if (m_is_blocking && > (m_block_map.count(in_msg_ptr->m_$address_field) == 1) && > ((m_block_map[in_msg_ptr->m_$address_field] != &$qcode) || > (!in_msg_ptr->bypassBlocked()))) { > > > I am actually not in favor of fixing the problem the way this patch does > it. > As I said above, the controller expects that the mandatory queue will not > issue any operation on the address other than RMW Write. I think we > should > change the sequencer and put in a check. > > Joel Hestness wrote: > Hi Nilay. We've discussed this patch pretty thoroughly on the gem5-gpu > list (here: https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g > ). Based on some testing, the patch in this review request probably doesn't > work correctly, even if you change the condition as you suggest. I've posted > a preliminary fix on the thread linked here, but I'm not sure that is a good > way to do it either. I'd be happy to hear your take on our gem5-gpu > discussion.
I saw your patch. It is essentially what I had in mind when I said that the check should go in the sequencer. But I would prefer the check is made at line 577 just before insertRequest() is called, rather than at 177 as is being done currently in your patch. And mark isBlocked as const. - Nilay ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2983/#review6955 ----------------------------------------------------------- On July 21, 2015, 4:06 p.m., Lena Olson wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://reviews.gem5.org/r/2983/ > ----------------------------------------------------------- > > (Updated July 21, 2015, 4:06 p.m.) > > > Review request for Default. > > > Repository: gem5 > > > Description > ------- > > Changeset 10926:97dc170a3064 > --------------------------- > ruby: Fix protocol deadlock with RMW operations > > Previously, if there was a memory operation between the read part and the > write > part of a RMW transaction, a deadlock could occur. There was logic to allow > the > write part of the RMW transaction to bypass the blocked controller, but it > also > allowed other operations to bypass as well. Thus, only the write parts of a > RMW > operation can bypass. This patch adds a new function to the message class > which > controls whether a block can bypass the blocked controller. > > It is possible that an IFETCH is issued between the read and write parts of > the > RMW transaction. > > > Diffs > ----- > > src/mem/ruby/slicc_interface/Message.hh 5fe05690d03d > src/mem/ruby/slicc_interface/RubyRequest.hh 5fe05690d03d > src/mem/slicc/ast/PeekStatementAST.py 5fe05690d03d > > Diff: http://reviews.gem5.org/r/2983/diff/ > > > Testing > ------- > > Tested with the ruby random tester as well as running some multiprogrammed > workloads, including ones where this deadlock manifested. The multicore > tests I have are for x86, so if those with more knowledge of other ISAs could > comment that would be helpful. > > Many thanks to Jason Power for help in tracking this down and fixing it. > > > Thanks, > > Lena Olson > > _______________________________________________ gem5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/gem5-dev
