Sure, I can show some code snippets. First, here is the code for the read
micro-op for an atomic read-add-write:

    temp = Mem_sd;

And the modify-write micro-op:

    Rd_sd = temp;
    Mem_sd = Rs2_sd + temp;

The memory address comes from Rs1. The variable "temp" is a temporary
location shared between the read and modify-write micro-ops. The address
from Rs1 is shared in the same way, so that both micro-ops use the same
address when they are issued.
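
To make the intended semantics concrete, here is a minimal standalone C++
sketch of the two micro-ops. It is not gem5 code: the struct AmoAddState,
the tempValid flag, and the function names are made up for illustration,
and memory is reduced to the single word addressed by Rs1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the state the two micro-ops share.
struct AmoAddState {
    int64_t regs[32] = {};   // integer register file
    int64_t mem = 0;         // the one memory word addressed by Rs1
    int64_t temp = 0;        // value handed from read to modify-write
    bool tempValid = false;  // set once the read micro-op has run
};

// Read micro-op: temp = Mem_sd;
inline void readMicroOp(AmoAddState &s) {
    s.temp = s.mem;
    s.tempValid = true;
}

// Modify-write micro-op: Rd_sd = temp; Mem_sd = Rs2_sd + temp;
inline void modifyWriteMicroOp(AmoAddState &s, int rd, int rs2) {
    // The ordering requirement discussed in this thread: the read
    // micro-op has to complete before this one executes.
    assert(s.tempValid && "read micro-op must execute first");

    // Read Rs2 before writing Rd, so the sum uses the original Rs2
    // value even when Rd and Rs2 name the same register.
    const int64_t rs2Value = s.regs[rs2];
    s.regs[rd] = s.temp;
    s.mem = rs2Value + s.temp;
    s.tempValid = false;
}
```

Calling readMicroOp and then modifyWriteMicroOp leaves the old memory value
in Rd and the sum in memory. Calling them in the opposite order trips the
assertion, which is the reordering I am seeing in the minor model.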

In the constructor for the macro-op, I've included some code that
explicitly sets the source and dest register indices so that they are
displayed properly in execution traces:

    _numSrcRegs = 2;
    _srcRegIdx[0] = RS1;
    _srcRegIdx[1] = RS2;
    _numDestRegs = 1;
    _destRegIdx[0] = RD;

So far, this works for the O3 model. The minor model, however, tries to
execute the modify-write micro-op before the read micro-op has executed.
The address is never loaded from Rs1, so a segmentation fault often
occurs. To try to fix this, I added the following code to the constructors
of both micro-ops:

    // Copy the parent macro-op's register indices into this micro-op.
    _numSrcRegs = _p->_numSrcRegs;
    for (int i = 0; i < _numSrcRegs; i++)
        _srcRegIdx[i] = _p->_srcRegIdx[i];
    _numDestRegs = _p->_numDestRegs;
    for (int i = 0; i < _numDestRegs; i++)
        _destRegIdx[i] = _p->_destRegIdx[i];

Here _p is a pointer to the "parent" macro-op. With this code, it works with
the minor model, but in the O3 model the final value calculated in the
modify-write micro-op never gets written at the end of the instruction.
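
For reference, here is a toy C++ model of how an ordering could be inferred
from register index lists alone. It is only a sketch under my own
assumptions, not how the minor or O3 models are actually implemented: the
names MicroOpRegs, hasRawDependence, and TEMP_REG are hypothetical, and the
register numbers are arbitrary. It compares the micro-ops as posted above
(the read declares no registers) with the suggestion quoted below, where the
load writes an intermediate register that the store reads.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical register indices, chosen only for this example.
constexpr int RS1 = 10, RS2 = 11, RD = 12;
constexpr int TEMP_REG = 32;  // assumed scratch register for micro-ops

// Source and destination register indices declared by one micro-op.
struct MicroOpRegs {
    std::vector<int> src;
    std::vector<int> dest;
};

// True if the consumer reads any register that the producer writes,
// i.e. there is a read-after-write dependence through a register.
bool hasRawDependence(const MicroOpRegs &producer,
                      const MicroOpRegs &consumer) {
    for (int s : consumer.src) {
        if (std::find(producer.dest.begin(), producer.dest.end(), s) !=
            producer.dest.end()) {
            return true;
        }
    }
    return false;
}

void checkExamples() {
    // As posted: the read micro-op declares no registers, so nothing
    // ties the modify-write micro-op to it.
    MicroOpRegs readPlain{{}, {}};
    MicroOpRegs writePlain{{RS2}, {RD}};
    assert(!hasRawDependence(readPlain, writePlain));

    // With an intermediate register: the read writes TEMP_REG and the
    // modify-write reads it, which creates a true data dependence.
    MicroOpRegs readViaTemp{{RS1}, {TEMP_REG}};
    MicroOpRegs writeViaTemp{{RS1, RS2, TEMP_REG}, {RD}};
    assert(hasRawDependence(readViaTemp, writeViaTemp));
}
```

In this model the only thing that orders the two micro-ops is a register
that one writes and the other reads, which is why passing "temp" outside the
register file leaves the pair unordered.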
On Fri, Jul 29, 2016 at 2:50 PM, Steve Reinhardt <[email protected]> wrote:
> I'm still confused about the problems you're having. Stores should never
> be executed speculatively in O3, even without the non-speculative flag.
> Also, assuming the store micro-op reads a register that is written by the
> load micro-op, then that true data dependence through the intermediate
> register should enforce an ordering. Whether that destination register is
> also a source or not should be irrelevant, particularly in O3 where all the
> registers get renamed anyway.
>
> Perhaps if you show some snippets of your actual code it will be clearer
> to me what's going on.
>
> Steve
>
>
> On Fri, Jul 29, 2016 at 9:33 AM Alec Roelke <[email protected]> wrote:
>
>> Yes, that sums up my issues. I haven't gotten to tackling the second one
>> yet; I'm still working on the first. Thanks for the patch link, though,
>> that should help a lot when I get to it.
>>
>> To be more specific, I can get it to work with either the minor CPU model
>> or the O3 model, but not both at the same time. To get it to work with the
>> O3 model, I added the "IsNonSpeculative" flag to the modify-write micro-op,
>> which I assumed would prevent the O3 model from speculating on its
>> execution (which I also had to do with regular store instructions to ensure
>> that registers containing addresses would have the proper values when the
>> instruction executed). This works, but when I use it in the minor CPU
>> model, it issues the modify-write micro-op before the read micro-op
>> executes, meaning it hasn't loaded the memory address from the register
>> file yet and causes a segmentation fault.
>>
>> I assume this is caused by the fact that the code for the read operation
>> doesn't reference any register, as the instruction writes the value that
>> was read from memory to a dest register before modifying it and writing it
>> back. Because the dest register can be the same as a source register, I
>> have to pass the memory value from the read micro-op to the modify-write
>> micro-op without writing it to a register to avoid potentially polluting
>> the data written back.
>>
>> My fix was to explicitly set the source and dest registers of both
>> micro-ops to what was decoded by the macro-op so gem5 can infer
>> dependencies, but then when I try it using the O3 model, the modify-write
>> portion does not appear to actually write back to memory.
>>
>> On Fri, Jul 29, 2016 at 12:00 PM, <[email protected]> wrote:
>>
>>> There are really two issues here, I think:
>>>
>>> 1. Managing the ordering of the two micro-ops in the pipeline, which seems
>>> to be the issue you're facing.
>>> 2. Providing atomicity when you have multiple cores.
>>>
>>> I'm surprised you're having problems with #1, because that's the easy part.
>>> I'd assume that you'd have a direct data dependency between the micro-ops
>>> (the load would write a register that the store reads, for the load to pass
>>> data to the store) which should enforce ordering. In addition, since
>>> they're both accessing the same memory location, there shouldn't be any
>>> reordering of the memory operations either.
>>>
>>> Providing atomicity in the memory system is the harder part. The x86 atomic
>>> RMW memory ops are implemented by setting LOCKED_RMW on both the load and
>>> store operations (see
>>> http://grok.gem5.org/source/xref/gem5/src/mem/request.hh#145, as well
>>> as src/arch/x86/isa/microops/ldstop.isa). This works with AtomicSimpleCPU
>>> and with Ruby, but there is no support for enforcing this atomicity in the
>>> classic cache in timing mode. I have a patch that provides this but you
>>> have to apply it manually: http://reviews.gem5.org/r/2691.
>>>
>>> Steve
>>>
>>>
>>>
>>> On Wed, Jul 27, 2016 at 9:10 AM Alec Roelke <[email protected]> wrote:
>>>
>>> > Hello,
>>> >
>>> > I'm trying to add an ISA to gem5 which has several atomic
>>> > read-modify-write instructions. Currently I have them implemented as pairs
>>> > of micro-ops which read data in the first operation and then modify-write
>>> > in the second. This works for the simple CPU model, but runs into trouble
>>> > for the minor and O3 models, which want to execute the modify-write half
>>> > before the load half is complete. I tried forcing both parts of the
>>> > instruction to have the same src and dest register indices, but that causes
>>> > other problems with the O3 model.
>>> >
>>> > Is there a way to indicate that there is a data dependency between the two
>>> > micro-ops in the instruction? Or, better yet, is there a way I could
>>> > somehow have two memory accesses in one instruction without having to split
>>> > it into micro-ops?
>>> >
>>> > Thanks,
>>> > Alec Roelke
>>> > _______________________________________________
>>> > gem5-users mailing list
>>> > [email protected]
>>> > http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>>