Hi all, follow-up question.

In ARMv8, the LDP instruction:

LDP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]

loads a pair of 128-bit values (256 bits in total) from consecutive memory 
locations into two Q registers (128-bit vector registers).
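To pin down what "contiguous" means here, below is a minimal sketch of the architectural effect of the post-index form `LDP Qt1, Qt2, [Xn], #imm`. It is not gem5 code. Memory is assumed to be a flat bytearray and registers a dict, and the function name is hypothetical. The offset check reflects the 128-bit variant's scaled immediate (a multiple of 16 in [-1024, 1008]).

```python
# Illustrative model of the architectural effect of the post-index form
#   LDP Qt1, Qt2, [Xn], #imm
# Not gem5 code: memory is a flat bytearray, registers are a dict.

Q_BYTES = 16  # one Q register holds 128 bits


def ldp_q_post_index(mem, regs, t1, t2, n, imm):
    # For the 128-bit variant the offset is a multiple of 16 in [-1024, 1008].
    if imm % Q_BYTES != 0 or not -1024 <= imm <= 1008:
        raise ValueError("imm out of range for LDP Q")
    addr = regs[n]
    # Both values come from one contiguous 32-byte region:
    # Qt1 <- [addr, addr+16), Qt2 <- [addr+16, addr+32).
    regs[t1] = bytes(mem[addr:addr + Q_BYTES])
    regs[t2] = bytes(mem[addr + Q_BYTES:addr + 2 * Q_BYTES])
    # Post-index: the base register is updated after the access.
    regs[n] = addr + imm


# Usage: load q0/q1 from address 0, then advance x0 by 32.
mem = bytearray(range(64))
regs = {"x0": 0}
ldp_q_post_index(mem, regs, "q0", "q1", "x0", 32)
```

Architecturally it is a single 32-byte read split across two registers, plus the base update.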

When I run gem5 with debug output to see how this LDP instruction executes (on 
AtomicSimpleCPU, for now), I see that it is broken down into 3 micro-ops: 2 loads 
and 1 base-register writeback (because I'm using the post-index form, 
LDP <Qt1>, <Qt2>, [<Xn|SP>], #<imm>).
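The decomposition I'm seeing can be sketched roughly as follows. The `MicroOp` class, its field names and both functions are hypothetical and only mirror the 3 micro-ops in the trace. gem5's real ARM micro-op classes are different. The point is that each load micro-op issues its own memory request, while both use the base value from before the post-increment writeback.

```python
from dataclasses import dataclass

# Hypothetical micro-op representation, for illustration only; gem5's real
# micro-op classes (in its ARM ISA description) are different.


@dataclass(frozen=True)
class MicroOp:
    kind: str      # "load" or "writeback"
    dest: str      # register written by this micro-op
    offset: int    # load: byte offset from base; writeback: the immediate
    size: int = 0  # load access size in bytes (unused for writeback)


def crack_ldp_q_post_index(t1, t2, n, imm):
    # One 16-byte load per destination register, then the base update.
    return [
        MicroOp("load", t1, 0, 16),
        MicroOp("load", t2, 16, 16),
        MicroOp("writeback", n, imm),
    ]


def execute(uops, mem, regs, base_reg):
    base = regs[base_reg]  # loads use the base value from before writeback
    mem_requests = 0
    for u in uops:
        if u.kind == "load":
            a = base + u.offset
            regs[u.dest] = bytes(mem[a:a + u.size])
            mem_requests += 1  # each load micro-op issues its own request
        elif u.kind == "writeback":
            regs[u.dest] = base + u.offset
    return mem_requests


mem = bytearray(range(64))
regs = {"x0": 0}
uops = crack_ldp_q_post_index("q0", "q1", "x0", 32)
requests = execute(uops, mem, regs, "x0")
```

In this model the macro-op costs two memory requests even though the bytes are adjacent, which is exactly the behaviour I'm asking about.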

However, I don't understand why gem5 issues two memory loads when the 256 bits 
that feed the registers are contiguous in memory. Couldn't memory supply all 
256 bits to both destination registers at once?
Some possible reasons I thought of:
- The memory port only allows 128-bit loads.
  Although this could be the case, reading a whole cache line (64B) would 
  sound more reasonable.

- We have only one register write port.
  We need two load micro-ops because only one destination register can be 
  written at a time (and there are two destination registers).
  But in that case, why issue a new memory load in the second micro-op if the 
  first load has already brought in the data (assuming memory returns 
  64B/512 bits)? Why not keep the loaded data within the "macro-op context" (if 
  such a thing exists)? Is it simply relying on the cache?
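One factor that bears on both hypotheses is alignment. The two 16-byte halves are adjacent, but whether they share a 64-byte line depends on the base address. The sketch below is plain address arithmetic and does not model gem5's memory system. The 64-byte line size is the one assumed above, and the function names are hypothetical. With a 32-byte-aligned base, both micro-ops fall in the same line, so in a cached configuration the second load would usually find that line already present. With a base 48 bytes into a line, even a single 32-byte access would have to span two lines.

```python
from typing import List, Tuple

LINE_BYTES = 64  # cache-line size assumed in the question


def lines_touched(addr: int, size: int, line: int = LINE_BYTES) -> List[int]:
    """Start addresses of every cache line overlapped by [addr, addr + size)."""
    first = addr // line
    last = (addr + size - 1) // line
    return [i * line for i in range(first, last + 1)]


def ldp_q_uop_lines(base: int, line: int = LINE_BYTES) -> Tuple[List[int], List[int]]:
    """Lines touched by the first and second 16-byte load micro-ops."""
    return lines_touched(base, 16, line), lines_touched(base + 16, 16, line)


# 32-byte-aligned base: both micro-ops fall in the same line.
aligned = ldp_q_uop_lines(0x1000)
# Base 48 bytes into a line: the pair straddles a line boundary.
straddling = ldp_q_uop_lines(0x1030)
```

Splitting per register keeps each access at register width, but an unaligned 16-byte access (e.g. at 0x1038) can still cross a line.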

Any clarification on why this operation works this way (or ARM macro memory 
operations in general) would be very welcome!

Thank you, 
Pedro.
_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org