Hi Gabe,

I think the answer to my last question was already covered by your prior response. Thanks for the clarifications.

Thanks,
Shyam

> On Aug 2, 2019, at 5:07 PM, Shyam Murthy <[email protected]> wrote:
>
> Hi Gabe,
>
> I was reading through this today
> (https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=129).
>
> Within gem5, however, the instructions MOVZX_B_R_M and MOVZX_W_R_M are
> translated into load byte and load word microops, which in turn cause a
> partial register stall. However, MOVZX/MOVSX are specifically the
> instructions that developers and compilers use to eliminate these partial
> register stalls. So, is the current modeling not overly conservative?
>
> My main concern also came from an application where the compiler generates
> byte loads zero extended to 32 bits (as per the optimization), but gem5
> still generates stalls because of the load byte microop. I think this might
> have to be slightly remodeled; the only case where being conservative and
> needing a stall seems justified is when the zero extension is into 16 bits.
> Let me know if you feel my thinking is correct.
>
> Thanks,
> Shyam
>
>> On Aug 1, 2019, at 3:38 PM, Gabe Black <[email protected]> wrote:
>>
>> There is no way to disable that. The number and identity of the
>> instruction's sources/destinations would need to change based on the
>> operand size, and that's not implemented. You could possibly add extra
>> information to the microops to help determine when that sort of thing is
>> happening. All the microops that do partial register updates have that
>> behavior (so most of them), not just lea.
>>
>> Gabe
>>
>> On Wed, Jul 31, 2019 at 8:06 PM Shyam Murthy <[email protected]> wrote:
>> Thanks Gabe. Suppose I’m trying to carry out a data flow analysis on the
>> program; quite often I rely on the source registers tagged by gem5. In
>> this process, would I not be tracking false dependencies? Is there a way I
>> can disable this?
>>
>> Additionally, have you modelled this only for the LEA op, or for other
>> operations too? You were making a call to the merge method within the
>> static inst class; I assumed this was because x86 has a lot of
>> instructions like ADD AX, imm, where the source register is clobbered
>> with the output as well. However, I guess you have primarily made calls to
>> the merge method within the static inst class to also model partial
>> register updates.
>>
>> Thanks,
>> Shyam
>>
>>> On Jul 31, 2019, at 9:03 PM, Gabe Black <[email protected]> wrote:
>>>
>>> Hi Shyam. I think the reason is that x86 instructions (and the microops as
>>> I've implemented them) can do partial register updates, i.e. writing to
>>> only the lowest byte of a register. In that case, you need the old value
>>> to fill in part of the new value of the register. When writing to 32 bits
>>> or more of the register (although x86 is full of exceptions), you'd
>>> generally not need the old value since you're either writing all 64 bits
>>> or zero extending to 64 bits in the 32 bit case. That optimization is not
>>> implemented, and may or may not be realistic.
>>>
>>> Gabe
>>>
>>> On Tue, Jul 30, 2019 at 2:40 PM Shyam Murthy <[email protected]> wrote:
>>> The main reason I am asking is that I am trying to do some dependency
>>> analysis on the programs, and false dependencies show up in the process
>>> because architectural registers that are destination registers also get
>>> populated as source registers (when there is no true dependency). Am I
>>> understanding something incorrectly?
>>>
>>> Thanks,
>>> Shyam
>>>
>>> On Tue, Jul 30, 2019 at 2:25 PM Shyam Murthy <[email protected]> wrote:
>>> Hi Gabe,
>>>
>>> Why is it that for some of the operations like ld and lea, in the decoding
>>> logic within build/X86/arch/generated/decoder-ns.cc.inc, the destination
>>> register is also decoded as a src register?
>>>
>>> Thanks,
>>> Shyam
>>> _______________________________________________
>>> gem5-users mailing list
>>> [email protected]
>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
