Hi Gabe, 

I think the answer to my last question was already covered by your prior response. 
Thanks for the clarifications.

Thanks,
Shyam

> On Aug 2, 2019, at 5:07 PM, Shyam Murthy <[email protected]> wrote:
> 
> Hi Gabe,
> 
> I was reading through this today 
> (https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=129).
> 
> Within gem5, however, the instructions MOVZX_B_R_M and MOVZX_W_R_M are 
> translated into load-byte and load-word microops, which in turn cause a 
> partial register stall. Yet MOVZX/MOVSX are exactly the instructions that 
> developers and compilers use to eliminate these partial register stalls. 
> So, is the current modeling not overly conservative? 
> 
> My concern comes from an application where the compiler generates byte 
> loads that are zero-extended to 32 bits (as per the optimization), but gem5 
> still generates stalls because of the load-byte microop. I think this might 
> need to be remodeled slightly; the only case where a stall is still needed 
> should be when the zero extension targets a 16-bit register. 
> Let me know if you think my reasoning is correct. 
> 
> Thanks,
> Shyam
> 
>> On Aug 1, 2019, at 3:38 PM, Gabe Black <[email protected]> wrote:
>> 
>> There is no way to disable that. The number and identity of the instruction's 
>> sources/destinations would need to change based on the operand size, and 
>> that's not implemented. You could possibly add extra information to the 
>> microops to help determine when that sort of thing is happening. All the 
>> microops that do partial register updates have that behavior (so most of 
>> them), not just lea.
>> 
>> Gabe
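
One way to picture the "extra information" suggested above, from the point of view of an offline data flow analysis, is a per-microop flag saying whether the destination is genuinely read or only read for the merge. The sketch below is an assumption for illustration only: the struct `MicroopInfo`, its fields, and the function `trueSources` are hypothetical and do not exist in gem5 under these names.

```cpp
#include <vector>

// Hypothetical record of one microop as seen by an offline analysis.
// None of these fields are gem5's; they are assumptions for this sketch.
struct MicroopInfo
{
    std::vector<int> srcRegs;  // source register indices as reported
    int destReg;               // destination register index
    int dataSize;              // operand size in bytes: 1, 2, 4 or 8
    bool destIsTrueSource;     // extra info: the op really reads its
                               // destination (e.g. ADD), not just merges
};

// Return the sources that represent real data dependencies.
std::vector<int> trueSources(const MicroopInfo &op)
{
    std::vector<int> out;
    for (int reg : op.srcRegs) {
        // The destination shows up as a source only because of the merge.
        bool mergeOnly = (reg == op.destReg) && !op.destIsTrueSource;
        // A 32-bit or 64-bit write replaces the whole register, so a
        // merge-only read of the old value is a false dependency.
        if (mergeOnly && op.dataSize >= 4)
            continue;
        out.push_back(reg);
    }
    return out;
}
```

With this filter, a 4-byte load into a register keeps only its address sources, while 1-byte and 2-byte loads still depend on the old destination value.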
>> 
>> On Wed, Jul 31, 2019 at 8:06 PM Shyam Murthy <[email protected]> wrote:
>> Thanks Gabe. Suppose I’m carrying out a data flow analysis on the program, 
>> where I often rely on the source registers tagged by gem5. Would I not be 
>> tracking false dependencies in the process? Is there a way I can disable 
>> this?
>> 
>> Additionally, have you modeled this only for the LEA op, or for other 
>> operations too? You call the merge method within the static inst class; I 
>> assumed this was because x86 has a lot of instructions like ADD AX, imm, 
>> where the source register is also overwritten with the output. However, I 
>> guess those calls to the merge method are primarily there to model partial 
>> register updates. 
>> 
>> Thanks,
>> Shyam
>> 
>>> On Jul 31, 2019, at 9:03 PM, Gabe Black <[email protected]> wrote:
>>> 
>>> Hi Shyam. I think the reason is that x86 instructions (and the microops as 
>>> I've implemented them) can do partial register updates, i.e. writing to only 
>>> the lowest byte of a register. In that case, you need the old value to fill 
>>> in part of the new value of the register. When writing to 32 bits or more 
>>> of the register (although x86 is full of exceptions), you'd generally not 
>>> need the old value since you're either writing all 64 bits or zero 
>>> extending to 64 bits in the 32 bit case. That optimization is not 
>>> implemented, and may or may not be realistic.
>>> 
>>> Gabe
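
The partial-update behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not gem5's actual merge code: the names `mergeWrite` and `needsOldValue` are hypothetical, and the sketch ignores the high-byte registers (AH, BH, CH, DH) and the other x86 exceptions mentioned in the message.

```cpp
#include <cstdint>

// Hypothetical helper: combine a result of `size` bytes into the old
// 64-bit register value, following the x86-64 rules for register writes.
uint64_t mergeWrite(uint64_t oldVal, uint64_t result, int size)
{
    switch (size) {
      case 1:
        // 8-bit write: upper 56 bits come from the old value.
        return (oldVal & ~0xFFULL) | (result & 0xFFULL);
      case 2:
        // 16-bit write: upper 48 bits come from the old value.
        return (oldVal & ~0xFFFFULL) | (result & 0xFFFFULL);
      case 4:
        // 32-bit write: zero-extended to 64 bits, old value unused.
        return result & 0xFFFFFFFFULL;
      default:
        // 64-bit write: the whole register is replaced.
        return result;
    }
}

// Only 8-bit and 16-bit writes need the old register value as an input.
bool needsOldValue(int size)
{
    return size < 4;
}
```

Only the 1-byte and 2-byte cases read `oldVal`, which is why the destination has to appear as a source for those sizes and could, in principle, be dropped for 32-bit and 64-bit writes.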
>>> 
>>> On Tue, Jul 30, 2019 at 2:40 PM Shyam Murthy <[email protected]> wrote:
>>> The main reason I am asking is that I am trying to do some dependency 
>>> analysis on programs, and false dependencies show up in the process 
>>> because architectural registers that are destination registers also get 
>>> populated as source registers (when there is no true dependency). Am I 
>>> misunderstanding something? 
>>> 
>>> Thanks,
>>> Shyam
>>> 
>>> On Tue, Jul 30, 2019 at 2:25 PM Shyam Murthy <[email protected]> wrote:
>>> Hi Gabe,
>>> 
>>> Why is it that, for some operations like ld and lea, the decoding logic 
>>> within build/X86/arch/generated/decoder-ns.cc.inc also decodes the 
>>> destination register as a src register?
>>> 
>>> Thanks,
>>> Shyam
>>> _______________________________________________
>>> gem5-users mailing list
>>> [email protected]
>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
> 
