Hi Mohit,

I wonder whether the number of physical register file entries is becoming a
bottleneck in the configuration you are using. Normally, I would expect the
'ProdLo' and 'ProdHi' registers to be renamed to physical registers, so they
should not cause any dependency between two independent multiply operations.
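To illustrate when renaming does and does not help, here is a toy rename table in Python. It is only a sketch and not gem5's rename stage. The instruction names (mul_a, mul_b) and the `rename` helper are hypothetical, and the model never runs out of physical registers, so it ignores the register-file-size question above. Renaming gives every write a fresh physical register, which removes write-after-write hazards on ProdLo/ProdHi. However, an instruction that also lists ProdLo/ProdHi as *sources* still reads the physical register written by the previous multiply, and that is a true read-after-write dependency.

```python
from itertools import count


def rename(instrs):
    """Toy register renaming (a sketch, not gem5's implementation).

    instrs: list of (name, srcs, dests) using architectural register names.
    Returns {name: set of earlier instruction names whose results it reads}.
    """
    free_phys = count()   # unbounded physical register allocator
    rat = {}              # register alias table: arch reg -> phys reg
    producer = {}         # phys reg -> name of the instruction that writes it
    deps = {}
    for name, srcs, dests in instrs:
        # Sources are looked up before this instruction's dests are renamed,
        # so they see the mapping left by the previous writer.
        deps[name] = {producer[rat[r]] for r in srcs if r in rat}
        for r in dests:
            p = next(free_phys)   # fresh physical register for every write
            rat[r] = p
            producer[p] = name
    return deps


# Implicit registers written only: renaming removes the WAW hazard.
dest_only = [
    ("mul_a", ["rdx"], ["ProdLo", "ProdHi"]),
    ("mul_b", ["rbx"], ["ProdLo", "ProdHi"]),
]
# Implicit registers both read and written, as in the Mul1sFlags excerpt.
src_and_dest = [
    ("mul_a", ["rdx", "ProdLo", "ProdHi"], ["ProdLo", "ProdHi"]),
    ("mul_b", ["rbx", "ProdLo", "ProdHi"], ["ProdLo", "ProdHi"]),
]
print(rename(dest_only))     # {'mul_a': set(), 'mul_b': set()}
print(rename(src_and_dest))  # {'mul_a': set(), 'mul_b': {'mul_a'}}
```

When the implicit registers are only destinations, the two multiplies come out independent. Once they are also sources, mul_b depends on mul_a regardless of how many physical registers are available.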

-Ayaz

On Tue, Jul 20, 2021 at 5:27 PM Mohit Gambhir via gem5-users <
[email protected]> wrote:

> Hi all,
>
> I am running a DerivO3CPU-based SE mode simulation with the x86 ISA. The
> microbenchmark I am running contains a loop with independent multiply
> instructions. An excerpt from the disassembly of the benchmark loop looks
> something like this:
>
>   400c07:             48 0f af d2                         imul   %rdx,%rdx
>   400c0b:             48 0f af db                         imul   %rbx,%rbx
> …
>
> When I look at the O3PipeView output, I see that all the independent
> multiply instructions are issued sequentially, even though there are 2
> multiply functional units and each of them is pipelined:
>
> [................f....dn.pi..c.r.................................................]-(16664000.0) 0x00400c07.0 IMUL_R_R                  [     34983]
> [................f....dn.p...ic.r................................................]-(16664000.0) 0x00400c07.1 IMUL_R_R                  [     34984]
> [................f....dn.p...ic.r................................................]-(16664000.0) 0x00400c07.2 IMUL_R_R                  [     34985]
> [................f....dn.p...i..c.r..............................................]-(16664000.0) 0x00400c0b.0 IMUL_R_R                  [     34986]
> [................f....dn.p......ic.r.............................................]-(16664000.0) 0x00400c0b.1 IMUL_R_R                  [     34987]
> [................f....dn.p......ic.r.............................................]-(16664000.0) 0x00400c0b.2 IMUL_R_R                  [     34988]
> …
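One way to quantify the serialization is to count, for each micro-op, how many timeline characters separate the fetch marker `f` from the issue marker `i`. The Python sketch below does this for the six micro-ops in the trace. The timelines are copied from the trace but truncated after the retire marker `r`, since the trailing dots carry no information. The `issue_offsets` helper is hypothetical. The sketch also assumes one timeline character per CPU cycle, which holds only if the view was generated with a cycle time matching the simulated clock.

```python
# Timeline prefixes (up to the retire marker 'r') copied from the
# O3PipeView output; the trailing dots are dropped.
TRACE = [
    ("0x00400c07.0", "[................f....dn.pi..c.r"),
    ("0x00400c07.1", "[................f....dn.p...ic.r"),
    ("0x00400c07.2", "[................f....dn.p...ic.r"),
    ("0x00400c0b.0", "[................f....dn.p...i..c.r"),
    ("0x00400c0b.1", "[................f....dn.p......ic.r"),
    ("0x00400c0b.2", "[................f....dn.p......ic.r"),
]


def issue_offsets(trace):
    """Characters (assumed = cycles) from fetch 'f' to issue 'i' per micro-op."""
    return {pc: line.index("i") - line.index("f") for pc, line in trace}


offsets = issue_offsets(TRACE)
for pc, off in offsets.items():
    print(f"{pc}: issued {off} cycles after fetch")

# Gap between the first micro-ops of the two consecutive multiplies.
print(offsets["0x00400c0b.0"] - offsets["0x00400c07.0"])  # 3
```

Each instruction's .1 and .2 micro-ops issue 3 cycles after its .0 micro-op, and the .0 micro-op of the next instruction also issues 3 cycles after the previous .0. That amounts to one multiply instruction every 3 cycles, even though all six micro-ops were fetched, renamed, and dispatched in the same cycle.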
>
> Digging into it further, I found that each IMUL_R_R instruction has
> implicit registers 0 and 1 (ProdHi and ProdLow) added as both a source and
> a destination in the generated code. The following excerpt is from
> decoder-ns-cc.inc:
>
> Mul1sFlags::Mul1sFlags(…)
> {
>     …
>     setSrcRegIdx(_numSrcRegs++, RegId(IntRegClass, INTREG_FOLDED(src1, foldOBit)));
>     setSrcRegIdx(_numSrcRegs++, RegId(IntRegClass, INTREG_FOLDED(src2, foldOBit)));
>     setSrcRegIdx(_numSrcRegs++, RegId(IntRegClass, INTREG_IMPLICIT(0)));
>     setDestRegIdx(_numDestRegs++, RegId(IntRegClass, INTREG_IMPLICIT(0)));
>     _numIntDestRegs++;
>     setSrcRegIdx(_numSrcRegs++, RegId(IntRegClass, INTREG_IMPLICIT(1)));
>     setDestRegIdx(_numDestRegs++, RegId(IntRegClass, INTREG_IMPLICIT(1)));
>     …
> }
>
> As a result, all the independent multiply instructions execute
> sequentially, and the multiply throughput is 1/3.
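The 1/3 figure is what a dependence chain predicts. If every multiply has to wait for the previous multiply's result, the steady-state rate is one multiply per multiply latency, no matter how many pipelined units exist. The toy list scheduler below is a Python sketch under stated assumptions: a 3-cycle latency, fully pipelined units that each accept one op per cycle, unlimited issue width, and one op per instruction (real IMUL_R_R instructions are cracked into three micro-ops). None of these values is taken from the actual DerivO3CPU configuration, and the `schedule` helper is hypothetical.

```python
def schedule(n_ops, n_units, latency, chained):
    """Greedy issue cycles for n_ops multiplies on n_units pipelined units.

    If chained, op k cannot issue until op k-1's result is ready
    (its issue cycle + latency), modelling a RAW dependency through an
    implicit register. Each unit accepts at most one new op per cycle.
    """
    issue = []
    accepted = {}  # cycle -> number of ops issued in that cycle
    for k in range(n_ops):
        ready = issue[k - 1] + latency if chained and k > 0 else 0
        cycle = ready
        # Slip to a later cycle if every unit already took an op this cycle.
        while accepted.get(cycle, 0) >= n_units:
            cycle += 1
        accepted[cycle] = accepted.get(cycle, 0) + 1
        issue.append(cycle)
    return issue


LATENCY = 3  # assumed multiply latency in cycles
print(schedule(4, n_units=2, latency=LATENCY, chained=True))   # [0, 3, 6, 9]
print(schedule(4, n_units=8, latency=LATENCY, chained=True))   # [0, 3, 6, 9]
print(schedule(4, n_units=2, latency=LATENCY, chained=False))  # [0, 0, 1, 1]
```

With the chain, issue cycles are spaced by the latency (one multiply per 3 cycles) whether there are 2 or 8 units. Without the chain, 2 pipelined units accept 2 multiplies per cycle.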
>
> If we have multiple functional units, then should these implicit registers
> (ProdHi and ProdLo) be replicated for each of them, and if so, why add them
> as source and destination at all?
>
> Are there any clarifications or workarounds for this?
>
> Thanks,
> Mohit
>
> _______________________________________________
> gem5-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s