Hi Gabe,

> -----Original Message-----
> From: Gabe Black via gem5-dev <gem5-dev@gem5.org>
> Sent: 09 August 2021 11:02
> To: gem5 Developer List <gem5-dev@gem5.org>
> Cc: Gabe Black <gabe.bl...@gmail.com>
> Subject: [gem5-dev] Re: overview/documentation/tests for vector register
> related stuff?
>
> I've done a bit of digging so far, and I think I've figured out a bit about
> the rename mode.
>
> 1. This is only used by ARM to handle the difference in how registers are
> renamed in aarch64 vs otherwise.
> 2. This is handled in O3 by detecting a squash in the CPU and then checking
> the aarch64 bit of the PCState.
> 3. If this changes, then O3 potentially shuffles things around to make
> register chunks contiguous, and starts renaming things differently.
> 4. The only way to switch in or out of aarch64 is through a fault.

Yes. To be more precise, the switch happens either when a fault is taken or
when returning from one (just to make it clear that the switch can also happen
with a non-faulting instruction such as ERET).
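
For reference, a minimal sketch of the squash-time check described above. The
types (RenameMode, PCStateSketch, O3RenameSketch) are hypothetical stand-ins,
not gem5's actual classes; the point is only that the mode is inspected on
every squash even though it almost never changes.

```cpp
#include <cstdint>

// Hypothetical, heavily simplified stand-ins; not the real gem5 API.
enum class RenameMode { Full, Elem };

struct PCStateSketch {
    uint64_t pc = 0;
    bool aarch64 = true;  // mode bit carried along in every PCState copy
};

struct O3RenameSketch {
    RenameMode mode = RenameMode::Full;
    unsigned modeChecks = 0;    // how many times the mode was inspected
    unsigned modeSwitches = 0;  // how many times it actually changed

    // AArch64 uses the Full view, AArch32 the Elem view.
    static RenameMode modeFor(const PCStateSketch &pc) {
        return pc.aarch64 ? RenameMode::Full : RenameMode::Elem;
    }

    // Runs on *every* squash, whatever caused it (branch mispredict,
    // memory-order violation, fault, ERET, ...).
    void onSquash(const PCStateSketch &newPC) {
        ++modeChecks;
        RenameMode wanted = modeFor(newPC);
        if (wanted != mode) {
            // In O3 this is where registers may be shuffled to make the
            // chunks contiguous and renaming semantics change.
            mode = wanted;
            ++modeSwitches;
        }
    }
};
```

Calling onSquash() 100 times while in AArch64 and once after an ERET to
AArch32 performs 101 checks for a single actual switch.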

>
> This leads me to a few conclusions.
>
> 1. Having the aarch64 bit in the PCState structure is probably not necessary
> and may actually be harmful because it makes that structure larger and
> slower to move around. This value does *not* change quickly or frequently,
> and only changes as part of an already heavy mode switch. It does not need
> to be predicted/predictable like a next PC, as something like Thumb mode
> might.

You can figure out the execution state (AArch64/AArch32) in a different way,
by inspecting PSTATE, so I can see the redundancy. However, inspecting
PSTATE/CPSR from the TC is probably not going to be faster. We need to know
the aarch64 bit in the decoder, so I guess we could cache it there.

In any case, IMO removing it from the PCState is not going to affect
simulation time in any measurable way.
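
A minimal sketch of caching the bit in the decoder, assuming a hypothetical
DecoderSketch class (not gem5's actual ARM decoder) whose setAarch64() is
called only when the execution state changes, i.e. on exception entry or
ERET, rather than being read from every PCState.

```cpp
#include <cstdint>

// Hypothetical: which family of encodings the decoder should use.
enum class DecodeSet { A64, A32T32 };

// Hypothetical decoder that caches the execution state.
class DecoderSketch {
  public:
    // Updated only on an actual AArch64 <-> AArch32 change.
    void setAarch64(bool aarch64) { aarch64_ = aarch64; }
    bool aarch64() const { return aarch64_; }

    // Per-instruction decode consults the cached bit, not the PCState.
    DecodeSet decodeSetFor(uint32_t /*machInst*/) const {
        return aarch64_ ? DecodeSet::A64 : DecodeSet::A32T32;
    }

  private:
    bool aarch64_ = true;
};
```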

> 2. The O3 CPU is checking renaming mode *way* more often than it really
> needs to. Almost every single squash is *not* a switch to/from 64 bit mode,
> but *every* squash involves that check, even in ISAs that don't *have*
> rename modes.
> 3. The rename semantics switch can be handled right in the fault object when
> it implements the faulting context switch. It can detect that a switch is
> necessary and enact it without all the extra checks.

Totally agree on point 2. About point 3: yes, you could handle it in the
fault object and in the ERET instruction. That would mean leaking uarch code
into the arch directory, in other words having some *O3-specific* code in the
arch directory. This is not ideal IMHO, as it binds the arch code to a single
CPU model.
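
To illustrate point 3, a rough sketch in which only mode-changing events
(fault invocation and ERET) trigger the rename-mode work. To keep O3 details
out of the arch code, the sketch routes the change through a CPU-agnostic
listener; ModeChangeListener, O3RenameListener and updateExecState() are all
invented names for illustration, not existing gem5 interfaces.

```cpp
// Hypothetical, simplified types; not the real gem5 API.
enum class RenameMode { Full, Elem };

// Hypothetical interface: ISA code announces an execution-state change
// without knowing which CPU model is listening.
struct ModeChangeListener {
    virtual ~ModeChangeListener() = default;
    virtual void execStateChanged(bool aarch64) = 0;
};

// Hypothetical O3-side listener: rename-mode work happens only when told.
struct O3RenameListener : ModeChangeListener {
    RenameMode mode = RenameMode::Full;
    unsigned switches = 0;
    void execStateChanged(bool aarch64) override {
        RenameMode wanted = aarch64 ? RenameMode::Full : RenameMode::Elem;
        if (wanted != mode) {
            mode = wanted;
            ++switches;
        }
    }
};

// Hypothetical ISA-side helper, called from both fault invocation and ERET:
// notify the CPU only when the execution state really changes.
void updateExecState(bool &curAarch64, bool newAarch64,
                     ModeChangeListener &listener) {
    if (curAarch64 != newAarch64) {
        curAarch64 = newAarch64;
        listener.execStateChanged(newAarch64);
    }
}
```

With this shape, ordinary squashes never look at the mode at all.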

> 4. ARM can implement SVE, etc, using two different register files, one which
> is indexed by element for 32 bit mode, and one which is indexed by vector
> for 64 bit mode. The mode switch can copy values between the register files,
> and we can remove what I suspect is a lot of machinery from O3 by just
> letting it manage two different register files simply, instead of managing one
> with two different personalities. This also makes the register files much more
> homogeneous and easier to generalize. A "real" CPU may not want to waste
> transistors, buses, etc., for two separate register files, but in the end it
> doesn't matter if the behavior is the same. This is all just in how O3 does
> its bookkeeping, and a redundant register file is nearly free for gem5.
>

I would love to see a cleaner implementation! But I am not entirely sure your
solution is much different from what we have now: sure, there is only one
storage [1], but all the remaining data structures are duplicated (see
vecRegIds and vecElemIds, for example, or the vecElem/vecReg free lists [2]).
In fact, we are already copying values from one register file to the other
when switching from Rename::Full to Rename::Elem [3].
I honestly believe having two different register files is the source of all
our problems, as it forces us to switch/copy values whenever the rename mode
changes. What the implementation should have looked like is one single set of
vector data structures with two different views. No synchronization needed:
AArch32 uses the Elem view and AArch64 uses the Full view.
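
To make "one set of data structures, two views" concrete, a minimal sketch of
the storage side only, assuming hypothetical sizes (4 registers of 4 x 32-bit
lanes) and invented names (VecStorage, vec(), elem(), flatElem()). Rename maps
and free lists would need the same single-structure treatment, which this does
not show.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sizes, chosen only for illustration.
constexpr std::size_t NumVecRegs = 4;
constexpr std::size_t ElemsPerReg = 4;  // e.g. 4 x 32-bit lanes

using Elem = uint32_t;
using VecReg = std::array<Elem, ElemsPerReg>;

// One backing store shared by both views: nothing to copy or
// synchronize when the rename mode changes.
class VecStorage {
  public:
    // "Full" view (AArch64): a whole vector register at a time.
    VecReg &vec(std::size_t reg) { return regs_.at(reg); }

    // "Elem" view (AArch32): one element, addressed as (register, lane)...
    Elem &elem(std::size_t reg, std::size_t lane) {
        return regs_.at(reg).at(lane);
    }

    // ...or as a flat element index across the whole file.
    Elem &flatElem(std::size_t idx) {
        return elem(idx / ElemsPerReg, idx % ElemsPerReg);
    }

  private:
    std::array<VecReg, NumVecRegs> regs_{};  // zero-initialized
};
```

A write through either view is immediately visible through the other, e.g.
writing flatElem(5) updates lane 1 of vec(1).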

> Please let me know if this is correct, and I'll start chopping away. Some way
> to test my changes would be very helpful, since otherwise I'll just be hoping
> for the best :-P.

I would recommend cross-compiling an FP&SIMD application for AArch32 and
executing it on an AArch64 Linux kernel (with syscalls, to make sure we change
rename mode without relying on the intervention of the scheduler). You could
even cross-compile the same source for AArch64, execute it as a separate
process, and of course multiplex both on the same CPU.
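
A minimal sketch of the kind of workload I mean, assuming a Linux target: some
FP work that the compiler may vectorize, with an explicit syscall each
iteration so every iteration enters the AArch64 kernel and ERETs back to user
space. fpSyscallStress() is an invented name; whether SIMD instructions are
actually emitted depends on the compiler and flags.

```cpp
#include <cstddef>
#include <sys/syscall.h>  // SYS_getpid
#include <unistd.h>       // syscall()

// Returns the number of FP results that did not survive the round trip
// through the kernel (expected to be 0).
int fpSyscallStress(int iterations) {
    constexpr std::size_t N = 64;
    float a[N], b[N], c[N];
    int mismatches = 0;
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < N; ++i) {
            a[i] = static_cast<float>(i) + static_cast<float>(it);
            b[i] = 2.0f;
        }
        // Simple loop the compiler may turn into SIMD instructions.
        for (std::size_t i = 0; i < N; ++i)
            c[i] = a[i] * b[i] + 1.0f;
        // Explicit syscall: exception entry into the kernel, ERET back.
        syscall(SYS_getpid);
        // Small integers times 2 are exact, so equality is safe here.
        for (std::size_t i = 0; i < N; ++i) {
            float expected =
                (static_cast<float>(i) + static_cast<float>(it)) * 2.0f + 1.0f;
            if (c[i] != expected)
                ++mismatches;
        }
    }
    return mismatches;
}
```

Called from a trivial main() that returns non-zero on mismatches, the same
source could be built twice, for example with a Debian-style
arm-linux-gnueabihf-g++ and aarch64-linux-gnu-g++ (-static is convenient),
and both binaries run concurrently on one CPU.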

>
> Gabe

Kind Regards

Giacomo

[1]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/regfile.hh#L86
[2]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/free_list.hh#L142
[3]: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/rename_map.cc#L211
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org