Thanks Gabe,

Please let me know if you can reproduce the problem with the mentioned patch 
applied. If so, we should open a JIRA ticket so that I can have a look at it.

Giacomo

From: Gabe Black <gabebl...@google.com>
Sent: 25 June 2020 05:28
To: Giacomo Travaglini <giacomo.travagl...@arm.com>
Cc: gem5 Developer List <gem5-dev@gem5.org>; Weiping Liao 
<weipingl...@google.com>; Earl Ou <shunhsin...@google.com>; Yu-hsin Wang 
<yuhsi...@google.com>
Subject: Re: [gem5-dev] bug squashing renamed pinned registers in o3?

Hi Giacomo, thanks for your reply. To answer your questions, it looks like no 
for 1 (unless my grep was bad), and checkpoint save/restore for 2. I think 
we've been able to reproduce this problem much more easily with older versions 
of gem5, likely missing that fix, although I think we may have also seen it 
with newer versions. I've only been looking at it recently and was looking for 
the easiest way to reproduce, so I've only directly tried the older version.

I've cc-ed some Google folks that can hopefully share more details and confirm 
if they've seen this problem on a branch which does have the CL you mentioned 
in 1.

To Google folks, we should cherry-pick that CL into our branch to at least make 
the problem less common. We should have it already in our rebase branch, since 
it looks like it went in upstream in early March.

Gabe

On Wed, Jun 24, 2020 at 2:18 AM Giacomo Travaglini 
<giacomo.travagl...@arm.com> wrote:
Hi Gabe,

We are encountering the same problem on top of develop but it’s still worth 
asking:


  1.  Do you have https://gem5-review.googlesource.com/c/public/gem5/+/25743 ?
  2.  Are you encountering this in a simulation which is using a CPU switch or 
checkpoint save/restore?

Kind Regards

Giacomo

From: Gabe Black via gem5-dev <gem5-dev@gem5.org>
Sent: 23 June 2020 06:24
To: gem5 Developer List <gem5-dev@gem5.org>
Cc: Gabe Black <gabebl...@google.com>
Subject: [gem5-dev] bug squashing renamed pinned registers in o3?

Hi folks, specifically ARM folks. We've been seeing a problem with O3 where, 
when switching vector register renaming modes (full vectors vs. vector 
elements), the CPU checks its bookkeeping and finds that a vector register is 
missing, i.e. with no instructions in flight, the free list has one fewer 
register in it than the difference between the total number of physical vector 
registers and the number that should be taken up with architectural state.
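
To make the bookkeeping check concrete, here is a toy Python sketch of the 
invariant described above. It is not gem5 code: the function name and the 
register counts are made up for illustration.

```python
def missing_vector_regs(num_phys, num_arch, free_list):
    """Return how many physical registers are unaccounted for.

    Only meaningful when no instructions are in flight, so that every
    physical register is either architectural state or on the free list.
    """
    expected_free = num_phys - num_arch
    return expected_free - len(free_list)


# Hypothetical sizes: 8 physical registers, 3 holding architectural state.
healthy = missing_vector_regs(8, 3, free_list=[3, 4, 5, 6, 7])
leaked = missing_vector_regs(8, 3, free_list=[3, 4, 5, 6])
```

A non-zero result corresponds to the "missing register" symptom: here 
`leaked` is one register short.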

This problem has been somewhat difficult to reproduce, although we can get it 
to happen, and it does happen often enough that it's been a real pain for us. 
Since it's not easy to trigger, and therefore hard to observe, I've been 
digging around in the code trying to understand what all the pieces do and why 
the bookkeeping might be wrong.

The most promising thing I've found so far is that when squashing, the rename 
stage looks at its history and rolls back renames for squashed instructions. 
Some registers are fixed and not renamed, so rolling back those would be 
pointless. Also those registers should not go on the free list.

The way O3 detects those special registers is that they have the same index 
before and after renaming. If that is the case, O3 ignores those entries, and 
does not roll them back or mark their target as free.
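
A minimal sketch of that rollback logic, as a toy Python model rather than the 
actual O3 code. The names `HistoryEntry` and `squash` and the register numbers 
are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    arch_reg: int   # architectural register that was renamed
    new_phys: int   # physical register chosen by the rename
    prev_phys: int  # physical register it mapped to before


def squash(history, rename_map, free_list, count):
    """Undo the youngest `count` history entries, newest first."""
    for _ in range(count):
        entry = history.pop()
        if entry.new_phys == entry.prev_phys:
            # Same index before and after: treated as a fixed register,
            # so it is neither rolled back nor put on the free list.
            continue
        rename_map[entry.arch_reg] = entry.prev_phys
        free_list.append(entry.new_phys)


rename_map = {0: 9, 1: 1}
free_list = [10, 11]
history = [
    HistoryEntry(arch_reg=0, new_phys=9, prev_phys=5),  # ordinary rename
    HistoryEntry(arch_reg=1, new_phys=1, prev_phys=1),  # fixed register
]
squash(history, rename_map, free_list, count=2)
```

In this model the ordinary rename is undone and its target freed, while the 
fixed-register entry is skipped entirely.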

This check is slightly out of date though, since with the recently added pinned 
register writes, a register will be renamed to the same thing several times in 
a row. When these entries are checked, they will not be rolled back (I think 
this part is still fine), but they will also not be marked as free.

This isn't exactly a smoking gun though, since the more I think about it, the 
more I think this may actually be ok. If one of the later writes is squashed, 
the register isn't "free" since it still holds the (partially written) 
architectural state. If everything gets squashed all the way back to the first 
entry which did change what register to use, then the slightly outdated check 
won't trigger and things should be freed up correctly (I think).
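
The two cases in this paragraph can be walked through with a toy Python model. 
This is a sketch under assumptions, not gem5 code: `rename_pinned`, `squash`, 
the tuple layout and the register numbers are all hypothetical, and the model 
assumes only the first pinned write takes a register from the free list.

```python
def rename_pinned(arch_reg, writes, rename_map, free_list, history):
    """Rename `writes` pinned writes to one architectural register.

    Only the first write allocates a new physical register; later writes
    reuse it, so their history entries have new == prev.
    """
    for i in range(writes):
        prev = rename_map[arch_reg]
        new = free_list.pop(0) if i == 0 else prev
        rename_map[arch_reg] = new
        history.append((arch_reg, new, prev))


def squash(count, rename_map, free_list, history):
    """Undo the youngest `count` renames using the same-index check."""
    for _ in range(count):
        arch_reg, new, prev = history.pop()
        if new == prev:
            continue  # looks like a fixed register, so it is skipped
        rename_map[arch_reg] = prev
        free_list.append(new)


rename_map, free_list, history = {0: 5}, [9, 10], []
rename_pinned(0, 3, rename_map, free_list, history)

# Case 1: squash only the last write. Register 9 stays mapped, not freed.
squash(1, rename_map, free_list, history)
after_partial = (dict(rename_map), list(free_list))

# Case 2: squash back through the first write. Register 9 is freed.
squash(2, rename_map, free_list, history)
after_full = (dict(rename_map), list(free_list))
```

In this model the partial squash leaves register 9 mapped and off the free 
list, and the full squash restores the original mapping and frees register 9, 
which matches the reasoning above.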

This code is mostly new to me though, so I'm not super confident making any 
grand declarations about what's going on. All the pieces seem to be there 
though, which makes me very suspicious.

Maybe something goes wrong if the right number of writes never happens because 
later writers get squashed?

Gabe
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org