On 5/17/23 7:19 AM, Michael Ellerman wrote:
> Gaurav Batra <gba...@linux.vnet.ibm.com> writes:
>> Hello Michael,
>>
>> System test hit the crash. I believe it was PHYP that caused it,
>> because the number of TCEs passed in was greater than 512.
> OK. It's always good to spell out in the change log whether it's a
> theoretical/unlikely bug, or one that's actually been hit in testing or
> the field.
I will submit another version of the patch with an updated change log once I figure out how to tag it for stable (as mentioned below).
>> I was wondering about the Fixes tag as well. But this interface, in
>> its current form, has been there since the file was created. So, in
>> this case, should I mention the first commit that created this source file?
> If it really goes back to the original commit, then it's probably better
> to just say so and tag it for stable, rather than pointing to 1da177e4.
How do I tag it for stable? Will it be part of the "Fixes:" tag or some other tag?

> I wonder, though: is there something else that changed that means this bug
> is now being hit but wasn't before? Or maybe it's just that we are
> testing on systems with enough memory to hit this but
> which aren't using a direct mapping?

From the details in Bugzilla, it does seem that the HCALL was already taking a long time before, but PHYP was more relaxed about it. Now, PHYP limits how long an HCALL can take.

Below are some excerpts from bug 202349:

Linux is passing too many counts in H_STUFF_TCE. The higher the counts, the longer the HCALL takes. From a Hypervisor perspective, we cannot stop Linux from doing this or it will violate the rules in the PAPR (which then would cause Linux to crash). The dispatcher team has "tightened the screws" on long running HCALLs by causing this trap to fire. From our discussions, they will not put the limits back where they were before.


Thanks

Gaurav


> cheers
