On Sat, Jul 06, 2019 at 07:56:39PM +1000, Nicholas Piggin wrote:
Santosh Sivaraj's on July 6, 2019 7:26 am:
From: Reza Arbab <ar...@linux.ibm.com>

Testing my memcpy_mcsafe() work in progress with an injected UE, I get
an error like this immediately after the function returns:

BUG: Unable to handle kernel data access at 0x7fff84dec8f8
Faulting instruction address: 0xc0080000009c00b0
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: mce(O+) vmx_crypto crc32c_vpmsum
CPU: 0 PID: 1375 Comm: modprobe Tainted: G           O      5.1.0-rc6 #267
NIP:  c0080000009c00b0 LR: c0080000009c00a8 CTR: c000000000095f90
REGS: c0000000ee197790 TRAP: 0300   Tainted: G           O       (5.1.0-rc6)
MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 88002826  XER: 00040000
CFAR: c000000000095f8c DAR: 00007fff84dec8f8 DSISR: 40000000 IRQMASK: 0
GPR00: 000000006c6c6568 c0000000ee197a20 c0080000009c8400 fffffffffffffff2
GPR04: c0080000009c02e0 0000000000000006 0000000000000000 c000000003c834c8
GPR08: 0080000000000000 776a6681b7fb5100 0000000000000000 c0080000009c01c8
GPR12: c000000000095f90 00007fff84debc00 000000004d071440 0000000000000000
GPR16: 0000000100000601 c0080000009e0000 c000000000c98dd8 c000000000c98d98
GPR20: c000000003bba970 c0080000009c04d0 c0080000009c0618 c0000000001e5820
GPR24: 0000000000000000 0000000000000100 0000000000000001 c000000003bba958
GPR28: c0080000009c02e8 c0080000009c0318 c0080000009c02e0 0000000000000000
NIP [c0080000009c00b0] cause_ue+0xa8/0xe8 [mce]
LR [c0080000009c00a8] cause_ue+0xa0/0xe8 [mce]

Debugging shows that the simulator skips the first instruction at vector
0x200, so r13 is not saved. Adding a nop at 0x200 fixes the issue.

(This commit is needed for testing this series. It should not be taken
into the tree.)

It would be good if this were testable in the upstream simulator. Did you
report it? What does cause_ue do? exc_mce in mambo seems to do the
right thing AFAIKS.

I think I posted this earlier, but cause_ue() is just a test function telling me where to set up the error injection:

static noinline void cause_ue(void)
{
        static const char src[] = "hello";
        char dst[10];
        int rc;

        /* During the pause, break into mambo and run the following */
        pr_info("inject_mce_ue_on_addr 0x%px\n", src);
        pause(10);

        rc = memcpy_mcsafe(dst, src, sizeof(src));
        pr_info("memcpy_mcsafe() returns %d\n", rc);
        if (!rc)
                pr_info("dst=\"%s\"\n", dst);
}

Can't speak for the others, but I haven't chased this upstream. I didn't know it was a simulator issue.

--
Reza Arbab