Re: Parsing a bus fault message?

tiejun.chen Wed, 29 Sep 2010 01:48:08 -0700

Scott Wood wrote:
> On Tue, 28 Sep 2010 08:31:54 -0700
> "Ira W. Snyder" <[email protected]> wrote:
> 
>> On Tue, Sep 28, 2010 at 09:26:51AM -0500, [email protected] wrote:
>>> Alternatively, can somebody see a hint in the message that I don't know
>>> enough to pick out? At this point, my code is trying to memcpy() from the
>>> PCIe bus (mapped via the outbound ATMU) to local memory, so the fault is
>>> either a) the ATMU is not accessible b) the ATMU is accessible but not
>>> mapped (which I would have thought the ioremap call I made would have
>>> handled) or c) the chip is not able to bus master on the PCI bus.
> 
> Check the LAWs, the outbound ATMU, and the PCI device's BAR.  Make sure


I have also hit a machine check exception when a LAW was configured improperly
for PCI (i.e. a target ID that did not match the PCIe controller).

From your log, it looks like 0xexxxxxxx should be your PCI space, so you can
check whether that falls within the appropriate LAW configuration. Maybe you can
post your boot log and error log here.
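
To make the "does the address fall into a window" check concrete, here is a
minimal sketch. The struct, the function names and the example bases/sizes are
all hypothetical; this is not the real LAWBAR/LAWAR register layout, only the
range test you would apply once you have read back each window's base and size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one window (NOT the LAWBAR/LAWAR encoding). */
struct window {
    uint64_t base;  /* first physical address covered by the window */
    uint64_t size;  /* length in bytes, assumed non-zero */
};

/* True if addr lies inside [base, base + size). */
static bool window_contains(const struct window *w, uint64_t addr)
{
    return addr >= w->base && (addr - w->base) < w->size;
}

/* Index of the first window that covers addr, or -1 if none does. */
static int find_window(const struct window *tbl, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (window_contains(&tbl[i], addr))
            return (int)i;
    return -1;
}
```

The same test applies at each level (LAW, outbound ATMU, device BAR): a
result of -1 for the faulting address at any level means nothing claims it
there.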

> the address goes where you're expecting at each level.
> 
>>> Machine check in kernel mode.
>>> Caused by (from SRR1=149030): Transfer error ack signal
>> ^^^ this is the line that contains some critical info
>>
>> In the 86xx CPU manual, you should be able to find information about the
>> SRR1 register. Decoding the hex SRR1=0x149030 may help.

Actually, 'Transfer error ack signal' is already the result of the kernel
decoding SRR1/MSSSR0.
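
For illustration, a rough user-space sketch of that decode step. The mask and
case values are written from my recollection of the generic machine check
handler in arch/powerpc/kernel/traps.c and should be treated as assumptions to
verify against your kernel source; the enum and function names are
hypothetical, and only a few of the causes are covered.

```c
/* Hypothetical, partial classification of a machine check cause. */
enum mc_cause { MC_SIGNAL, MC_TEA, MC_OTHER };

/* Assumption: SRR1 bits the generic handler inspects. */
#define SRR1_MC_MASK 0x601F0000UL

static enum mc_cause classify_srr1(unsigned long srr1)
{
    switch (srr1 & SRR1_MC_MASK) {
    case 0x00080000UL:
        return MC_SIGNAL;   /* "Machine check signal" */
    case 0x00040000UL:
    case 0x00140000UL:
        return MC_TEA;      /* "Transfer error ack signal" */
    default:
        return MC_OTHER;    /* parity/cache causes are not decoded here */
    }
}
```

With the value from the log, 0x149030 masks down to 0x140000, which lands in
the TEA case and matches the message the kernel printed.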

>>
>> The kernel is telling you this is a TEA (transfer error acknowledge)
>> error. I've only seen this when I get an unhandled timeout on the local
>> bus. For example, a FPGA that has died in the middle of a request.
> 

I have hit this only once, when the kernel accessed the USB host controller
registers on an mpc837x. The same kernel was fine on another target of the same
version, so I think sometimes you have to check the hardware.

> I've seen it when you access a physical address that has no device
> backing it up.
>

Yes. This should be the most common reason for a machine check exception when we
access an address with caching inhibited.

>> On the PCI bus, I haven't seen this error. The 83xx PCI controller is
>> smart enough to return 0xffffffff when reading a non-existent device.
> 
> I believe that behavior is configurable.

I know that some PCI controllers return 0xffffffff when they access a
non-existent device. The controller gets no response from that device, so it
aborts the read and drives the bus to a known state, 0xffffffff. But I have to
admit I am really not sure whether this is configurable. I would expect this
behavior to be a fixed feature of the given PCI controller.
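
On controllers that do return all ones, the usual way to notice a missing
device is to look at the vendor ID in the first config dword, since 0xffff is
not a valid vendor ID. A minimal sketch, with a hypothetical function name and
assuming the dword was already read from config offset 0 by whatever accessor
you use:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * dword0 is the 32-bit value read from config space offset 0:
 * device ID in the upper 16 bits, vendor ID in the lower 16 bits.
 */
static bool pci_function_present(uint32_t dword0)
{
    /* A vendor ID of 0xffff means nothing answered the read. */
    return (dword0 & 0xffffu) != 0xffffu;
}
```

This only helps when the controller completes the aborted read with all ones;
if it reports the abort as a bus error instead, you get the machine check
before you ever see a value.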

Tiejun

> 
> -Scott
> 
> _______________________________________________
> Linuxppc-dev mailing list
> [email protected]
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

