On 1 March 2016 at 08:50, Ryan Harkin <[email protected]> wrote:
> Thanks for jumping in!
>
> On 29 February 2016 at 20:59, Ard Biesheuvel <[email protected]> 
> wrote:
>> On 29 February 2016 at 21:05, Laszlo Ersek <[email protected]> wrote:
>>> On 02/29/16 20:39, Ryan Harkin wrote:
>>>> Hi Ard/Leif/anyone who cares,
>>>>
>>>> So I was trying to work out who broke MMC support in TC2 in the
>>>> upstream EDK2 tree.  It was difficult because the tree is borked on
>>>> TC2 in so many interesting ways throughout history, but I eventually
>>>> bisected down to this patch:
>>>>
>>>> 300fc77  2015-08-25  ArmPlatformPkg/PL180MciDxe: check PrimeCell ID
>>>> before initializing [Ard Biesheuvel]
>>>>
>>>> Basically, TC2 reads 0x02 for MCI_PERIPH_ID_REG3, when, according to
>>>> the spec, the register is supposed to read 0x00 in all cases.  So the
>>>> driver doesn't probe and is never initialised.  I guess this is an
>>>> FPGA bug in TC2?  It's probably known about, but not to me ;-)
>>>>
>>>> Anyway, how to fix it??
>>>>
>>>> We could mask off the "stuck" bit, we could not check ID_REG3, there
>>>> are other things we could do.
>>>>
>>>> I decided to mask off the bit rather than discard the register check
>>>> in my patch below, just to get things working
>>>>
>>>> But would you like to do?
>>>>
>>>> For extra point.... this was extra fun to track down due to other
>>>> problems.  TC2 stopped booting since this patch was submitted
>>>>
>>>> d340ef7  2014-08-26  ArmPkg/ArmArchTimerLib: Remove non required
>>>> [depex] and IoLib      [Olivier Martin]
>>>>
>>>> I've always carried a revert patch in my tree because I was previously
>>>> told I was wrong and that it wasn't a problem, even though it clearly
>>>> is.  TC2 is spewing out a constant stream of this message:
>>>>
>>>>     IRQ Exception PC at 0xBFB74C20  CPSR 0x60000133
>>>>
>
> ^^ That only happens on a release build.  I suspect there's a there
> bug too.  The IRQ exception shouldn't happen infinitely.  I think it
> should appear once and never again.
>

My suspicion is that this driver depends on the CPU arch protocol so
that interrupts are disabled at the GIC for everything except the
timer before it starts poking the MMC hardware.

> But anyway, on a debug build, I see this once I've fixed the probe function:
>
> Probing ID registers at 0x11C050FE0 for a PL180
>
> IRQ Exception PC at 0xBF71FC58  CPSR 0x60000133 nZCveAifT_svc
> /working/platforms/uefi/edk2/Build/ArmVExpress-CTA15-A7/DEBUG_GCC49/ARM/ArmPlatformPkg/Drivers/PL180MciDxe/PL180MciDxe/DEBUG/PL180MciDxe.dll
> loaded at 0xBF71E000 (PE/COFF offset) 0x1C58 (ELF or Mach-O offset) 0x1A38
> 0x9303       STR    r3, [sp, #0xC]
>   R0 0x00000001   R1 0x00000400   R2 0x00000800   R3 0x00000080
>   R4 0xBF810391   R5 0xBF831000   R6 0x00000000   R7 0xB000021C
>   R8 0x80000100   R9 0xB8000000  R10 0xBFFEC000  R11 0x00000000
>  R12 0x00000000   SP 0xBFFFFB40   LR 0xBF71FC39   PC 0xBF71FC58
> DFSR 0x00001E17 DFAR 0xE1D3C103 IFSR 0x00001236 IFAR 0xE9404847
>  No function: write to 0xE1D3C103
>  Instruction Access Flag fault on Page at 0xE9404847
>
> ASSERT [ArmCpuDxe]
> /working/platforms/uefi/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/Arm/DefaultExceptionHandler.c(258):
> ((BOOLEAN)(0==1))
>

Here, it hits the default exception handler because no handler has
been registered for whichever interrupt the MMC hardware raises. The
same happens in a RELEASE build, but there the output is less verbose
and the ASSERT() is compiled out, so execution proceeds.

>
>
>>>> It wasn't fixed until Ard's patch that broke MMC support.  Ugh!
>>>>
>>>> I'm suspecting that the MMC support has a dependency on IoLib - for
>>>> that is the part of the patch that broke TC2 in the first place.  But
>>>> I have yet to investigate that problem; I don't even know what IoLib
>>>> is.
>>>
>>> IoLib is a library class that lets you massage IO ports and MMIO registers.
>>>
>>> MdePkg/Include/Library/IoLib.h
>>>
>>> The patch you quoted does two things: it removes ArmArchTimerLib's
>>> build-time dependency on the IoLib class, and it removes the runtime
>>> (dispatch) dependency on EFI_CPU_ARCH_PROTOCOL of any module that is
>>> linked against ArmArchTimerLib (unless that module has the same
>>> dependency due to another library instance it links against, or due to
>>> its own explicit [depex] section).
>>>
>>> Removing the library class dependency could introduce such a problem
>>> only if the actual library instance used for that dependency had a
>>> constructor function that is henceforth no longer called, and this
>>> function changed something related to interrupts. Very unlikely.
>>>
>>> Removing the DXE dispatch dependency on EFI_CPU_ARCH_PROTOCOL is the
>>> likely culprit, in my opinion. The driver that provides said
>>> architectural protocol probably massages interrupt configuration on the
>>> CPU or the GIC in its entry point function in such a way that
>>> ArmArchTimerLib actually silently depends on, without explicitly calling
>>> EFI_CPU_ARCH_PROTOCOL member functions. By removing the depex, the DXE
>>> core may have reordered another driver (that links against
>>> ArmArchTimerLib) versus the driver providing EFI_CPU_ARCH_PROTOCOL --
>>> for which reason the timerlib functions may now run without the
>>> necessary interrupt setup.
>>>
>>> Instances of the TimerLib class have always been finicky. For example,
>>> in OvmfPkg we have three instances (for various module types & firmware
>>> phases). The two instances that get linked into early module types (SEC,
>>> and PEI_CORE, PEIM, DXE_CORE) massage chipset registers directly,
>>> because that was the only robust way to make sure that whichever of
>>> these module (types) needed the ACPI timer could actually utilize it.
>>> Through these library instances, every such "early" module (that needs
>>> TimerLib) looks at the chipset registers, and sets the needed bits if
>>> they are not in place yet.
>>>
>>
>> Thanks for the analysis
>>
>> It appears that PL180MciDxe's dependency on gEfiCpuArchProtocolGuid
>> was transitively fulfilled by its dependency on TimerLib, which is
>> implemented by ArmArchTimerLib on TC2, and the patch removes it from
>> the depex
>>
>
> It looks like both of you are right:  If I only revert the Depex part
> of the commit, all works.
>
> +[Depex]
> +  gEfiCpuArchProtocolGuid
>
> If I only revert the IoLib part, then the board still dies with the same 
> ASSERT.
>
>
>> Simply replacing TRUE with gEfiCpuArchProtocolGuid in the Depex
>> section of PL180MciDxe.inf should do the trick.
>>
>
> Yes, that works, thanks.  It looks like a better fix than adding it back into
>

ArmArchTimerLib.inf? I agree

>
>> As far as the Primecell ID is concerned, let's just whitelist whatever
>> TC2 exposes, even if in error.
>
> You mean rather than mask the "stuck bit", specifically check if the
> register is 0 or 2?  (Where 2 is added as a #define to the header).
>

I don't care deeply either way, but supporting the documented and the
actual ID explicitly seems more correct, unless the difference is in
the revision field?

> I intend to also add a patch that prints debug if the probed registers
> don't match, sound reasonable?

Sure, that makes sense. On Foundation model, I already see a
notification related to the probing of the (non-existent, in that
case) Primecell ID registers, but it makes sense to have the same in
UEFI's own DEBUG output.

-- 
Ard.
_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to