On Fri 21 Sep 2018 at 16:57, Andrew Fish <[email protected]> wrote:

>
>
> > On Sep 21, 2018, at 4:24 PM, Vladimir Olovyannikov <
> [email protected]> wrote:
> >
> > On Thu, Sep 20, 2018 at 2:52 PM Vladimir Olovyannikov
> > <[email protected]> wrote:
> >>
> >> On Wed, Sep 19, 2018 at 5:21 PM Bill Paul <[email protected]> wrote:
> >>>
> >>> Of all the gin joints in all the towns in all the world, Vladimir
> >>> Olovyannikov
> >>> had to walk into mine at 16:58 on Wednesday 19 September 2018 and say:
> >>>
> >>>>> From: Ard Biesheuvel [mailto:[email protected]]
> >>>>> Sent: Wednesday, September 19, 2018 4:38 PM
> >>>>> To: Vladimir Olovyannikov
> >>>>> Cc: [email protected]
> >>>>> Subject: Re: Stack issue after warm UEFI reset and MMU enabling on an
> >>>>> Armv8 platform
> >>>>>
> >>>>>
> >>>>> On 19 September 2018 at 15:55, Vladimir Olovyannikov
> >>>>>
> >>>>> <[email protected]> wrote:
> >>>>>> Hi All,
> >>>>>>
> >>>>>> I need UEFI experts help on the problem with Armv8 board on warm
> UEFI
> >>>>>> reset.
> >>>>>> Cold reset works fine.
> >>>>>>
> >>>>>> Here is how I set up a warm reset:
> >>>>>>
> >>>>>> STATIC
> >>>>>> EFI_STATUS
> >>>>>> ShutdownUefiBootServices (
> >>>>>>
> >>>>>> VOID
> >>>>>> )
> >>>>>>
> >>>>>> {
> >>>>>>
> >>>>>> EFI_STATUS              Status;
> >>>>>> UINTN                   MemoryMapSize;
> >>>>>> EFI_MEMORY_DESCRIPTOR   *MemoryMap;
> >>>>>> UINTN                   MapKey;
> >>>>>> UINTN                   DescriptorSize;
> >>>>>> UINT32                  DescriptorVersion;
> >>>>>> UINTN                   Pages;
> >>>>>>
> >>>>>> MemoryMap = NULL;
> >>>>>> MemoryMapSize = 0;
> >>>>>> Pages = 0;
> >>>>>>
> >>>>>> do {
> >>>>>>
> >>>>>>   Status = gBS->GetMemoryMap (
> >>>>>>
> >>>>>>                   &MemoryMapSize,
> >>>>>>                   MemoryMap,
> >>>>>>                   &MapKey,
> >>>>>>                   &DescriptorSize,
> >>>>>>                   &DescriptorVersion
> >>>>>>                   );
> >>>>>>
> >>>>>>   if (Status == EFI_BUFFER_TOO_SMALL) {
> >>>>>>
> >>>>>>     Pages = EFI_SIZE_TO_PAGES (MemoryMapSize) + 1;
> >>>>>>     MemoryMap = AllocatePages (Pages);
> >>>>>>
> >>>>>>     //
> >>>>>>     // Get System MemoryMap
> >>>>>>     //
> >>>>>>     Status = gBS->GetMemoryMap (
> >>>>>>
> >>>>>>                     &MemoryMapSize,
> >>>>>>                     MemoryMap,
> >>>>>>                     &MapKey,
> >>>>>>                     &DescriptorSize,
> >>>>>>                     &DescriptorVersion
> >>>>>>                     );
> >>>>>>
> >>>>>>   }
> >>>>>>
> >>>>>>   // Don't do anything between the GetMemoryMap() and
> >>>>>>   ExitBootServices() if (!EFI_ERROR(Status)) {
> >>>>>>
> >>>>>>     Status = gBS->ExitBootServices (gImageHandle, MapKey);
> >>>>>>     if (EFI_ERROR(Status)) {
> >>>>>>
> >>>>>>       FreePages (MemoryMap, Pages);
> >>>>>>       MemoryMap = NULL;
> >>>>>>       MemoryMapSize = 0;
> >>>>>>
> >>>>>>     }
> >>>>>>
> >>>>>>   }
> >>>>>>
> >>>>>> } while (EFI_ERROR(Status));
> >>>>>>
> >>>>>> return Status;
> >>>>>>
> >>>>>> }
> >>>>>>
> >>>>>> Then perform
> >>>>>> ArmCleanDataCache ();
> >>>>>> ArmInvalidateDataCache ();
> >>>>>> ArmDisableInstructionCache ();
> >>>>>> ArmInvalidateInstructionCache ();
> >>>>>
> >>>>> These don't do anything useful on ARM. You can only reliably perform
> >>>>> cache
> >>>>> maintenance by virtual address.
> >>>>
> >>>> So, should I just remove them altogether?
> >>>>
> >>>>>> ArmDisableMmu ();
> >>>>>
> >>>>> ... so after this call returns, all bets are off with regards to
> >>>>> whether
> >>>>> what is popped from the stack is actually what we pushed when we
> >>>>> entered
> >>>>> the function.
> >>>>
> >>>> OK, thank you for explanation.
> >>>> But this call returns back into ResetLib implementation as it should,
> >>>> and
> >>>> then there is a direct jump to the start of FV.
> >>>> Am I doing anything wrong here?
> >>>> Then, up to the point of enabling of MMU the stack is OK. But right
> >>>> after
> >>>> enabling MMU it points at _ModuleEntryPoint end of function in
> >>>> DxeCoreEntryPoint.c
> >>>> Am I missing anything? Maybe some stack cleanup before jumping to the
> >>>> start
> >>>> of FV?
> >>>
> >>> When the MMU is enabled, does the mapping for the stack pages change?
> That
> >>> is,
> >>> could the stack now be mapped to different physical page now?
> >> Thanks for ideas Bill,
> >> No, the mapping stays the same.
> >> The issue is only with warm reset, and only on an A72 board.
> >> There is another platform on A53 sharing the same code, which has no
> issues
> >> with warm reset.
> >> I cannot explain why.
> >>>
> >>> Instead of showing a stack trace, can you dump the stack pages and
> compare
> >>> the
> >>> before and after contents?
> >> I can clearly see that before and after contents are different.
> >>>
> >>> Assuming the same physical memory pages are still being used, then
> there
> >>> could
> >>> be a cache flushing problem. What could happen is:
> >>>
> >>> - some stack memory has been touched recently and is now in the data
> cache
> >>> - changes are made, which are written to the cache, but not yet flushed
> >>> out to
> >>> RAM
> >>> - enabling the MMU causes a full invalidate of the cache
> >>>
> >>> Now when you look at the stack, you see the earlier contents that were
> in
> >>> RAM
> >>> -- the changes previously only written to the cache have been lost.
> >>>
> >>> Enabling/disabling caches and MMU is always tricky. I can't say for
> sure,
> >>> but
> >>> I wouldn't be surprised if there's some subtle bug that causes a flush
> >>> operation to be missed and things may just work by coincidence in the
> cold
> >>> start case.
> >> I might be missing something preparing for warm reset.
> >> Disabling interrupts does not help though.
> >> Ard, I switched off all DMA-capable devices, so am just booting into
> UEFI
> >> with no disks or network,
> >> disabled interrupts. The issue is here. Any ideas on how to debug it and
> >> fix?
> > Update: when I add this as an experiment:
> > UINTN StackBottom;
> >
> > __asm__ volatile ("mov %0,SP" : "=r"(StackBottom));
> > WriteBackInvalidateDataCacheRange((VOID *)StackBottom,
> >    PcdGet64(PcdSystemMemorySize) +
> >    PcdGet64(PcdSystemMemoryBase) - StackBottom);
> > then the stack data were not corrupted anymore on the next
> ArmEnableMmu() call,
> > and warm reset worked (though unreliably, can throw exception on
> > memcpy down the road on UEFI boot; probably because I invalidated only
> > the stack area in the experiment).
> > Considering that A53 board does not have issues, does this means that
> > ArmInvalidateDataCache() implementation is useless for A72?
> > Based on this, should the approach be "find all data regions and
> > invalidate them using InvalidateDataCacheRange()"?
>
> Vladimir,
>
> We hit an issue like this a while back on x86 and it turned out our
> sequence was dependent on the C compiler code generation not touching a
> specific register. It might be worth while to disassemble this code and
> take a look at the assembler. I'd also point out the __asm__ volatile is a
> serializing event to the C memory model so it could change how the C code
> was optimized. You could try  __asm__ volatile with a no-op instruction to
> see if it really is the SP read at this point that fixes the issue. But it
> is likely the bug will be more obvious if you look at the assembly.
>

Is your primary FV hosted in DRAM? Are you sure it has not been corrupted
by the time you attempt your warm reboot?

>



>
> >>>
> >>> -Bill
> >>>
> >>>>>> Then jump to start of FV:
> >>>>>>
> >>>>>> typedef
> >>>>>> VOID
> >>>>>>
> >>>>>> (EFIAPI *START_FV)(
> >>>>>>
> >>>>>> VOID
> >>>>>>
> >>>>>> );
> >>>>>> StartOfFv = (START_FV)(UINTN)PcdGet64(PcdFvBaseAddress);
> >>>>>> StartOfFv ();
> >>>>>>
> >>>>>> Now this is what happens on warm reset:
> >>>>>> reset -c warm
> >>>>>> 1. Until ArmEnableMmu() gets called, everything works as expected.
> >>>>>>
> >>>>>>   Here is the stack right before ArmEnableMmu() is called:
> >>>>>>    ArmConfigureMmu+0x4f8
> >>>>>>    InitMmu+0x24
> >>>>>>    MemoryPeim+0x440
> >>>>>>    PrePiMain+0x114
> >>>>>>    PrimaryMain+0x68
> >>>>>>    CEntryPoint+0xC4
> >>>>>>    EL2:0x00000000800008BC
> >>>>>>    -----  End of stack info -----
> >>>>>>
> >>>>>> 2. Here is the stack as soon as Mmu is enabled with ArmEnableMmu() :
> >>>>>>   ArmConfigureMmu+0x4fc <-- This one is correct, at line 745 in
> >>>>>>
> >>>>>> ArmConfigureMmu() in
> ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c
> >>>>>> (return EFI_SUCCESS)
> >>>>>>
> >>>>>>  _ModuleEntryPoint+0x24 <-- Wrong. This points directly to
> >>>>>>
> >>>>>> ASSERT(FALSE); and to CpuDeadLoop() in DxeCoreEntryPoint.c, lines
> >>>>>> 59-60.
> >>>>>>
> >>>>>>  El2:0x000000008E5E8300 <-- Absolutely bogus
> >>>>>>
> >>>>>>   --- End of stack info ---
> >>>>>>
> >>>>>> So, as soon as ArmEnableMmu() exits, execution jumps directly to
> >>>>>> CpuDeadLoop() in DxeCoreEntryPoint of _ModuleEntryPoint().
> >>>>>>
> >>>>>> Would be grateful for any advice.
> >>>>>>
> >>>>>> Thank you,
> >>>>>> Vladimir
> >>>>
> >>>> _______________________________________________
> >>>> edk2-devel mailing list
> >>>> [email protected]
> >>>> https://lists.01.org/mailman/listinfo/edk2-devel
> >>> --
> >>>
> =============================================================================
> >>> -Bill Paul            (510) 749-2329 | Senior Member of Technical
> Staff,
> >>>                 [email protected] | Master of Unix-Fu - Wind River
> >>> Systems
> >>>
> =============================================================================
> >>>   "I put a dollar in a change machine. Nothing changed." - George
> Carlin
> >>>
> =============================================================================
> > _______________________________________________
> > edk2-devel mailing list
> > [email protected]
> > https://lists.01.org/mailman/listinfo/edk2-devel
>
> _______________________________________________
> edk2-devel mailing list
> [email protected]
> https://lists.01.org/mailman/listinfo/edk2-devel
>
_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to