Hi Laszlo,

I have root-caused this issue. It is triggered when the PiSmmCpuDxeSmm driver 
starts up and an AP hangs in the procedure.

When the PiSmmCpuDxeSmm driver starts up, it calls StartAllAps to set memory 
attributes. In StartAllAps, after calling WakeUpAp to start the APs, it calls 
CheckAllAps to wait until all APs have finished the task. CheckAllAps looks at 
the AP state to decide whether an AP has finished its task. The old code 
checked for CpuStateFinished, a state that is only set by the AP once it has 
truly finished the task. In the new logic, CpuStateFinished has been replaced 
with CpuStateIdle, but CpuStateIdle is also the initial state of the AP. The 
AP changes its state from CpuStateIdle to CpuStateBusy when it starts 
executing the procedure, and changes it back to CpuStateIdle after the 
procedure has finished.

So when the hang occurs, the AP has not yet changed its state to CpuStateBusy 
at the time the BSP calls CheckAllAps. The AP is still in CpuStateIdle, so the 
BSP wrongly concludes that the AP has finished its task. The BSP then thinks 
all APs have finished and continues booting. But some AP may wake up later and 
fail to return from the procedure, in which case its state stays at 
CpuStateBusy. Later, in ChangeApLoopCallback, this AP does not trigger the 
procedure because its state is still CpuStateBusy. The BSP, however, waits for 
every AP to run the procedure (each AP decrements mNumberToFinish in the 
procedure) before it continues booting, so the hang occurs.

I think we should keep an intermediate state that tells us whether the AP has 
truly finished its task. I will send another patch series for this issue. 
Please help to review the new patches.
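
As a rough illustration of the idea (this is not the patch itself), the 
single-threaded sketch below keeps a separate CpuStateFinished state. The 
helper names ApRunProcedure, CheckAllApsFinished and DemoMiddleState are 
hypothetical, and having the BSP reset the state to CpuStateIdle after it has 
seen CpuStateFinished is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Three states: Finished is set only by the AP after the procedure returns. */
typedef enum { CpuStateIdle, CpuStateBusy, CpuStateFinished } AP_STATE;

/* AP side (hypothetical helper): Idle -> Busy -> Finished. */
static void ApRunProcedure(AP_STATE *State)
{
  *State = CpuStateBusy;
  /* ... the procedure body would run here ... */
  *State = CpuStateFinished;
}

/* BSP side (hypothetical helper): only Finished counts as "done". */
static bool CheckAllApsFinished(AP_STATE *States, size_t Count)
{
  size_t Index;

  for (Index = 0; Index < Count; Index++) {
    if (States[Index] != CpuStateFinished) {
      return false;
    }
  }

  /* Assumption of this sketch: the BSP puts the APs back to Idle afterwards. */
  for (Index = 0; Index < Count; Index++) {
    States[Index] = CpuStateIdle;
  }
  return true;
}

/* Returns true if the early check fails and the check after the run passes. */
bool DemoMiddleState(void)
{
  AP_STATE Ap[1] = { CpuStateIdle };
  bool     DoneBeforeStart;
  bool     DoneAfterRun;

  /* AP has not started yet: Idle is no longer mistaken for "finished". */
  DoneBeforeStart = CheckAllApsFinished(Ap, 1);

  ApRunProcedure(&Ap[0]);
  DoneAfterRun = CheckAllApsFinished(Ap, 1);

  return !DoneBeforeStart && DoneAfterRun && (Ap[0] == CpuStateIdle);
}
```

With the extra state, an AP that has not started yet can no longer be 
mistaken for one that has finished.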

Thanks,
Eric

> -----Original Message-----
> From: edk2-devel [mailto:[email protected]] On Behalf Of
> Laszlo Ersek
> Sent: Saturday, July 21, 2018 12:30 AM
> To: Dong, Eric <[email protected]>; [email protected]
> Cc: Ni, Ruiyu <[email protected]>
> Subject: Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant
> parameter.
> 
> On 07/20/18 08:53, Dong, Eric wrote:
> >> -----Original Message----- From: Laszlo Ersek
> >> [mailto:[email protected]]
> 
> >> Therefore, please upgrade the host to Fedora 26. In Fedora 26, QEMU
> >> 2.9 is shipped:
> >>
> >> https://koji.fedoraproject.org/koji/buildinfo?buildID=986762
> >>
> >> ... It's even better if you can upgrade to Fedora 27, as Fedora 27 is
> >> the oldest Fedora release still supported at this point. The
> >> following article describes the recommended upgrade method:
> >>
> >> https://fedoraproject.org/wiki/DNF_system_upgrade
> >>
> >
> > I updated the system to fedora 28, but it failed to boot. :(  so I
> > borrowed an exited fedora 27 DVD and installed it. With this OS, I can
> > reproduce this issue now. I found this issue is an random issue, I
> > booted 5 times and met the issue.  I'm checking the issue.
> 
> Awesome!
> 
> (I'm not happy about the problem itself, of course, but I'm *very* thankful
> that you took the time to install a Linux box, for testing with
> KVM!!!)
> 
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> [email protected]
> https://lists.01.org/mailman/listinfo/edk2-devel