Hi, Paweł,

Thanks for your help. I was just starting to think the same thing, the
code seemed to be trying to put all cpus into WFS. I tested your patch
on the 5.12 commit that broke and also the latest 5.18 git kernel and
indeed it works fine. On a quad core I can see 3 cores go into WFS and
then tboot continues take bfs to s3 state. Very good!

Derek

On 5/27/22 9:50 AM, Randzio, Pawel wrote:
> Hi Derek,
> 
> I wish to gladly inform you that I've fixed the bug preventing suspend with 
> Tboot.
> 
> It was in the piece of code that you've highlighted - the atomic increment on 
> ap_wfs_count variable need have been moved into the else clause of the 
> following if statement, something like this:
> 
> static int tboot_dying_cpu(unsigned int cpu) {
>       if (num_online_cpus() == 1) {
>               if (tboot_wait_for_aps(atomic_read(&ap_wfs_count)))
>                       return -EBUSY;
>       } else
>               atomic_inc(&ap_wfs_count);
>       return 0;
> }
> 
> Otherwise, when the procedure shutdown all APs, BSP was also counted in as 
> one of them, thus creating a mismatch between what kernel had counted and 
> what TBOOT reported back to it.
> This caused a kernel panic, which froze the machine while putting it into 
> suspend.
> 
> I've sent a version of TBOOT Live Image equipped with this patch of the 
> kernel to our validation teams and I'm awaiting their response after the 
> weekend.
> If I get positive results from their tests I will try and have this patch 
> attached to the official kernel repository and featured in versions 5.19 and 
> up.
> 
> Kind regards,
> Paweł
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Randzio, Pawel 
> Sent: Friday, May 13, 2022 12:12 PM
> To: 'Derek Dolney' <z...@posteo.net>; Łukasz Hawryłko <luk...@hawrylko.pl>
> Cc: tboot-devel@lists.sourceforge.net
> Subject: RE: [tboot-devel] suspend problem since kernel 5.15
> 
> Hi Derek,
> 
> I've also been picking at that issue for the past few months and reached the 
> same wall as you have with the -EBUSY callback return, although your message 
> kind of gives me a new idea for where to look for the root cause as it seems 
> I have not tracked all possible callbacks.
> I'm not a kernel developer either and honestly debugging that S3 issue is 
> troublesome to me too, to say the least. If anyone on this mailing list has 
> any idea how to further the debugging or even better - solve this issue, 
> please feel free to share ideas.
> 
> On a side note, please add Vincent into the communication, that might speed 
> up the process.
> Vincent may add others that could know what might be going on with that issue.
> 
> Thanks,
> Paweł
> 
> -----Original Message-----
> From: Derek Dolney <z...@posteo.net> 
> Sent: Thursday, May 12, 2022 6:20 PM
> To: Łukasz Hawryłko <luk...@hawrylko.pl>; tboot-devel@lists.sourceforge.net
> Subject: Re: [tboot-devel] suspend problem since kernel 5.15
> 
> I have been working on this as best I can. However, I confess that I am not a 
> kernel developer and have really no understanding of these tboot internals. 
> Nevertheless here is a brief update. Please anyone feel free to share any 
> ideas how to move forward to some resolution.
> 
> I got a desktop machine with rs232 serial output running tboot and reproduced 
> the suspend problem that way and with this setup I can collect kernel printk 
> and also cpu hotplug (cpuhp) tracing output. I have also thankfully got quite 
> a bit of help from Vincent Donnefort who wrote the cpuhp changes (the commit 
> I posted) that have exposed the issue. He has been very helpful, let me try 
> to tell you what we have figured out.
> 
> On suspend, I get into the tboot callback:
> 
> static int tboot_dying_cpu(unsigned int cpu) {
>       atomic_inc(&ap_wfs_count);
>       if (num_online_cpus() == 1) {
>               if (tboot_wait_for_aps(atomic_read(&ap_wfs_count)))
>                       return -EBUSY;
>       }
>       return 0;
> }
> 
> but the tboot_wait_for_aps times out for me so the callback returns EBUSY. 
> The problem with that happening is that there is not a rollback mechanism in 
> place at this point in the cpuhp sequence. So I mean from cpuhp point of 
> view, there is not even a mechanism to handle the tboot callback failure. 
> Besides that, we don't know what could be a sensible thing to do in the case 
> of EBUSY. What does it mean tboot is busy and what should be done about it? 
> Please help us to understand.
> 
> 
> _______________________________________________
> tboot-devel mailing list
> tboot-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/tboot-devel
> ---------------------------------------------------------------------
> Intel Technology Poland sp. z o.o.
> ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII 
> Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 
> 957-07-52-316 | Kapital zakladowy 200.000 PLN.
> Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i 
> moze zawierac informacje poufne. W razie przypadkowego otrzymania tej 
> wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; 
> jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
> This e-mail and any attachments may contain confidential material for the 
> sole use of the intended recipient(s). If you are not the intended recipient, 
> please contact the sender and delete all copies; any review or distribution 
> by others is strictly prohibited.
> 


_______________________________________________
tboot-devel mailing list
tboot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tboot-devel

Reply via email to