Hello,

Please accept my apologies. There was a misunderstanding on our side regarding the earlier patch: we observed Cygwin tests failing in our arm64 environment, specifically the pthread-related tests. Please drop this patch. I will share a new patch that addresses the pthread test failures on AArch64.

Thanks,
Thirumalai Nagalingam

-----Original Message-----
From: Jeremy Drake <cyg...@jdrake.com>
Sent: 26 June 2025 00:08
To: Thirumalai Nagalingam <thirumalai.nagalin...@multicorewareinc.com>
Cc: cygwin-patches@cygwin.com
Subject: Re: [PATCH] Aarch64: Fix register load order in `ldp` in commit f4ba145

On Wed, 25 Jun 2025, Thirumalai Nagalingam wrote:

> -    ldp x0, x10, [x19, #16] // x0 = stackaddr, x10 = stackbase \n\
> +    ldp x10, x0, [x19, #24] // x0 = stackaddr, x10 = stackbase \n\

I am very confused about this. The struct layout:

struct pthread_wrapper_arg
{
  LPTHREAD_START_ROUTINE func;  // +0
  PVOID arg;                    // +8
  PBYTE stackaddr;              // +16
  PBYTE stackbase;              // +24
  PBYTE stacklimit;             // +32
  ULONG guardsize;              // +40
};

below, you have

    ldp x19, x0, [x19] // x19 = func, x0 = arg \n\
    blr x19 // call thread function \n"

If this works (and it'd be really very obvious if it didn't), ldp loads 64-bits at the address given and puts it in the first register, and loads 64-bits at address+8 and puts it in the second register. So wouldn't this really be

+    ldp x10, x0, [x19, #24] // x10 = stackbase, x0 = stacklimit \n\

? so now you're freeing stacklimit instead of stackbase? I don't think that's right.
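
For anyone who wants to check the offset argument above without an AArch64 machine, here is a rough sketch in plain C. It is not Cygwin code: `wrapper_arg_model` and `ldp_model` are hypothetical names, and the pointer fields are modelled as 64-bit integers on the assumption that pointers are 8 bytes on the target, so the offsets match the comments in the quoted struct on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct pthread_wrapper_arg. Pointer fields are
   modelled as uint64_t (assumption: 8-byte pointers on the target). */
struct wrapper_arg_model {
    uint64_t func;        /* +0  */
    uint64_t arg;         /* +8  */
    uint64_t stackaddr;   /* +16 */
    uint64_t stackbase;   /* +24 */
    uint64_t stacklimit;  /* +32 */
    uint32_t guardsize;   /* +40 */
};

/* Hypothetical model of "ldp first, second, [base, #offset]":
   first  <- 64 bits at base + offset
   second <- 64 bits at base + offset + 8 */
static void ldp_model(const void *base, size_t offset,
                      uint64_t *first, uint64_t *second)
{
    const unsigned char *p = (const unsigned char *)base + offset;
    memcpy(first, p, sizeof *first);
    memcpy(second, p + 8, sizeof *second);
}

/* Checks that the modelled layout matches the offsets quoted in the thread. */
static void check_layout(void)
{
    assert(offsetof(struct wrapper_arg_model, stackaddr) == 16);
    assert(offsetof(struct wrapper_arg_model, stackbase) == 24);
    assert(offsetof(struct wrapper_arg_model, stacklimit) == 32);
}
```

With this model, a load at offset 16 yields (stackaddr, stackbase), while a load at offset 24 yields (stackbase, stacklimit), whichever register names are written in the instruction.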