Hi Efraim,

Efraim Flashner <efr...@flashner.co.il> writes:

> On Sat, Aug 05, 2017 at 02:21:55AM -0400, Mark H Weaver wrote:
>> Reviving a very old thread...
>>
>> l...@gnu.org (Ludovic Courtès) writes:
>>
>> > diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
>> > index cebc404d1..9b7bb5391 100644
>> > --- a/nix/libstore/build.cc
>> > +++ b/nix/libstore/build.cc
>> > @@ -26,6 +26,7 @@
>> >  #include <errno.h>
>> >  #include <stdio.h>
>> >  #include <cstring>
>> > +#include <stdint.h>
>> >
>> >  #include <pwd.h>
>> >  #include <grp.h>
>> > @@ -2008,7 +2009,11 @@ void DerivationGoal::startBuilder()
>> >          char stack[32 * 1024];
>> >          int flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWIPC | CLONE_NEWUTS |
>> > SIGCHLD;
>> >          if (!fixedOutput) flags |= CLONE_NEWNET;
>> > -        pid = clone(childEntry, stack + sizeof(stack) - 8, flags, this);
>> > +
>> > +        /* Ensure proper alignment on the stack.  On aarch64, it has to be 16
>> > +           bytes.  */
>> > +        pid = clone(childEntry, (char *)(((uintptr_t)stack + 16) & ~0xf),
>> > +                    flags, this);
>> >          if (pid == -1)
>> >              throw SysError("cloning builder process");
>> >      } else
>>
>> This patch, applied in February, contains a serious error.  The stack
>> address passed to 'clone' is supposed to be near the end of the memory
>> block allocated for the stack, and that's how it was before this patch
>> was applied.  Since this patch was applied, it now passes an address
>> very close to the *start* of the memory block.
>>
>> This broke the daemon on mips64el in a subtle way that was rather
>> difficult to debug.  After about six months of being too busy with other
>> things to investigate properly, I finally tracked it down to this
>> change.
>>
>> I reverted this commit.  Let's try again to find a proper fix for this
>> issue on aarch64.
>>
>> Thanks,
>> Mark
>
> How about doubling the size of the stack to [32 * 1024 * 2] and

Is there a need to double the size of the stack?  If we have no reason
to think so, I'd rather leave it alone.

> changing the clone location to 'stack + sizeof(stack) - 16', does that
> work for mips64el?

The problem with (stack + sizeof(stack) - 16) is that there's no
guarantee that 'stack' will be aligned on a 16-byte boundary.  It might
be that if we add another local variable somewhere else in this
function, or if the compiler changes, we'll need to change the 16 to a
different number to make it work.

Can you try the following patch on aarch64 and report back?

      Thanks,
        Mark

--8<---------------cut here---------------start------------->8---
diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index 693fa70c8..c5cd4bdb2 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -26,6 +26,7 @@
 #include <errno.h>
 #include <stdio.h>
 #include <cstring>
+#include <stdint.h>

 #include <pwd.h>
 #include <grp.h>
@@ -2008,11 +2009,11 @@ void DerivationGoal::startBuilder()
         char stack[32 * 1024];
         int flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWIPC | CLONE_NEWUTS | SIGCHLD;
         if (!fixedOutput) flags |= CLONE_NEWNET;
-#ifdef __aarch64__
-        pid = clone(childEntry, stack + sizeof(stack) - 16, flags, this);
-#else
-        pid = clone(childEntry, stack + sizeof(stack) - 8, flags, this);
-#endif
+        /* Ensure proper alignment on the stack.  On aarch64, it has to be 16
+           bytes.  */
+        pid = clone(childEntry,
+                    (char *)(((uintptr_t)stack + sizeof(stack) - 8) & ~0xf),
+                    flags, this);
         if (pid == -1)
             throw SysError("cloning builder process");
     } else
--8<---------------cut here---------------end--------------->8---
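
For anyone who wants to check the arithmetic without running the daemon, here is a minimal standalone sketch. It is not part of the patch, and the helper names (aligned_stack_top, reverted_stack_top, is_aligned16) are hypothetical, made up for illustration. It compares the address computed by the expression in the patch above with the one computed by the reverted February change, wherever 'stack' happens to be placed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper mirroring the expression in the proposed patch:
// start 8 bytes below the end of the block, then round down to a
// multiple of 16.  The result stays near the *end* of the block.
static char *aligned_stack_top(char *base, std::size_t size)
{
    std::uintptr_t top = reinterpret_cast<std::uintptr_t>(base) + size - 8;
    return reinterpret_cast<char *>(top & ~static_cast<std::uintptr_t>(0xf));
}

// The expression from the reverted change, kept for comparison: it
// rounds (base + 16) down, so the result is within 16 bytes of the
// *start* of the block.
static char *reverted_stack_top(char *base)
{
    std::uintptr_t p = reinterpret_cast<std::uintptr_t>(base) + 16;
    return reinterpret_cast<char *>(p & ~static_cast<std::uintptr_t>(0xf));
}

// True if p is a multiple of 16.
static bool is_aligned16(const char *p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 0xf) == 0;
}

// Checks the properties argued for in the message above, for any block.
static void check_stack_top(char *base, std::size_t size)
{
    char *top = aligned_stack_top(base, size);
    assert(is_aligned16(top));           // 16-byte aligned, as aarch64 needs
    assert(top <= base + size - 8);      // never past the old "- 8" address
    assert(top > base + size - 8 - 16);  // at most 15 bytes below it
}
```

Calling check_stack_top(stack, sizeof(stack)) on a local 'char stack[32 * 1024]' should pass whatever the alignment of 'stack' itself is, which a fixed offset such as '- 16' cannot promise. reverted_stack_top(stack), by contrast, returns an address at most 16 bytes past 'stack', so a stack growing down from there has almost no room.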