I think I've run into a rather odd bug on a big-endian MIPS platform while
trying to hand-assemble a MIPS-II ISA netboot image built from a uclibc-ng
chroot.  In my netboot, I need to include xfsprogs, but this has a dependency
on the 'valloc' function call.  So in uclibc-ng, I enabled
CONFIG_UCLIBC_SUSV2_LEGACY to enable that function, and rebuilt uclibc-ng.
This fixes the xfsprogs build, but it very subtly breaks busybox's ash shell.

After rebuilding uclibc-ng, then rebuilding busybox statically/multicall, if
you run /bin/ash with a malformed argument or give it a script to execute that
doesn't have the execute bit set, you get a SIGSEGV:

Fudging up the argument syntax to /bin/ash:
octane / # /bin/ash "-c"
/bin/ash: -c requires an argument
Segmentation fault

Via a non-executable script "x.sh", we start with this sample:
octane / # cat ./x.sh
echo "foo!"

If "x.sh" has the executable bit set, we're all good:
octane / # ls -l ./x.sh
-rwxr-xr-x 1 root root 24 Oct 12 01:57 ./x.sh
octane / # /bin/ash -c ./x.sh

But if we turn off the executable bit...
octane / # chmod -x ./x.sh
octane / # ls -l ./x.sh
-rw-r--r-- 1 root root 24 Oct 12 01:57 ./x.sh
octane / # /bin/ash -c ./x.sh
/bin/ash: ./x.sh: Permission denied
Segmentation fault

The only backtrace I can get out of it after rebuilding uclibc-ng and busybox
with debugging is this (generated via the fudged argument example):

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0x00452278 in __GI__longjmp_unwind (env=0x7ffeed58, val=1) at
#2  0x004061e4 in __libc_longjmp (env=0x7ffeed58, val=1) at
#3  0x0050185c in raise_exception (e=1) at shell/ash.c:448
#4  0x00501f00 in ash_vmsg_and_raise (cond=1, msg=0x60294d
<bb_msg_requires_arg> "%s requires an argument", ap=0x7ffeece4) at 
#5  0x00501f4c in ash_msg_and_raise_error (msg=0x60294d <bb_msg_requires_arg>
"%s requires an argument") at shell/ash.c:1243
#6  0x0051a918 in procargs (argv=0x7ffeeff4) at shell/ash.c:13009
#7  0x0051afe4 in ash_main (argc=2, argv=0x7ffeeff4) at shell/ash.c:13158
#8  0x0047e320 in run_applet_no_and_exit (applet_no=9, argv=0x7ffeeff4) at
#9  0x0047e370 in run_applet_and_exit (name=0x7ffef130 "ash", argv=0x7ffeeff4)
at libbb/appletlib.c:781
#10 0x0047e484 in main (argc=2, argv=0x7ffeeff4) at libbb/appletlib.c:838

Line #30 in jmp-unwind.c leads me to a really old uclibc bug, #3919:

But further investigation reveals that the null_not_ptr() check introduced by
the patch in that bug is already present in uclibc-ng in the patched spots,
plus a few new locations.  So either I've run into a new area of the code that
needs a similar change, or I'm chasing the wrong rabbit down the wrong hole and
the bug lies elsewhere, e.g., in busybox (hinted at by the script case only
crashing after chmod -x on the script).
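
One way to narrow down whether the fault is in uclibc-ng or in busybox would be
a standalone setjmp/longjmp round trip that mirrors what raise_exception() does
in the backtrace (a longjmp with val=1), linked statically against the rebuilt
uclibc-ng. The following is only a minimal sketch; the names handler,
raise_like_ash and longjmp_roundtrip are hypothetical and do not come from
busybox or uclibc-ng.

```c
#include <setjmp.h>

/* Hypothetical stand-in for ash's exception handler jump buffer. */
static jmp_buf handler;

/* Mimics the longjmp(..., 1) seen in frames #2/#3 of the backtrace. */
static void raise_like_ash(int e)
{
    longjmp(handler, e);
}

/* Returns 1 if the longjmp landed back at the setjmp point,
 * -1 if control somehow fell through (should never happen). */
int longjmp_roundtrip(void)
{
    if (setjmp(handler) == 0) {
        raise_like_ash(1);   /* does not return */
        return -1;
    }
    return 1;                /* reached via longjmp */
}
```

If calling longjmp_roundtrip() from main() in a static binary also jumps to
address 0, that would point at the libc side rather than at ash.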

I am aware that CONFIG_UCLIBC_SUSV2_LEGACY affects ABI compatibility, but
it appears the remainder of the chroot userland used to build the netboot
operates normally.  Only /bin/ash seems to have an issue.

I don't know whether I should instead get xfsprogs off of using valloc (given
XFS' age as a filesystem, it might need this anyway).  I can pursue that avenue
if needed, but I think I've stumbled onto a really obscure bug here that may
still need looking into.
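
For reference, if the valloc route were pursued, the usual substitute is
posix_memalign() with the page size as the alignment. The sketch below assumes
the xfsprogs call sites only need page-aligned memory that is later released
with free(); I have not checked that against the xfsprogs source. The helper
names page_aligned_alloc and page_aligned_alloc_selfcheck are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical valloc() replacement: page-aligned allocation that does
 * not depend on CONFIG_UCLIBC_SUSV2_LEGACY. Returns NULL on failure. */
void *page_aligned_alloc(size_t size)
{
    void *p = NULL;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        return NULL;
    if (posix_memalign(&p, (size_t)page, size) != 0)
        return NULL;
    return p;
}

/* Small self-check: the returned pointer must be page aligned. */
int page_aligned_alloc_selfcheck(void)
{
    void *p = page_aligned_alloc(100);

    assert(p != NULL);
    assert((uintptr_t)p % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);
    free(p);
    return 0;
}
```

This would only sidestep the config option; it would not explain why enabling
it makes longjmp crash in ash.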

Anything else I can provide to help chase this down?

Joshua Kinard
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

