On Wed, Jun 6, 2012 at 4:43 PM, Roland Mainz <[email protected]> wrote:
> On Wed, Jun 6, 2012 at 3:53 PM, David Korn <[email protected]> wrote:
>> cc: [email protected]
>> Subject: Re: Random coredumps in |path_addcomp()| when testing
>> ast-open.2012-05-31 on Solaris 11/AMD64/64bit ...
>> --------
>>
>> Do you know which test this shows up on?
>
> Currently these tests _randomly_ (in non-reproduceable ways) fail
> within |path_addcomp()|:
> -- snip --
> attributes.sh
> comvar.sh
> sun_solaris_compoundvario.sh
> -- snip --
>
>> Do you know what PATH
>> you are runing with?
>
> Uhm... the default one set by "shtests" ?
This still happens in ast-ksh.2012-06-06.
I've narrowed it down a bit... it happened only for 64bit builds in
multibyte locales (e.g. $ LC_ALL=ja_JP.UTF-8 shtests --locale ... #)
and "zh_CN.GB18030" was more affected (based on statistics) than the
*.UTF-8-based locales... but it turns out that this is something which
only contributes to crash by shifting the layout of mapped code pages
a bit but is not directly connected to the issue.
After some digging I found this:
-- snip --
program terminated by signal SEGV (no mapping at the fault address)
0xfffffd7ffbc0710b: memcmp+0x007b: movq 0x0000000000000018(%rsi),%r10
(dbx) where
=>[1] memcmp(0xfffffd7ffb7a988a, 0xfffffd7ffb7c5fe8, 0x5d, 0x2,
0xfffffd7ffb7ae000, 0x1), at 0xfffffd7ffbc0710b
[2] path_addcomp(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb994f2d
[3] path_addpath(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb9955ff
[4] path_init(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb9925e4
[5] path_get(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb9926ae
[6] sh_subshell(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb99b9c5
[7] comsubst(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb97f71b
[8] varsub(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb97ec4f
[9] copyto(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb979ab3
[10] sh_mactrim(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb9781ba
[11] nv_setlist(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb981930
[12] sh_exec(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb9a3863
[13] exfile(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb94acca
[14] sh_main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb949f4d
[15] main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x40a12a
(dbx) print (char*)0xfffffd7ffb7a988a
dbx: warning: unknown language, 'c' assumed
(char *) 18446741324854630538U = 0xfffffd7ffb7a988a
"/home/test001/ksh93/ast_ksh_20120606/build_i386_64bit_opt_extrabuiltins/./src/cmd/ksh93/tests"
(dbx) print (char *)0xfffffd7ffb7c5fe8
(char *) 18446741324854747112U = 0xfffffd7ffb7c5fe8 "/bin"
-- snip 0
The code uses |memcmp()| to compare two strings but uses the string
length (0x5d==93 bytes) of the first argument
("/home/test001/ksh93/ast_ksh_20120606/build_i386_64bit_opt_extrabuiltins")
... but the 2nd argument is the string "/bin" with only four bytes
(plus '\0') ... if this sits near the end of a MMU page and
|memcpy()| searches beyond that page into something which is not
mapped or declared a "red zone page" (e.g. these are pages which are
used directly before and after |mmap()| areas to trigger SIGSEGV for
reads) then the process gets wallopped with the big SIGSEGV stick.
A quick "hotfix" for the issue may be this patch...
-- snip --
diff -r -u build_i386_32bit_debug/src/cmd/ksh93/sh/path.c
build_i386_64bit_opt_extrabuiltins/src/cmd/ksh93/sh/path.c
--- build_i386_32bit_debug/src/cmd/ksh93/sh/path.c Tue Jun 5 19:18:06 2012
+++ build_i386_64bit_opt_extrabuiltins/src/cmd/ksh93/sh/path.c Sat
Jun 9 10:56:34 2012
@@ -1470,7 +1470,10 @@
len = strlen(name);
for(pp=first; pp; pp=pp->next)
{
- if(memcmp(name,pp->name,len)==0 && (pp->name[len]==':'
|| pp->name[len]==0))
+#ifndef MIN
+#define MIN(a,b) (((a) <= (b)) ? (a) : (b))
+#endif
+
if(memcmp(name,pp->name,MIN(len,(strlen(pp->name)+1)))==0 &&
(pp->name[len]==':' || pp->name[len]==0))
{
pp->flags |= flag;
return(first);
-- snip --
... but it is IMO not the solution because |(pp->name[len]==':' ||
pp->name[len]==0)| may (I say "may" ... I haven't really tried to
imagine all possible scenarious) still access memory outside the
allocated range.
IMO two things are needed:
1. A "better" algorithm
2. A testcase which feeds path lenghs near |PATH_MAX| for ext*fs
(beware... there's NFS which may be from filesystems with lower
limits) to this code for testing (Linux allows up to 4095 bytes...
which is good since it is (on x86) near the default page size of 4k...
likely triggering crashes quickly if something is wrong... on Solaris
|PATH_MAX| is 1024 which is less optimal for triggering the crash but
still such long PATHs still may trigger something useful).
----
Bye,
Rolsand
--
__ . . __
(o.\ \/ /.o) [email protected]
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
_______________________________________________
ast-developers mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-developers