On Thu, Feb 2, 2017 at 3:25 PM, Ronny Meeus <[email protected]> wrote:
>>> When pressing enter in the ssh session I see for dropbear:
>>> # strace -p 2066
>>> strace: Process 2066 attached
>>> _newselect(8, [3 5 7], [], NULL, {3516, 826061}) = 1 (in [5], left
>>> {3512, 787390})
>>> clock_gettime(0x6 /* CLOCK_??? */, {327, 955324187}) = 0
>>> read(5, ";\235\21\332\365\210T\200X}\230\"\306.\363\221", 16) = 16
>>> read(5,
>>> "\2\345\252\274\24Y\253\21\316>}\266\fU\20259\324\254Tu\3534\0238bMXzV\274\270",
>>> 32) = 32
>>> clock_gettime(0x6 /* CLOCK_??? */, {327, 955324187}) = 0
>>> writev(7, [{iov_base="\r", iov_len=1}], 1) = 1
>>> clock_gettime(0x6 /* CLOCK_??? */, {327, 956324197}) = 0
>>> _newselect(8, [3 5 7], [], NULL, {3600, 0}) = 1 (in [7], left {3599,
>>> 999987})
>>> clock_gettime(0x6 /* CLOCK_??? */, {327, 956324197}) = 0
>>> read(7, "\r\n", 16375) = 2
>>> clock_gettime(0x6 /* CLOCK_??? */, {327, 956324197}) = 0
>>> writev(5,
>>> [{iov_base="\231\310\271\315\354\243\342\271\22,\325Tj\n\356\345\"t\332d\205\317.\213\376\200\274h\201\347$\324"...,
>>> iov_len=48}], 1) = 48
>>> clock_gettime(0x6 /* CLOCK_??? */, {327, 957324207}) = 0
>>> _newselect(8, [3 5 7], [], NULL, {3600, 0}^Cstrace: Process 2066 detached
>>>
>>> While the sh process is not printing any additional traces. So this
>>> process is completely blocked:
>>> /isam/slot_default/run # strace -p 2078
>>> strace: Process 2078 attached
>>> futex(0xffed598, FUTEX_WAIT_PRIVATE, 2, NULL
>>>
>>>
>>> Connecting a debugger to the system (sh pid 2078) shows that the only
>>> thread the process has is blocked
>>> on a mutex in the C library.
>>>
>>> (gdb) info threads
>>> Id Target Id Frame
>>> * 1 Thread 2078 0x1003d0ec in putprompt (s=<optimized out>)
>>> at shell/ash.c:2455
>>> (gdb) bt
>>> #0 0x0ff5c708 in __lll_lock_wait_private (futex=0xffed598
>>> <main_arena>) at ../nptl/sysdeps/unix/sysv/linux/lowlevellock.c:31
>>> #1 0x0fef07a8 in *__GI___libc_free (mem=<optimized out>) at malloc.c:3714
>>> #2 0x1003d0ec in putprompt (s=<optimized out>) at shell/ash.c:2455
>>> #3 setprompt_if (do_set=<optimized out>, whichprompt=<optimized out>)
>>> at shell/ash.c:2501
>>> #4 0x1003d448 in parsecmd (interact=<optimized out>) at shell/ash.c:12074
>>> #5 0x1004100c in cmdloop (top=<optimized out>) at shell/ash.c:12215
>>> #6 0x10042730 in ash_main (argc=<optimized out>, argv=<optimized
>>> out>) at shell/ash.c:13350
>>
>> Looks like signal interrupted malloc or free, then
>> signal handler longjmped (ash by design does that)
>> without returning to the malloc or free.
>> malloc state is now corrupted, and free()
>> in putprompt() deadlocks.
>>
>> An INT_OFF/INT_ON pair guarding code which must not be
>> interrupted like this is missing somewhere.
>
> Interesting info, thanks.
>
> How do we continue to identify the place in the code?

I guess by code review and experiments. For example,
try adding "INT_OFF;" and "INT_ON;" around this
code block:

# if ENABLE_FEATURE_TAB_COMPLETION
        line_input_state->path_lookup = pathval();
# endif
        reinit_unicode_for_ash();
        nr = read_line_input(line_input_state, cmdedit_prompt,
                        buf, IBUFSIZ, timeout);

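> Does this not mean that before all library calls we need to make sure
> signals are disabled?

Not all library calls, only some. For example, read() or strlen()
can be interrupted and longjmp'ed away with no ill effects.
Functions like malloc() and free(), which take a lock or update
shared state, are the ones that need guarding.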
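
To make the idea concrete, here is a minimal standalone sketch of the
"defer the longjmp while inside a guarded section" pattern. It is not
ash code: guard_off(), guard_on(), run_demo() and all the variable names
are hypothetical stand-ins for what INT_OFF/INT_ON do, and raise() is
only used to simulate a signal arriving between malloc() and free().

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical, chosen for this illustration only. */
static sigjmp_buf top_level;                  /* jump target ("main loop") */
static volatile sig_atomic_t suppress_depth;  /* >0 while in a guarded section */
static volatile sig_atomic_t pending;         /* signal seen while guarded */
static volatile sig_atomic_t section_done;    /* set once free() has returned */

static void on_sigint(int sig)
{
	(void)sig;
	if (suppress_depth > 0) {
		pending = 1;              /* only record it, do not jump now */
		return;
	}
	siglongjmp(top_level, 1);         /* unguarded: jump away immediately */
}

static void guard_off(void)
{
	suppress_depth++;
}

static void guard_on(void)
{
	/* Leaving the outermost guarded section: deliver the deferred jump. */
	if (--suppress_depth == 0 && pending) {
		pending = 0;
		siglongjmp(top_level, 1);
	}
}

/* Returns 1 if the jump happened only after free() had completed. */
int run_demo(void)
{
	struct sigaction sa;
	char *p;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigint;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGINT, &sa, NULL) != 0)
		return -1;

	if (sigsetjmp(top_level, 1) != 0)
		return section_done;      /* reached via siglongjmp() */

	guard_off();
	p = malloc(64);
	raise(SIGINT);                    /* simulated signal mid-section */
	free(p);                          /* still runs: handler only set a flag */
	section_done = 1;
	guard_on();                       /* deferred jump happens here */
	return 0;                         /* not reached in this demo */
}
```

Calling run_demo() from main() returns 1: the handler runs while the
section is guarded, so it only sets the flag, and the jump is taken in
guard_on() after free() has returned. Without the guard, the handler
would jump away from whatever was running at that moment, which is the
situation suspected above when that happens to be malloc() or free().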