A defunct process is a process that has terminated but whose parent process
has not yet called wait() or one of its variants. I don't know why lsof still
reports open files. It shouldn't, since the kernel should free a dead
process's resources, such as its file descriptor table, even if the parent
hasn't called wait(). You didn't tell us which OS and kernel version you're
using, so I would simply assume it's a quirk of your OS. It might be more
productive to look into why your program is panicking at
map_faststr.go:275. The trace you posted says "fatal error: concurrent map
writes", so the likely explanation is a race in your program: two or more
goroutines are writing to the same map without synchronization. (Inserting
into a nil map would instead report "assignment to entry in nil map".)

On Thu, Sep 10, 2020 at 4:43 PM Uday Kiran Jonnala <judayki...@gmail.com>
wrote:

> Hi Ian,
>
> Again, thanks for the reply. The problem here is that we see the Go process
> in the defunct state, and we are sure the parent process did not get
> SIGCHLD. Looking deeper, I see a thread in futex_wait_queue_me. If we are
> just seeing a leftover stack trace and the Go process actually got killed,
> why would I still see the associated fds in the file table, with the fd
> table still intact (see the lsof information)?
>
> The process that panicked and is now in the defunct state is 87548.
> Checking its status and threads:
>
> bash-4.2# cat /proc/87548/status
>  Name: replicator
>  State: Z (zombie)
>
> bash-4.2# ls -Fl /proc/87548/task/87561/fd | grep 606649
> l-wx------. 1 root root 64 Aug 25 10:59 1 -> pipe:[606649]
> l-wx------. 1 root root 64 Aug 25 10:59 2 -> pipe:[606649]
>
> Listing the threads
>
> bash-4.2# ps -aefT | grep 87548
> root 87548 87548 87507 0 Aug23 ? 00:00:00 [replicator] <defunct>
> root 87548 87561 87507 0 Aug23 ? 00:00:00 [replicator] <defunct>
> root 112448 112448 42566 0 17:13 pts/0 00:00:00 grep 87548
>
> bash-4.2# lsof | grep 606649
> replicato  87548  87561    root    1w     FIFO               0,11      0t0     606649 pipe
> replicato  87548  87561    root    2w     FIFO               0,11      0t0     606649 pipe
>
> Why does lsof show the entry for the FIFO file of this process?
>
> So I suspect a scenario in which the thread sleeping in
> futex_wait_queue_me is not cleaned up during panic(), so the main
> thread exits and leaves behind a detached thread that is still waiting in
> futex_wait_queue_me.
>
> The main issue is that I am not able to reproduce this, since this Go
> process is very big.
>
> Is there any way to verify this or take it further?
>
> Thanks & Regards,
> Uday Kiran
> On Monday, September 7, 2020 at 12:05:05 PM UTC-7 Ian Lance Taylor wrote:
>
>> On Mon, Sep 7, 2020 at 12:03 AM Uday Kiran Jonnala <juday...@gmail.com>
>> wrote:
>> >
>> > Thanks for the reply. I get the point about zombies, but I do not think
>> > the issue here is the parent not reaping the child. It seems the Go
>> > process has not finished executing some internal threads (waiting on a
>> > futex), which prevents SIGCHLD from being sent to the parent.
>> >
>> > The Go process named replicator hit a panic, and I see it went into the
>> > zombie state:
>> >
>> > $ ps -ef | grep replicator
>> > root 87548 87507 0 Aug23 ? 00:00:00 [replicator] <defunct>
>> >
>> > Now, looking at the tasks within the process, I see that the stack trace
>> > of a thread within the process is still stuck on the following:
>> >
>> > bash-4.2# cat /proc/87548/task/87561/stack
>> > [<ffffffffbb114714>] futex_wait_queue_me+0xc4/0x120
>> > [<ffffffffbb11520a>] futex_wait+0x10a/0x250
>> > [<ffffffffbb1182ce>] do_futex+0x35e/0x5b0
>> > [<ffffffffbb11865b>] SyS_futex+0x13b/0x180
>> > [<ffffffffbb003c09>] do_syscall_64+0x79/0x1b0
>> > [<ffffffffbba00081>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>> > [<ffffffffffffffff>] 0xffffffffffffffff
>> >
>> > From the above, if we create some internal threads and the main thread
>> > exits due to a panic, leaving some detached threads behind, the process
>> > will be in the zombie state until the threads within the process
>> > complete.
>> >
>> > It appears that some runaway threads in a hung state are causing this. I
>> > am not able to reproduce it with an explicit panic in the main goroutine
>> > while another goroutine is still executing.
>> >
>> > Does the above stack trace sound familiar with respect to the internal
>> > threads of the Go runtime?
>>
>> If the process is defunct, then none of the thread stacks matter.
>> They are just where the thread happened to be when the process exited.
>>
>> What is the real problem you are seeing?
>>
>> Ian
>>
>>
>>
>>
>> > On Thursday, August 27, 2020 at 1:43:39 PM UTC-7 Ian Lance Taylor
>> wrote:
>> >>
>> >> On Thu, Aug 27, 2020 at 10:01 AM Uday Kiran Jonnala
>> >> <juday...@gmail.com> wrote:
>> >> >
>> >> > I have a zombie process situation with Go.
>> >> >
>> >> > A process (in this case, replicator) has many goroutines internally.
>> >> >
>> >> > We hit a panic() and I see the replicator process is in the zombie
>> >> > state:
>> >> >
>> >> > <<>>>:~$ ps -ef | grep replicator
>> >> >
>> >> > root 87548 87507 0 Aug23 ? 00:00:00 [replicator] <defunct>
>> >> >
>> >> >
>> >> >
>> >> > The main goroutine (or the supporting P) exited, but the panic left
>> >> > the other P thread still executing, in a blocked state (the main P
>> >> > could be 87548, and the supporting P thread 87561 is still there):
>> >> >
>> >> > bash-4.2# ls -Fl /proc/87548/task/87561/fd | grep 606649
>> >> > l-wx------. 1 root root 64 Aug 25 10:59 1 -> pipe:[606649]
>> >> > l-wx------. 1 root root 64 Aug 25 10:59 2 -> pipe:[606649]
>> >> >
>> >> > Stack trace
>> >> >
>> >> > bash-4.2# cat /proc/87548/task/87561/stack
>> >> > [<ffffffffbb114714>] futex_wait_queue_me+0xc4/0x120
>> >> > [<ffffffffbb11520a>] futex_wait+0x10a/0x250
>> >> > [<ffffffffbb1182ce>] do_futex+0x35e/0x5b0
>> >> > [<ffffffffbb11865b>] SyS_futex+0x13b/0x180
>> >> > [<ffffffffbb003c09>] do_syscall_64+0x79/0x1b0
>> >> > [<ffffffffbba00081>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>> >> > [<ffffffffffffffff>] 0xffffffffffffffff
>> >> >
>> >> >
>> >> >
>> >> > We have a panic internally from the main goroutine:
>> >> >
>> >> > fatal error: concurrent map writes
>> >> >
>> >> > goroutine 666359 [running]:
>> >> > runtime.throw(0x101d6ae, 0x15)
>> >> > /home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/panic.go:608 +0x72 fp=0xc00374b6f0 sp=0xc00374b6c0 pc=0x42da62
>> >> > runtime.mapassign_faststr(0xdb71c0, 0xc00023f5f0, 0xc000aca990, 0x83, 0xc0009d03c8)
>> >> > /home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/map_faststr.go:275 +0x3bf fp=0xc00374b758 sp=0xc00374b6f0 pc=0x41527f
>> >> > github.eng.nutanix.com/xyz/abc/metadata.UpdateRecvInProgressFlag(0xc000aca990, 0x83, 0x0)
>> >> >
>> >> > .......
>> >> >
>> >> > goroutine 665516 [chan receive, 2 minutes]:
>> >> > zeus.(*Leadership).LeaderValue.func1(0xc003d5c120, 0x0, 0xc002e906c0, 0x52, 0xc00302ec60, 0x29)
>> >> > /home/ll/ntnx/main/build/.go/src/zeus/leadership.go:244 +0x34
>> >> > created by zeus.(*Leadership).LeaderValue
>> >> > /home/ll/ntnx/main/build/.go/src/zeus/leadership.go:243 +0x277
>> >> > 2020-08-03 00:35:04 rolled over log file
>> >> > ERROR: logging before flag.Parse: I0803 00:35:04.426906 196123 dataset.go:26] initialize zfs linking
>> >> > ERROR: logging before flag.Parse: I0803 00:35:04.433296 196123 dataset.go:34] completed zfs linking successfully
>> >> > I0803 00:35:04.433447 196123 main.go:86] Gflags passed NodeUuid: c238e584-0eeb-48bd-b299-2a25b13602f1, External Ip: 10.15.96.163
>> >> > I0803 00:35:04.433460 196123 main.go:99] Component name using for this process : abc-c238e584-0eeb-48bd-b299-2a25b13602f1
>> >> > I0803 00:35:04.433467 196123 main.go:120] Trying to initialize DB
>> >> >
>> >> > If there is a panic() from the main P thread, as I understand it we
>> >> > exit() and clean up all P threads of the process.
>> >> >
>> >> > Are we hitting the following scenario? I did not look into the M-P-G
>> >> > implementation in detail.
>> >> >
>> >> > Example:
>> >> >
>> >> > #include <stdio.h>
>> >> > #include <pthread.h>
>> >> > #include <unistd.h>
>> >> > #include <stdlib.h>
>> >> >
>> >> > /* Detached worker: keeps running after the main thread has exited. */
>> >> > void *thread_function(void *args)
>> >> > {
>> >> >     /* Note: the message says 20 seconds, but the sleep below is 100. */
>> >> >     printf("The is new thread! Sleep 20 seconds...\n");
>> >> >     sleep(100);
>> >> >     printf("Exit from thread\n");
>> >> >     pthread_exit(0);
>> >> > }
>> >> >
>> >> > int main(int argc, char **argv)
>> >> > {
>> >> >     pthread_t thrd;
>> >> >     pthread_attr_t attr;
>> >> >     int res = 0;
>> >> >
>> >> >     /* Create the worker detached so that nothing ever joins it. */
>> >> >     res = pthread_attr_init(&attr);
>> >> >     res = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
>> >> >     res = pthread_create(&thrd, &attr, thread_function, NULL);
>> >> >     res = pthread_attr_destroy(&attr);
>> >> >
>> >> >     printf("Main thread. Sleep 5 seconds\n");
>> >> >     sleep(5);
>> >> >     printf("Exit from main process\n");
>> >> >     /* pthread_exit() ends only the main thread; the process stays
>> >> >        alive until the detached thread finishes. */
>> >> >     pthread_exit(0);
>> >> > }
>> >> >
>> >> > kkk@ ~/mycode/go () $ ./a.out &
>> >> > [1] 108418
>> >> > Main thread. Sleep 5 seconds
>> >> > The is new thread! Sleep 20 seconds...
>> >> > kkk@ ~/mycode/go () $
>> >> > Exit from main process
>> >> > PID TTY TIME CMD
>> >> > 49313 pts/26 00:00:01 bash
>> >> > 108418 pts/26 00:00:00 [a.out] <defunct>
>> >> > 108449 pts/26 00:00:00 ps
>> >> >
>> >> > Note that the main process is <defunct> while the child thread is still hanging around:
>> >> >
>> >> > kkk@ ~/mycode/go () $ sudo cat /proc/108418/task/108420/stack
>> >> > [<ffffffff810b4c1d>] hrtimer_nanosleep+0xbd/0x1d0
>> >> > [<ffffffff810b4dae>] SyS_nanosleep+0x7e/0x90
>> >> > [<ffffffff816a63c9>] system_call_fastpath+0x16/0x1b
>> >> > [<ffffffffffffffff>] 0xffffffffffffffff
>> >> > ujonnala@ ~/mycode/go () $ Exit from thread
>> >> >
>> >> > Any help in this regard is appreciated.
>> >>
>> >>
>> >> I think you are misreading something somewhere. Zombie status is a
>> >> feature of a process, not a thread. It means that the child process
>> >> has exited but that the parent process, the one which started the
>> >> child process via the fork system call (or, on GNU/Linux, the clone
>> >> system call), has not called the wait (or waitpid or wait3 or wait4)
>> >> system call to collect its status.
>> >>
>> >> So don't look at threads or P's. Look at the parent process that
>> >> started the process that became a zombie.
>> >>
>> >> Ian
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> Groups "golang-nuts" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> an email to golang-nuts...@googlegroups.com.
>> > To view this discussion on the web visit
>> https://groups.google.com/d/msgid/golang-nuts/f70e42f4-622d-4d91-b51d-ed00f2e11ac4n%40googlegroups.com.
>>
>>
>


-- 
Kurtis Rader
Caretaker of the exceptional canines Junior and Hank
