Lin Wen created HAWQ-252:
----------------------------

             Summary: Coredump When RM Reconnect libyarn
                 Key: HAWQ-252
                 URL: https://issues.apache.org/jira/browse/HAWQ-252
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Resource Manager
            Reporter: Lin Wen
            Assignee: Lei Chang


The resource manager (RM) dumps core when it reconnects to libyarn.
Missing separate debuginfos, use: debuginfo-install hawq-2.0.0.0_beta-19011.x86_64
(gdb) bt
#0  0x0000000000e661f8 in std::string::_Rep::_S_empty_rep_storage ()
#1  0x00007f7f1f20947c in libyarn::LibYarnClient::dummyAllocate (this=<value optimized out>)
    at /data1/pulse2-agent/agents/agent1/work/LIBYARN-main-opt/rhel5_x86_64/src/libyarnclient/LibYarnClient.cpp:330
#2  0x00007f7f1f209988 in libyarn::heartbeatFunc (args=<value optimized out>)
    at /data1/pulse2-agent/agents/agent1/work/LIBYARN-main-opt/rhel5_x86_64/src/libyarnclient/LibYarnClient.cpp:114
#3  0x000000350b4079d1 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0e8b6d in clone () from /lib64/libc.so.6
(gdb) info thread
  4 Thread 0x7f7efc239700 (LWP 760442)  0x000000350b40b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7f7f1a1758c0 (LWP 760441)  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
  2 Thread 0x7f7efae37700 (LWP 760797)  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
* 1 Thread 0x7f7efb838700 (LWP 760443)  0x0000000000e661f8 in std::string::_Rep::_S_empty_rep_storage ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f7efae37700 (LWP 760797))]#0  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
#1  0x000000350b0e1e54 in usleep () from /lib64/libc.so.6
#2  0x00007f7f1f209999 in libyarn::heartbeatFunc (args=<value optimized out>)
    at /data1/pulse2-agent/agents/agent1/work/LIBYARN-main-opt/rhel5_x86_64/src/libyarnclient/LibYarnClient.cpp:131
#3  0x000000350b4079d1 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0e8b6d in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f7f1a1758c0 (LWP 760441))]#0  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
#1  0x000000350b0e1e54 in usleep () from /lib64/libc.so.6
#2  0x00000000008dd8b9 in RB2YARN_registerYARNApplication () at resourcebroker_LIBYARN_proc.c:1354
#3  0x00000000008df8ad in RB2YARN_initializeConnection () at resourcebroker_LIBYARN_proc.c:1270
#4  0x00000000008dfc93 in ResBrokerMainInternal () at resourcebroker_LIBYARN_proc.c:202
#5  0x00000000008dff79 in ResBrokerMain () at resourcebroker_LIBYARN_proc.c:157
#6  0x00000000008dc246 in RB_LIBYARN_start (isforked=<value optimized out>) at resourcebroker_LIBYARN.c:153
#7  0x0000000000903bda in MainHandlerLoop () at resourcemanager.c:531
#8  0x00000000009041f1 in ResManagerMainServer2ndPhase () at resourcemanager.c:508
#9  0x0000000000904624 in ResManagerMain (argc=<value optimized out>, argv=<value optimized out>) at resourcemanager.c:330
#10 0x00000000009049b1 in ResManagerProcessStartup () at resourcemanager.c:402
#11 0x0000000000764b08 in CommenceNormalOperations () at postmaster.c:3616
#12 0x00000000007659c2 in do_reaper () at postmaster.c:3964
#13 0x000000000076a01d in ServerLoop () at postmaster.c:2102
#14 0x000000000076bb5e in PostmasterMain (argc=9, argv=0x32a15b0) at postmaster.c:1421
#15 0x00000000006c691a in main (argc=9, argv=0x32a1570) at main.c:226

There are two heartbeat threads at this moment (threads 1 and 2, both inside libyarn::heartbeatFunc), which means the previous heartbeat thread was not cancelled when the RM reconnected to libyarn.

In ResBrokerMainInternal() (resourcebroker_LIBYARN_proc.c, starting at line 270), the heartbeat thread should be cancelled before RB2YARN_disconnectFromYARN() is called.
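A minimal sketch of the proposed ordering, in C with POSIX threads. All names here (yarn_conn_t, heartbeat_t, heartbeat_start, heartbeat_stop, reconnect) are hypothetical and are not the HAWQ or libyarn API; free() merely stands in for RB2YARN_disconnectFromYARN(). Instead of pthread_cancel, the sketch swaps in a cooperative stop flag plus pthread_join, so the old heartbeat thread has definitely exited before its connection is released and a new thread is started.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a libyarn client connection (not the real API). */
typedef struct {
    int generation;          /* incremented on every reconnect */
    atomic_int heartbeats;   /* heartbeats "sent" on this connection */
} yarn_conn_t;

/* Hypothetical heartbeat-thread handle owned by the resource broker. */
typedef struct {
    pthread_t tid;
    atomic_bool stop;        /* set by the owner to ask the thread to exit */
    bool running;            /* touched only by the owning thread */
    yarn_conn_t *conn;       /* connection the heartbeat thread uses */
} heartbeat_t;

static void *heartbeat_func(void *arg) {
    heartbeat_t *hb = arg;
    struct timespec interval = {0, 1000000};  /* 1 ms, shortened for the sketch */
    while (!atomic_load(&hb->stop)) {
        atomic_fetch_add(&hb->conn->heartbeats, 1);  /* stands in for the heartbeat RPC */
        nanosleep(&interval, NULL);
    }
    return NULL;
}

static int heartbeat_start(heartbeat_t *hb, yarn_conn_t *conn) {
    assert(!hb->running);    /* invariant: at most one heartbeat thread */
    hb->conn = conn;
    atomic_store(&hb->stop, false);
    if (pthread_create(&hb->tid, NULL, heartbeat_func, hb) != 0)
        return -1;
    hb->running = true;
    return 0;
}

static void heartbeat_stop(heartbeat_t *hb) {
    if (!hb->running)
        return;
    atomic_store(&hb->stop, true);
    pthread_join(hb->tid, NULL);  /* after this, the thread no longer touches conn */
    hb->running = false;
    hb->conn = NULL;
}

/* Reconnect: stop the old heartbeat thread BEFORE tearing down the old connection. */
static yarn_conn_t *reconnect(heartbeat_t *hb, yarn_conn_t *old) {
    int gen = old ? old->generation + 1 : 0;
    heartbeat_stop(hb);      /* the step the report says is missing */
    free(old);               /* stands in for RB2YARN_disconnectFromYARN() */
    yarn_conn_t *conn = calloc(1, sizeof *conn);
    if (conn == NULL)
        return NULL;
    conn->generation = gen;
    atomic_init(&conn->heartbeats, 0);
    if (heartbeat_start(hb, conn) != 0) {
        free(conn);
        return NULL;
    }
    return conn;
}
```

Because heartbeat_stop() returns only after the old thread has exited, no heartbeat can run against a connection that has already been torn down. If pthread_cancel is used instead, a pthread_join after the cancel request is still needed to get the same guarantee, since cancellation is only a request.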




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
