zhaorenhai created IMPALA-10088:
-----------------------------------

             Summary: DeadLock while run unifiedbetests on aarch64 platform
                 Key: IMPALA-10088
                 URL: https://issues.apache.org/jira/browse/IMPALA-10088
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: zhaorenhai


When running unifiedbetests and impalad on the aarch64 platform, a deadlock occurs while tcmalloc is being initialized.

The stack trace is as follows:

 
{code:java}
(gdb) bt
#0  0x0000ffff83099544 in __GI___nanosleep (requested_time=0xffffffc71698, remaining=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1  0x00000000054cf144 in base::internal::SpinLockDelay (w=0x77385b0 <tcmalloc::Static::pageheap_lock_>, value=2, loop=727956) at /home/impala/impala/be/src/gutil/spinlock_linux-inl.h:86
#2  0x0000000005529800 in SpinLock::SlowLock() ()
#3  0x00000000055fb5c4 in tcmalloc::ThreadCache::InitModule() ()
#4  0x0000000005743374 in tc_calloc ()
#5  0x0000ffff81c737f4 in _dlerror_run (operate=operate@entry=0xffff81c73158 <dlsym_doit>, args=0xffffffc717d8, args@entry=0xffffffc717f8) at dlerror.c:140
#6  0x0000ffff81c731f0 in __dlsym (handle=<optimized out>, name=<optimized out>) at dlsym.c:70
#7  0x000000000310ee04 in (anonymous namespace)::dlsym_or_die (sym=0x606b260 "dlopen") at /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:74
#8  0x000000000310ef1c in (anonymous namespace)::InitIfNecessary () at /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:100
#9  0x000000000310f0b4 in dl_iterate_phdr (callback=0xffff81620d18 <_Unwind_IteratePhdrCallback>, data=0xffffffc71900) at /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:158
#10 0x0000ffff816215b4 in _Unwind_Find_FDE (pc=0xffff8161f98f <_Unwind_Backtrace+79>, bases=bases@entry=0xffffffc72438) at ../../../gcc-7.5.0/libgcc/unwind-dw2-fde-dip.c:469
#11 0x0000ffff8161dfdc in uw_frame_state_for (context=context@entry=0xffffffc72110, fs=fs@entry=0xffffffc719f0) at ../../../gcc-7.5.0/libgcc/unwind-dw2.c:1249
#12 0x0000ffff8161ef3c in uw_init_context_1 (context=context@entry=0xffffffc72110, outer_cfa=0xffffffc72b50, outer_cfa@entry=0xffffffc72be0, outer_ra=0x55298d8 <GetStackTrace_libgcc(void**, int, int)+40>)
    at ../../../gcc-7.5.0/libgcc/unwind-dw2.c:1578
#13 0x0000ffff8161f990 in _Unwind_Backtrace (trace=0x5529a48 <libgcc_backtrace_helper(_Unwind_Context*, void*)>, trace_argument=0xffffffc72b68) at ../../../gcc-7.5.0/libgcc/unwind.inc:283
#14 0x00000000055298d8 in GetStackTrace_libgcc(void**, int, int) ()
#15 0x0000000005529db4 in GetStackTrace(void**, int, int) ()
#16 0x00000000055f891c in tcmalloc::PageHeap::GrowHeap(unsigned long) ()
{code}
I think this is the same issue as [https://github.com/gperftools/gperftools/issues/1184], because the deadlock happens whether gperftools is built with or without libunwind.
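Reading the backtrace from the bottom up suggests a self-deadlock on one thread: tcmalloc::PageHeap::GrowHeap (#16) captures a stack trace, presumably while holding tcmalloc::Static::pageheap_lock_; libgcc's unwinder (#10-#13) calls dl_iterate_phdr (#9), which Kudu's unwind_safeness.cc hooks; the hook's lazy InitIfNecessary() calls dlsym (#6-#8); glibc's _dlerror_run calls calloc (#4-#5); and tcmalloc::ThreadCache::InitModule (#3) then spins on the same non-reentrant lock (#0-#2). The toy model below is a hypothetical sketch of that cycle, not real tcmalloc, glibc or Kudu code: ToySpinLock, ToyGrowHeap, ToyHookedDlIteratePhdr and ToyCalloc are invented names, and the lock only offers a try-lock so the self-deadlock is recorded instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>

// Hypothetical toy model of the lock cycle in the backtrace. The names
// mirror the frames, but none of this is real tcmalloc, glibc, or Kudu code.

// Non-reentrant lock, like tcmalloc's SpinLock: a thread that already holds
// it and tries to take it again would spin forever. The toy only exposes
// TryLock() so the self-deadlock can be observed instead of hanging.
class ToySpinLock {
 public:
  bool TryLock() { return !locked_.exchange(true, std::memory_order_acquire); }
  void Unlock() { locked_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> locked_{false};
};

ToySpinLock pageheap_lock;      // stands in for tcmalloc::Static::pageheap_lock_
bool hook_initialized = false;  // stands in for the hook's lazy-init state
bool self_deadlock_detected = false;

// Stands in for tc_calloc -> ThreadCache::InitModule() (#3-#4),
// which needs the page-heap lock.
void* ToyCalloc(std::size_t n, std::size_t size) {
  if (!pageheap_lock.TryLock()) {
    // The real SpinLock would spin here (#0-#2); the toy records it instead.
    self_deadlock_detected = true;
    return nullptr;
  }
  void* p = std::calloc(n, size);
  pageheap_lock.Unlock();
  return p;
}

// Stands in for the hooked dl_iterate_phdr (#5-#9): on first use it lazily
// resolves symbols with dlsym, and glibc's _dlerror_run may calloc a buffer.
void ToyHookedDlIteratePhdr() {
  if (!hook_initialized) {
    void* dlerror_buffer = ToyCalloc(1, 64);
    std::free(dlerror_buffer);  // free(nullptr) is a no-op
    hook_initialized = true;
  }
  // The real hook would now forward to the real dl_iterate_phdr.
}

// Stands in for PageHeap::GrowHeap() (#16), whose stack capture (#10-#15)
// runs while the page-heap lock is held and reaches the hooked function.
void ToyGrowHeap() {
  bool acquired = pageheap_lock.TryLock();
  assert(acquired);
  (void)acquired;
  ToyHookedDlIteratePhdr();
  pageheap_lock.Unlock();
}
```

In the model, the first ToyGrowHeap() call sets self_deadlock_detected, while a second call does not, because the hook is already initialized and no longer allocates under the lock. This only suggests that resolving the hook's symbols before any allocation-time stack capture might break the cycle; it has not been tried against the real code.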

 

Kudu also has the same issue:

https://issues.apache.org/jira/browse/KUDU-3072

I think the solution in the following change is not correct:

[https://gerrit.cloudera.org/#/c/15420/]

On aarch64, the method for getting a stack trace is not the same as on arm.

I think the correct way to get a stack trace should look like this:

[https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/internal/stacktrace_aarch64-inl.inc]
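For reference, frame-pointer unwinding on aarch64 works roughly like this: under AAPCS64, a function that keeps a frame pointer stores a 16-byte frame record at the address in x29 (fp), holding the caller's fp followed by the saved return address (lr). Walking that chain needs no heap allocation and no dl_iterate_phdr call, which is what makes it attractive inside malloc. The sketch below is not abseil's implementation: FrameRecord and WalkFrameRecords are hypothetical names, the chain is simulated so it runs on any platform, and its sanity checks (links must move toward higher addresses, stay aligned, and not jump more than an arbitrary 100000 bytes) are only modeled loosely on the kind of checks abseil performs.

```cpp
#include <cassert>
#include <cstdint>

// AArch64 frame record (AAPCS64): the address in x29 (fp) points at the
// caller's saved fp, immediately followed by the saved return address (lr).
struct FrameRecord {
  const FrameRecord* prev_fp;
  const void* return_address;
};

// Hypothetical helper, not abseil's API. Walks the frame-record chain starting
// at `fp` and stores up to `max_depth` return addresses in `out`. It performs
// no heap allocation and never calls dl_iterate_phdr, so it cannot re-enter
// malloc.
int WalkFrameRecords(const FrameRecord* fp, const void** out, int max_depth) {
  assert(out != nullptr || max_depth <= 0);
  // Illustrative bound on how far a caller's frame may be from its callee's.
  constexpr std::uintptr_t kMaxFrameBytes = 100000;
  int depth = 0;
  while (fp != nullptr && depth < max_depth) {
    out[depth++] = fp->return_address;
    const FrameRecord* next = fp->prev_fp;
    const auto cur = reinterpret_cast<std::uintptr_t>(fp);
    const auto nxt = reinterpret_cast<std::uintptr_t>(next);
    // The stack grows down, so an older (caller) frame must sit at a higher
    // address; stop on a null, backwards, misaligned, or implausibly far link.
    if (next == nullptr || nxt <= cur || nxt - cur > kMaxFrameBytes ||
        nxt % alignof(FrameRecord) != 0) {
      break;
    }
    fp = next;
  }
  return depth;
}
```

On a real aarch64 build the walk would start from `__builtin_frame_address(0)`, and the code would need to be compiled with `-fno-omit-frame-pointer`; without frame pointers the chain is not reliable.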

 

However, I think gperftools may not be the root cause of this issue, because both gperftools and libunwind now support aarch64 properly.

Perhaps this Kudu commit has a bug?

[https://github.com/apache/kudu/commit/b621f9c1a3949dc31ca4836b0767b2840fa73f29]

On x86, gperftools does not use libunwind or libgcc to get stack traces, so the issue does not occur there.

I tried changing
{code:java}
#if !defined(THREAD_SANITIZER) && !defined(__APPLE__)
#define HOOK_DL_ITERATE_PHDR 1
#endif
{code}
to
{code:java}
#if !defined(THREAD_SANITIZER) && !defined(__APPLE__) && !defined(__aarch64__)
#define HOOK_DL_ITERATE_PHDR 1
#endif{code}
With this change, the deadlock no longer occurs.
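A minimal sketch of how the proposed guard plays out at compile time. DlIteratePhdrHookEnabled() and IsAarch64Build() are hypothetical helpers, not part of Kudu, that expose the preprocessor decision so that, for example, a unit test or a startup log line could report which path a build took. The trade-off is that on aarch64 the hook, and whatever protection unwind_safeness.cc provides through it, is switched off rather than fixed.

```cpp
#include <cassert>

// The guard proposed in this issue: keep hooking dl_iterate_phdr except under
// TSAN, on macOS, and (the new condition) on aarch64.
#if !defined(THREAD_SANITIZER) && !defined(__APPLE__) && !defined(__aarch64__)
#define HOOK_DL_ITERATE_PHDR 1
#endif

// Hypothetical helper: reports whether this build installs the hook.
constexpr bool DlIteratePhdrHookEnabled() {
#ifdef HOOK_DL_ITERATE_PHDR
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: reports whether this build targets aarch64.
constexpr bool IsAarch64Build() {
#ifdef __aarch64__
  return true;
#else
  return false;
#endif
}

// Compile-time check of the property the change is meant to guarantee.
static_assert(!(IsAarch64Build() && DlIteratePhdrHookEnabled()),
              "the dl_iterate_phdr hook must be disabled on aarch64");
```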

 

[[email protected]] [~tlipcon] [~adar]

What do you think about this issue? How should we fix it? Any suggestions?

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
