[ 
https://issues.apache.org/jira/browse/IMPALA-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701254#comment-16701254
 ] 

Tim Armstrong commented on IMPALA-7834:
---------------------------------------

The only commonality I'm seeing is that the crashes are all in codegen'd 
functions. I don't suppose you've been able to correlate the crashes with 
particular queries?

It's possible to figure out which query crashed from the coredump with a bit of 
GDB cleverness if you get the debuginfo package corresponding to your parcel 
and follow the RuntimeState pointer to get a query id. That's involved but I 
could probably provide pointers on how to do it.

Something slightly easier that might provide a clue is any of the 
hs_err_pid*.log files that the embedded JVM generates - they have some 
additional information that's sometimes useful.
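
As a rough sketch of how to skim those files: the helper below prints the 
signal line and the "Problematic frame" entry from each hs_err_pid*.log in a 
directory. The function name hs_err_summary is made up for illustration, and 
the directory to search is an assumption (by default the JVM writes these 
files to the process's working directory unless -XX:ErrorFile says otherwise).

```shell
#!/bin/sh
# Hypothetical helper: summarize the header of each JVM fatal error log.
# $1 = directory to search (assumption: defaults to the current directory).
hs_err_summary() {
  dir=${1:-.}
  for f in "$dir"/hs_err_pid*.log; do
    # Skip the unexpanded glob when no files match.
    [ -e "$f" ] || continue
    echo "== $f"
    # Print the first "#  SIG..." line, then the line that follows
    # "# Problematic frame:", and stop reading the file.
    awk '/^# +SIG/ && !seen { print; seen = 1 }
         /^# Problematic frame:/ { getline; print; exit }' "$f"
  done
}

# Example (path is a placeholder): hs_err_summary /path/to/impalad/cwd
```

If the problematic frame is a bare address rather than a named library, that 
would be consistent with the crash being in codegen'd code.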

Another thing that's worth checking is whether transparent huge pages are set 
to "always". We've seen bugs on some kernels that cause crashes in codegen'd 
code. Both of the settings below should be either "madvise" or "never".
{code}
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
{code}
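
Each of those files lists every allowed value and marks the active one in 
square brackets, for example "[always] madvise never". A minimal sketch of an 
automated check follows; the function names thp_active and thp_check are 
hypothetical, and the script only reports the setting without changing it.

```shell
#!/bin/sh
# Print the active (bracketed) value from a THP sysfs line,
# e.g. "always madvise [never]" -> "never".
thp_active() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Report whether a setting is one of the two suggested values.
# $1 = setting name, $2 = raw contents of the sysfs file.
thp_check() {
  active=$(thp_active "$2")
  case "$active" in
    madvise|never) echo "$1: $active (ok)" ;;
    *)             echo "$1: $active (consider madvise or never)" ;;
  esac
}

for name in enabled defrag; do
  f="/sys/kernel/mm/transparent_hugepage/$name"
  if [ -r "$f" ]; then
    thp_check "$name" "$(cat "$f")"
  fi
done
```

Running this on each host that has crashed would show quickly whether the 
setting differs between the affected daemons and the healthy ones.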

> Impala different types of crashes
> ---------------------------------
>
>                 Key: IMPALA-7834
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7834
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.11.0
>            Reporter: Manikandan R
>            Priority: Critical
>              Labels: CDH5, crash
>         Attachments: stacktrace_106154_25Nov2018.txt, 
> stacktrace_10917_19Nov2018.txt, stacktrace_117834_06Nov2018.txt, 
> stacktrace_117834_06Nov2018.txt, stacktrace_119889_18Nov2018.txt, 
> stacktrace_12121_22Nov2018.txt, stacktrace_122223_14Nov2018.txt, 
> stacktrace_125057_05Nov.txt, stacktrace_14175_23Nov2018.txt, 
> stacktrace_15618_20Nov2018.txt, stacktrace_17249_20Nov2018.txt, 
> stacktrace_24446_27Nov2018.txt, stacktrace_28839_23Nov2018.txt, 
> stacktrace_29716_21Nov2018.txt, stacktrace_59011_19Nov2018.txt, 
> stacktrace_65470_23Nov2018.txt, stacktrace_72492_01Nov2018.txt, 
> stacktrace_72492_01Nov2018.txt, stacktrace_74831_28Oct.txt, 
> stacktrace_8486_14Nov2018.txt, stacktrace_84892_26Nov2018.txt
>
>
> Of late, we have witnessed different types of crashes in the cluster. I 
> don't see any similarities among the crash stack traces and am also not able 
> to reproduce them. Below are the stack traces that occurred in different 
> daemons at different times. I do have the complete stack traces; let me know 
> if they would help with further debugging.
> 1)
> 10-1-33-172
> Nov 6 18:04
> Thread 1 (Thread 0x7f018b423700 (LWP 96325)):
> #0 0x00007f0aaaf5e207 in raise () from /lib64/libc.so.6
> No symbol table info available.
> #1 0x00007f0aaaf5fa38 in abort () from /lib64/libc.so.6
> No symbol table info available.
> #2 0x00007f0aad280185 in os::abort(bool) () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #3 0x00007f0aad422593 in VMError::report_and_die() () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #4 0x00007f0aad28568f in JVM_handle_linux_signal () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #5 0x00007f0aad27bbe3 in signalHandler(int, siginfo*, void*) () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #6 <signal handler called>
> No symbol table info available.
> #7 0x00007f0a44ca3000 in ?? ()
> No symbol table info available.
> #8 0x0000000000dba5c1 in 
> impala::HdfsParquetScanner::TransferScratchTuples(impala::RowBatch*) ()
> No symbol table info available.
> #9 0x0000000000dba924 in 
> impala::HdfsParquetScanner::AssembleRows(std::vector<impala::ParquetColumnReader*,
>  std::allocator<impala::ParquetColumnReader*> > const&, impala::RowBatch*, 
> bool*) ()
> No symbol table info available.
> #10 0x0000000000dbf5f6 in 
> impala::HdfsParquetScanner::GetNextInternal(impala::RowBatch*) ()
> No symbol table info available.
> #11 0x0000000000db9ba7 in impala::HdfsParquetScanner::ProcessSplit() ()
> No symbol table info available.
> #12 0x0000000000d835e6 in 
> impala::HdfsScanNode::ProcessSplit(std::vector<impala::FilterContext, 
> std::allocator<impala::FilterContext> > const&, impala::MemPool*, 
> impala::io::ScanRange*) ()
> No symbol table info available.
> #13 0x0000000000d85115 in impala::HdfsScanNode::ScannerThread() ()
> No symbol table info available.
> #14 0x0000000000d16c83 in impala::Thread::SuperviseThread(std::string const&, 
> std::string const&, boost::function<void ()>, impala::Promise<long>*) ()
> No symbol table info available.
> #15 0x0000000000d173c4 in boost::detail::thread_data<boost::_bi::bind_t<void, 
> void (*)(std::string const&, std::string const&, boost::function<void ()>, 
> impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, 
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::Promise<long>*> > > >::run() ()
> No symbol table info available.
> #16 0x000000000128fada in thread_proxy ()
> No symbol table info available.
> #17 0x00007f0aab2fcdd5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #18 0x00007f0aab026b3d in clone () from /lib64/libc.so.6
> No symbol table info available.
> Nov 1 11:21
> Thread 1 (Thread 0x7fb3bffe9700 (LWP 50334)):
> #0 0x00007fc2959f0207 in raise () from /lib64/libc.so.6
> No symbol table info available.
> #1 0x00007fc2959f18f8 in abort () from /lib64/libc.so.6
> No symbol table info available.
> #2 0x00007fc297d12185 in os::abort(bool) () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #3 0x00007fc297eb4593 in VMError::report_and_die() () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #4 0x00007fc297d1768f in JVM_handle_linux_signal () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #5 0x00007fc297d0dbe3 in signalHandler(int, siginfo*, void*) () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #6 <signal handler called>
> No symbol table info available.
> #7 0x00007fc1dd9f0000 in ?? ()
> No symbol table info available.
> #8 0x0000000000fdd9f9 in 
> impala::PartitionedAggregationNode::Open(impala::RuntimeState*) ()
> No symbol table info available.
> #9 0x0000000000b74d6d in impala::FragmentInstanceState::Open() ()
> No symbol table info available.
> #10 0x0000000000b763ab in impala::FragmentInstanceState::Exec() ()
> No symbol table info available.
> #11 0x0000000000b65b38 in 
> impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) ()
> No symbol table info available.
> #12 0x0000000000d16c83 in impala::Thread::SuperviseThread(std::string const&, 
> std::string const&, boost::function<void ()>, impala::Promise<long>*) ()
> No symbol table info available.
> #13 0x0000000000d173c4 in boost::detail::thread_data<boost::_bi::bind_t<void, 
> void (*)(std::string const&, std::string const&, boost::function<void ()>, 
> impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, 
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::Promise<long>*> > > >::run() ()
> No symbol table info available.
> #14 0x000000000128fada in thread_proxy ()
> No symbol table info available.
> #15 0x00007fc295d8edd5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #16 0x00007fc295ab8b3d in clone () from /lib64/libc.so.6
> No symbol table info available.
> 2)
> 10-1-42-100
> Oct 28 07:15
> (I am seeing this particular issue being discussed in 
> https://issues.apache.org/jira/browse/IMPALA-7194)
> Thread 1 (Thread 0x7f6ef14f6700 (LWP 15999)):
> #0 0x00007f7c95316207 in raise () from /lib64/libc.so.6
> No symbol table info available.
> #1 0x00007f7c953178f8 in abort () from /lib64/libc.so.6
> No symbol table info available.
> #2 0x00007f7c97638185 in os::abort(bool) () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #3 0x00007f7c977da593 in VMError::report_and_die() () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #4 0x00007f7c9763d68f in JVM_handle_linux_signal () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #5 0x00007f7c97633be3 in signalHandler(int, siginfo*, void*) () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #6 <signal handler called>
> No symbol table info available.
> #7 0x00007f7bdcdaa830 in ?? ()
> No symbol table info available.
> #8 0x0000000000fddd0f in 
> impala::PartitionedAggregationNode::Open(impala::RuntimeState*) ()
> No symbol table info available.
> #9 0x0000000000b74d6d in impala::FragmentInstanceState::Open() ()
> No symbol table info available.
> #10 0x0000000000b763ab in impala::FragmentInstanceState::Exec() ()
> No symbol table info available.
> #11 0x0000000000b65b38 in 
> impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) ()
> No symbol table info available.
> #12 0x0000000000d16c83 in impala::Thread::SuperviseThread(std::string const&, 
> std::string const&, boost::function<void ()>, impala::Promise<long>*) ()
> No symbol table info available.
> #13 0x0000000000d173c4 in boost::detail::thread_data<boost::_bi::bind_t<void, 
> void (*)(std::string const&, std::string const&, boost::function<void ()>, 
> impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, 
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::Promise<long>*> > > >::run() ()
> No symbol table info available.
> #14 0x000000000128fada in thread_proxy ()
> No symbol table info available.
> #15 0x00007f7c956b4dd5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #16 0x00007f7c953deb3d in clone () from /lib64/libc.so.6
> No symbol table info available.
> 3)
> 10-1-43-65
> Nov 5 19:25
> Thread 1 (Thread 0x7f484dfc3700 (LWP 5561)):
> #0 0x00007f4c61734207 in raise () from /lib64/libc.so.6
> No symbol table info available.
> #1 0x00007f4c617358f8 in abort () from /lib64/libc.so.6
> No symbol table info available.
> #2 0x00007f4c63a56185 in os::abort(bool) () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #3 0x00007f4c63bf8593 in VMError::report_and_die() () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #4 0x00007f4c63a5b68f in JVM_handle_linux_signal () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #5 0x00007f4c63a51be3 in signalHandler(int, siginfo*, void*) () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> No symbol table info available.
> #6 <signal handler called>
> No symbol table info available.
> #7 0x00007f4ba9146250 in ?? ()
> No symbol table info available.
> #8 0x0000000001023b52 in impala::PhjBuilder::Partition::BuildHashTable(bool*) 
> ()
> No symbol table info available.
> #9 0x0000000001023dde in 
> impala::PhjBuilder::BuildHashTablesAndPrepareProbeStreams() ()
> No symbol table info available.
> #10 0x0000000001024340 in 
> impala::PhjBuilder::FlushFinal(impala::RuntimeState*) ()
> No symbol table info available.
> #11 0x000000000100da6d in impala::Status 
> impala::BlockingJoinNode::SendBuildInputToSink<true>(impala::RuntimeState*, 
> impala::DataSink*) ()
> No symbol table info available.
> #12 0x000000000100c2a0 in 
> impala::BlockingJoinNode::ProcessBuildInputAsync(impala::RuntimeState*, 
> impala::DataSink*, impala::Status*) ()
> No symbol table info available.
> #13 0x0000000000d16c83 in impala::Thread::SuperviseThread(std::string const&, 
> std::string const&, boost::function<void ()>, impala::Promise<long>*) ()
> No symbol table info available.
> #14 0x0000000000d173c4 in boost::detail::thread_data<boost::_bi::bind_t<void, 
> void (*)(std::string const&, std::string const&, boost::function<void ()>, 
> impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, 
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::Promise<long>*> > > >::run() ()
> No symbol table info available.
> #15 0x000000000128fada in thread_proxy ()
> No symbol table info available.
> #16 0x00007f4c61ad2dd5 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #17 0x00007f4c617fcb3d in clone () from /lib64/libc.so.6
> No symbol table info available



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
