[
https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
YifanZhang updated KUDU-3108:
-----------------------------
Description:
When we ran an incremental backup of tables in a cluster with 20 tservers, 3
tservers crashed. The coredump stacks are identical:
{code}
Program terminated with signal 11, Segmentation fault.
#0 kudu::Schema::Compare<kudu::RowBlockRow, kudu::RowBlockRow>
(this=0x25b883680, lhs=..., rhs=...) at
/home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
267 /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h: No such file or
directory.
Missing separate debuginfos, use: debuginfo-install
bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64
cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64
cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 elfutils-libelf-0.166-2.el7.x86_64
elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64
libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64
libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64
libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-6.el7.x86_64
ncurses-libs-5.9-13.20130511.el7.x86_64
nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64
openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64
systemd-libs-219-30.el7_3.8.x86_64 xz-libs-5.2.2-1.el7.x86_64
zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 kudu::Schema::Compare<kudu::RowBlockRow, kudu::RowBlockRow>
(this=0x25b883680, lhs=..., rhs=...) at
/home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
#1 0x0000000001da51fb in kudu::MergeIterator::RefillHotHeap
(this=this@entry=0x78f6ec500) at
/home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:720
#2 0x0000000001da622b in kudu::MergeIterator::AdvanceAndReheap
(this=this@entry=0x78f6ec500, state=0xd1661a000,
num_rows_to_advance=num_rows_to_advance@entry=1) at
/home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:690
#3 0x0000000001da7927 in kudu::MergeIterator::MaterializeOneRow
(this=this@entry=0x78f6ec500, dst=dst@entry=0x7f0d5cc9ffc0,
dst_row_idx=dst_row_idx@entry=0x7f0d5cc9fbb0) at
/home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:894
#4 0x0000000001da7de3 in kudu::MergeIterator::NextBlock (this=0x78f6ec500,
dst=0x7f0d5cc9ffc0) at
/home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:796
#5 0x0000000000a9ff19 in kudu::tablet::Tablet::Iterator::NextBlock
(this=<optimized out>, dst=<optimized out>) at
/home/zhangyifan8/work/kudu-xm/src/kudu/tablet/tablet.cc:2499
#6 0x000000000095475c in
kudu::tserver::TabletServiceImpl::HandleContinueScanRequest
(this=this@entry=0x53b5a90, req=req@entry=0x7f0d5cca0720,
rpc_context=rpc_context@entry=0x5e512a460,
result_collector=result_collector@entry=0x7f0d5cca0a00,
has_more_results=has_more_results@entry=0x7f0d5cca0886,
error_code=error_code@entry=0x7f0d5cca0888) at
/home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2565
#7 0x0000000000966564 in
kudu::tserver::TabletServiceImpl::HandleNewScanRequest
(this=this@entry=0x53b5a90, replica=0xf5c0189c0, req=req@entry=0x2a15c240,
rpc_context=rpc_context@entry=0x5e512a460,
result_collector=result_collector@entry=0x7f0d5cca0a00,
scanner_id=scanner_id@entry=0x7f0d5cca0940,
snap_timestamp=snap_timestamp@entry=0x7f0d5cca0950,
has_more_results=has_more_results@entry=0x7f0d5cca0886,
error_code=error_code@entry=0x7f0d5cca0888) at
/home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2476
#8 0x0000000000967f4b in kudu::tserver::TabletServiceImpl::Scan
(this=0x53b5a90, req=0x2a15c240, resp=0x56f9be6c0, context=0x5e512a460) at
/home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1674
#9 0x0000000001d2e449 in operator() (__args#2=0x5e512a460,
__args#1=0x56f9be6c0, __args#0=<optimized out>, this=0x497ecdd8) at
/usr/include/c++/4.8.2/functional:2471
#10 kudu::rpc::GeneratedServiceIf::Handle (this=0x53b5a90, call=<optimized
out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139
#11 0x0000000001d2eb49 in kudu::rpc::ServicePool::RunThread (this=0x2ab69560)
at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225
#12 0x0000000001e9e924 in operator() (this=0x90fb52e8) at
/home/zhangyifan8/work/kudu-xm/thirdparty/installed/uninstrumented/include/boost/function/function_template.hpp:771
#13 kudu::Thread::SuperviseThread (arg=0x90fb52c0) at
/home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:657
#14 0x00007f103b20cdc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f103956673d in clone () from /lib64/libc.so.6
{code}
was:
When we used the KuduBackup Spark job to back up tables in a cluster with 20
tservers, 3 tservers crashed. The coredump stacks are identical:
{code:java}
Program terminated with signal 11, Segmentation fault.
#0 kudu::Schema::Compare<kudu::RowBlockRow, kudu::RowBlockRow>
(this=0x25b883680, lhs=..., rhs=...) at
/home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
267 /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h: No such file or
directory.
Missing separate debuginfos, use: debuginfo-install
bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64
cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64
cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 elfutils-libelf-0.166-2.el7.x86_64
elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64
libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64
libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64
libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-6.el7.x86_64
ncurses-libs-5.9-13.20130511.el7.x86_64
nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64
openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64
systemd-libs-219-30.el7_3.8.x86_64 xz-libs-5.2.2-1.el7.x86_64
zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 kudu::Schema::Compare<kudu::RowBlockRow, kudu::RowBlockRow>
(this=0x25b883680, lhs=..., rhs=...) at
/home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
#1 0x0000000001da51fb in kudu::MergeIterator::RefillHotHeap
(this=this@entry=0x78f6ec500) at
/home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:720
#2 0x0000000001da622b in kudu::MergeIterator::AdvanceAndReheap
(this=this@entry=0x78f6ec500, state=0xd1661a000,
num_rows_to_advance=num_rows_to_advance@entry=1) at
/home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:690
#3 0x0000000001da7927 in kudu::MergeIterator::MaterializeOneRow
(this=this@entry=0x78f6ec500, dst=dst@entry=0x7f0d5cc9ffc0,
dst_row_idx=dst_row_idx@entry=0x7f0d5cc9fbb0) at
/home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:894
#4 0x0000000001da7de3 in kudu::MergeIterator::NextBlock (this=0x78f6ec500,
dst=0x7f0d5cc9ffc0) at
/home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:796
#5 0x0000000000a9ff19 in kudu::tablet::Tablet::Iterator::NextBlock
(this=<optimized out>, dst=<optimized out>) at
/home/zhangyifan8/work/kudu-xm/src/kudu/tablet/tablet.cc:2499
#6 0x000000000095475c in
kudu::tserver::TabletServiceImpl::HandleContinueScanRequest
(this=this@entry=0x53b5a90, req=req@entry=0x7f0d5cca0720,
rpc_context=rpc_context@entry=0x5e512a460,
result_collector=result_collector@entry=0x7f0d5cca0a00,
has_more_results=has_more_results@entry=0x7f0d5cca0886,
error_code=error_code@entry=0x7f0d5cca0888) at
/home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2565
#7 0x0000000000966564 in
kudu::tserver::TabletServiceImpl::HandleNewScanRequest
(this=this@entry=0x53b5a90, replica=0xf5c0189c0, req=req@entry=0x2a15c240,
rpc_context=rpc_context@entry=0x5e512a460,
result_collector=result_collector@entry=0x7f0d5cca0a00,
scanner_id=scanner_id@entry=0x7f0d5cca0940,
snap_timestamp=snap_timestamp@entry=0x7f0d5cca0950,
has_more_results=has_more_results@entry=0x7f0d5cca0886,
error_code=error_code@entry=0x7f0d5cca0888) at
/home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2476
#8 0x0000000000967f4b in kudu::tserver::TabletServiceImpl::Scan
(this=0x53b5a90, req=0x2a15c240, resp=0x56f9be6c0, context=0x5e512a460) at
/home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1674
#9 0x0000000001d2e449 in operator() (__args#2=0x5e512a460,
__args#1=0x56f9be6c0, __args#0=<optimized out>, this=0x497ecdd8) at
/usr/include/c++/4.8.2/functional:2471
#10 kudu::rpc::GeneratedServiceIf::Handle (this=0x53b5a90, call=<optimized
out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139
#11 0x0000000001d2eb49 in kudu::rpc::ServicePool::RunThread (this=0x2ab69560)
at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225
#12 0x0000000001e9e924 in operator() (this=0x90fb52e8) at
/home/zhangyifan8/work/kudu-xm/thirdparty/installed/uninstrumented/include/boost/function/function_template.hpp:771
#13 kudu::Thread::SuperviseThread (arg=0x90fb52c0) at
/home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:657
#14 0x00007f103b20cdc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f103956673d in clone () from /lib64/libc.so.6
{code}
> Tablet server crashes when handle diffscan request
> ---------------------------------------------------
>
> Key: KUDU-3108
> URL: https://issues.apache.org/jira/browse/KUDU-3108
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.10.1
> Reporter: YifanZhang
> Priority: Major
>
> When we ran an incremental backup of tables in a cluster with 20 tservers,
> 3 tservers crashed. The coredump stacks are identical:
> {code}
> Program terminated with signal 11, Segmentation fault.
> #0 kudu::Schema::Compare<kudu::RowBlockRow, kudu::RowBlockRow>
> (this=0x25b883680, lhs=..., rhs=...) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
> 267 /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h: No such file
> or directory.
> Missing separate debuginfos, use: debuginfo-install
> bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64
> cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64
> cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 elfutils-libelf-0.166-2.el7.x86_64
> elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64
> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64
> libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64
> libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64
> libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-6.el7.x86_64
> ncurses-libs-5.9-13.20130511.el7.x86_64
> nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64
> openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64
> systemd-libs-219-30.el7_3.8.x86_64 xz-libs-5.2.2-1.el7.x86_64
> zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0 kudu::Schema::Compare<kudu::RowBlockRow, kudu::RowBlockRow>
> (this=0x25b883680, lhs=..., rhs=...) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
> #1 0x0000000001da51fb in kudu::MergeIterator::RefillHotHeap
> (this=this@entry=0x78f6ec500) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:720
> #2 0x0000000001da622b in kudu::MergeIterator::AdvanceAndReheap
> (this=this@entry=0x78f6ec500, state=0xd1661a000,
> num_rows_to_advance=num_rows_to_advance@entry=1) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:690
> #3 0x0000000001da7927 in kudu::MergeIterator::MaterializeOneRow
> (this=this@entry=0x78f6ec500, dst=dst@entry=0x7f0d5cc9ffc0,
> dst_row_idx=dst_row_idx@entry=0x7f0d5cc9fbb0) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:894
> #4 0x0000000001da7de3 in kudu::MergeIterator::NextBlock (this=0x78f6ec500,
> dst=0x7f0d5cc9ffc0) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:796
> #5 0x0000000000a9ff19 in kudu::tablet::Tablet::Iterator::NextBlock
> (this=<optimized out>, dst=<optimized out>) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/tablet.cc:2499
> #6 0x000000000095475c in
> kudu::tserver::TabletServiceImpl::HandleContinueScanRequest
> (this=this@entry=0x53b5a90, req=req@entry=0x7f0d5cca0720,
> rpc_context=rpc_context@entry=0x5e512a460,
> result_collector=result_collector@entry=0x7f0d5cca0a00,
> has_more_results=has_more_results@entry=0x7f0d5cca0886,
> error_code=error_code@entry=0x7f0d5cca0888) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2565
> #7 0x0000000000966564 in
> kudu::tserver::TabletServiceImpl::HandleNewScanRequest
> (this=this@entry=0x53b5a90, replica=0xf5c0189c0, req=req@entry=0x2a15c240,
> rpc_context=rpc_context@entry=0x5e512a460,
> result_collector=result_collector@entry=0x7f0d5cca0a00,
> scanner_id=scanner_id@entry=0x7f0d5cca0940,
> snap_timestamp=snap_timestamp@entry=0x7f0d5cca0950,
> has_more_results=has_more_results@entry=0x7f0d5cca0886,
> error_code=error_code@entry=0x7f0d5cca0888) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2476
> #8 0x0000000000967f4b in kudu::tserver::TabletServiceImpl::Scan
> (this=0x53b5a90, req=0x2a15c240, resp=0x56f9be6c0, context=0x5e512a460) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1674
> #9 0x0000000001d2e449 in operator() (__args#2=0x5e512a460,
> __args#1=0x56f9be6c0, __args#0=<optimized out>, this=0x497ecdd8) at
> /usr/include/c++/4.8.2/functional:2471
> #10 kudu::rpc::GeneratedServiceIf::Handle (this=0x53b5a90, call=<optimized
> out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139
> #11 0x0000000001d2eb49 in kudu::rpc::ServicePool::RunThread (this=0x2ab69560)
> at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225
> #12 0x0000000001e9e924 in operator() (this=0x90fb52e8) at
> /home/zhangyifan8/work/kudu-xm/thirdparty/installed/uninstrumented/include/boost/function/function_template.hpp:771
> #13 kudu::Thread::SuperviseThread (arg=0x90fb52c0) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:657
> #14 0x00007f103b20cdc5 in start_thread () from /lib64/libpthread.so.0
> #15 0x00007f103956673d in clone () from /lib64/libc.so.6
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)