[
https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Wong resolved KUDU-3271.
-------------------------------
Fix Version/s: 1.13.0
Resolution: Fixed
I checked out the commit just before 163cd25 and copied over the test from the
patch. After running it a couple of times, I hit:
{code:java}
I0408 22:49:44.993857 54213 ts_tablet_manager.cc:1144] T
ffffffffffffffffffffffffffffffff P dbfd161726d64fa0b01e8a9237fb37d1: Time spent
starting tablet: real 0.004s user 0.002s sys 0.002s
I0408 22:49:44.993940 54215 raft_consensus.cc:683] T
ffffffffffffffffffffffffffffffff P dbfd161726d64fa0b01e8a9237fb37d1 [term 1
LEADER]: Becoming Leader. State: Replica: dbfd161726d64fa0b01e8a9237fb37d1,
State: Running, Role: LEADER
W0408 22:49:44.993994 54151 reactor.cc:681] Failed to create an outbound
connection to 255.255.255.255:1 because connect() failed: Network error:
connect(2) error: Network is unreachable (error 101)
I0408 22:49:44.994019 54215 consensus_queue.cc:227] T
ffffffffffffffffffffffffffffffff P dbfd161726d64fa0b01e8a9237fb37d1 [LEADER]:
Queue going to LEADER mode. State: All replicated index: 0, Majority replicated
index: 0, Committed index: 0, Last appended: 0.0, Last appended by leader: 0,
Current term: 1, Majority size: 1, State: 0, Mode: LEADER, active raft config:
opid_index: -1 peers { permanent_uuid: "dbfd161726d64fa0b01e8a9237fb37d1"
member_type: VOTER last_known_addr { host: "127.0.0.1" port: 44157 } }
*** Aborted at 1617947385 (unix time) try "date -d @1617947385" if you are
using GNU date ***
I0408 22:49:45.024998 54168 tablet_service.cc:2747] Scan: Not found: Scanner
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence
id=100, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025013 54167 tablet_service.cc:2747] Scan: Not found: Scanner
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence
id=100, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025015 54166 tablet_service.cc:2747] Scan: Not found: Scanner
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence
id=101, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025023 54163 tablet_service.cc:2747] Scan: Not found: Scanner
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence
id=100, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025087 54167 tablet_service.cc:2747] Scan: Not found: Scanner
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence
id=101, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025157 54167 tablet_service.cc:2747] Scan: Not found: Scanner
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence
id=100, remote={username='awong'} at 127.0.0.1:60548
PC: @ 0x229eed3 kudu::UnionIterator::HasNext()
*** SIGSEGV (@0x0) received by PID 54140 (TID 0x7fa30cfde700) from PID 0; stack
trace: ***
@ 0x7fa31d2b9370 (unknown)
@ 0x229eed3 kudu::UnionIterator::HasNext()
@ 0xb3300c kudu::tserver::TabletServiceImpl::HandleContinueScanRequest()
@ 0xb45a09 kudu::tserver::TabletServiceImpl::Scan()
@ 0x2227b79 kudu::rpc::GeneratedServiceIf::Handle()
@ 0x2228839 kudu::rpc::ServicePool::RunThread()
@ 0x23af01f kudu::Thread::SuperviseThread()
@ 0x7fa31d2b1dc5 start_thread
@ 0x7fa31b60976d __clone
Segmentation fault
{code}
Since the crash reproduces on the commit just before 163cd25, I think it's safe
to say this was indeed addressed by Todd's locking commit.
[~zhangyifan27] If you're able, feel free to pull 163cd25 into your version of
Kudu to prevent this in the future, or consider upgrading to 1.13 or higher.
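
For illustration only: in the log above, several RPC threads (54163, 54166,
54167, 54168) issue requests against the same scanner ID within a fraction of a
millisecond, which suggests concurrent access to one scanner. The sketch below
shows the general idea of serializing access to a scanner with a per-scanner
lock. It is not the code from 163cd25. FakeIterator, Scanner, ScannerManager,
ContinueScan and ScanResult are hypothetical names invented for this example,
not Kudu's real classes or signatures.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a row iterator; not Kudu's iterator API.
class FakeIterator {
 public:
  explicit FakeIterator(int rows) : remaining_(rows) {}
  bool HasNext() const { return remaining_ > 0; }
  void Next() {
    assert(remaining_ > 0);
    --remaining_;
  }

 private:
  int remaining_;
};

// Hypothetical scanner: owns an iterator plus a mutex that serializes access.
class Scanner {
 public:
  explicit Scanner(int rows) : iter_(new FakeIterator(rows)) {}

  // Returns a lock that owns the mutex on success, or an unowned lock if
  // another thread is already using this scanner.
  std::unique_lock<std::mutex> TryLock() {
    return std::unique_lock<std::mutex>(mu_, std::try_to_lock);
  }

  // The caller must hold the lock returned by TryLock().
  FakeIterator* iter() { return iter_.get(); }

 private:
  std::mutex mu_;
  std::unique_ptr<FakeIterator> iter_;
};

enum class ScanResult { kOk, kBusy, kNotFound };

// Hypothetical registry. shared_ptr keeps a scanner alive while a request
// still holds a reference, even after it has been unregistered.
class ScannerManager {
 public:
  void Register(const std::string& id, int rows) {
    std::lock_guard<std::mutex> l(mu_);
    scanners_[id] = std::make_shared<Scanner>(rows);
  }
  std::shared_ptr<Scanner> Lookup(const std::string& id) {
    std::lock_guard<std::mutex> l(mu_);
    auto it = scanners_.find(id);
    return it == scanners_.end() ? nullptr : it->second;
  }
  void Unregister(const std::string& id) {
    std::lock_guard<std::mutex> l(mu_);
    scanners_.erase(id);
  }

 private:
  std::mutex mu_;
  std::unordered_map<std::string, std::shared_ptr<Scanner>> scanners_;
};

// Reads up to `batch` rows. The iterator is only touched while the
// per-scanner lock is held, so two requests cannot race on it.
ScanResult ContinueScan(ScannerManager* mgr, const std::string& id,
                        int batch, int* rows_read) {
  std::shared_ptr<Scanner> scanner = mgr->Lookup(id);
  if (!scanner) return ScanResult::kNotFound;

  // Declared after `scanner`, so the lock is released before the last
  // reference to the scanner (and its mutex) can go away.
  std::unique_lock<std::mutex> access = scanner->TryLock();
  if (!access.owns_lock()) return ScanResult::kBusy;

  *rows_read = 0;
  while (*rows_read < batch && scanner->iter()->HasNext()) {
    scanner->iter()->Next();
    ++*rows_read;
  }
  // An exhausted scanner is unregistered while still exclusively held.
  if (!scanner->iter()->HasNext()) mgr->Unregister(id);
  return ScanResult::kOk;
}
```

In this sketch a second caller that finds the scanner busy gets kBusy rather
than touching the iterator. Whether to reject or to wait for the lock is a
design choice, and the actual fix may differ.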
> Tablet server crashed when handle scan request
> ----------------------------------------------
>
> Key: KUDU-3271
> URL: https://issues.apache.org/jira/browse/KUDU-3271
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.12.0
> Reporter: YifanZhang
> Priority: Major
> Fix For: 1.13.0
>
> Attachments: tablet-52a743.log
>
>
> We found that one of the Kudu tablet servers crashed while handling a scan
> request. The scanned table didn't have any row operations at that time. This
> issue has only come up once so far.
> The coredump stack is:
> {code:java}
> Program terminated with signal 11, Segmentation fault.
> (gdb) bt
> #0 kudu::tablet::DeltaApplier::HasNext (this=<optimized out>) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/delta_applier.cc:84
> #1 0x0000000002185900 in kudu::UnionIterator::HasNext (this=<optimized out>)
> at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:1051
> #2 0x0000000000a2ea8f in kudu::tserver::ScannerManager::UnregisterScanner
> (this=0x4fea140, scanner_id=...) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.cc:195
> #3 0x00000000009e7adf in ~ScopedUnregisterScanner (this=0x7f2d72167610,
> __in_chrg=<optimized out>) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.h:179
> #4 kudu::tserver::TabletServiceImpl::HandleContinueScanRequest
> (this=this@entry=0x60edef0, req=req@entry=0x9582e880,
> rpc_context=rpc_context@entry=0x8151d7800,
> result_collector=result_collector@entry=0x7f2d721679f0,
> has_more_results=has_more_results@entry=0x7f2d721678f9,
> error_code=error_code@entry=0x7f2d721678fc) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2737
> #5 0x00000000009fb009 in kudu::tserver::TabletServiceImpl::Scan
> (this=0x60edef0, req=0x9582e880, resp=0xb87b16de0, context=0x8151d7800) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1907
> #6 0x000000000210f019 in operator() (__args#2=0x8151d7800,
> __args#1=0xb87b16de0, __args#0=<optimized out>, this=0x4e0c7708) at
> /usr/include/c++/4.8.2/functional:2471
> #7 kudu::rpc::GeneratedServiceIf::Handle (this=0x60edef0, call=<optimized
> out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139
> #8 0x000000000210fcd9 in kudu::rpc::ServicePool::RunThread (this=0x50fb9e0)
> at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225
> #9 0x000000000228ecaf in operator() (this=0xc1a58c28) at
> /usr/include/c++/4.8.2/functional:2471
> #10 kudu::Thread::SuperviseThread (arg=0xc1a58c00) at
> /home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:674
> #11 0x00007f2de6b8adc5 in start_thread () from /lib64/libpthread.so.0
> #12 0x00007f2de4e6873d in clone () from /lib64/libc.so.6
> {code}
>
>