Andrew Wong created KUDU-2635:
---------------------------------

             Summary: Tserver crash because some orphaned blocks are still 
listed when deleting metadata
                 Key: KUDU-2635
                 URL: https://issues.apache.org/jira/browse/KUDU-2635
             Project: Kudu
          Issue Type: Bug
          Components: fs, tablet, tserver
    Affects Versions: 1.7.0
            Reporter: Andrew Wong
            Assignee: Andrew Wong


When deleting a tablet, a tablet server may in some cases fail to delete some 
blocks and then fail to delete the tablet metadata. Because a failure to delete 
metadata is a fatal error, the server crashes. That is what happened in the logs 
below, but it is unclear why the blocks failed to be deleted, and why the server 
stayed up for a couple of minutes after the fatal log message, receiving another 
DeleteTablet request before ultimately crashing. Following the crash, the server 
was able to start up successfully.
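
To make the failure sequence concrete, here is a minimal toy model in Python. 
It is not Kudu's code (Kudu is C++); the names TabletMetadata, 
delete_tablet_data, and delete_metadata are hypothetical, chosen only to mirror 
the error text in the logs. It assumes that a block whose deletion fails stays 
in the orphaned list, and that metadata deletion refuses to proceed while that 
list is non-empty.

```python
class TabletMetadata:
    """Toy stand-in for a tablet's metadata (hypothetical, not Kudu's class)."""

    def __init__(self, tablet_id, orphaned_blocks):
        self.tablet_id = tablet_id
        self.orphaned_blocks = set(orphaned_blocks)


def delete_tablet_data(meta, delete_block):
    """Try to delete each orphaned block; blocks that fail stay listed.

    delete_block is a callable returning True on success, False on failure.
    """
    for block in sorted(meta.orphaned_blocks):  # sorted() copies, so mutation is safe
        if delete_block(block):
            meta.orphaned_blocks.discard(block)


def delete_metadata(meta):
    """Refuse to delete metadata while orphaned blocks are still referenced."""
    if meta.orphaned_blocks:
        raise ValueError(
            "The metadata for tablet %s still references orphaned blocks"
            % meta.tablet_id
        )
    return True


if __name__ == "__main__":
    meta = TabletMetadata("example-tablet", ["block-1", "block-2"])
    # Assumption for illustration: deleting block-2 fails for an unknown reason.
    delete_tablet_data(meta, lambda block: block != "block-2")
    try:
        delete_metadata(meta)
    except ValueError as err:
        # In the scenario described here, the caller treats this as fatal.
        print("would crash:", err)
```

In this model, the crash can only be avoided if every block deletion succeeds 
or if the caller handles the metadata deletion error without aborting.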

 

{{I1130 00:00:07.565915 29721 tablet_service.cc:795] Processing DeleteTablet 
for tablet 1db7aa7e81474907ace3d493c24cdc94 with delete_type 
TABLET_DATA_DELETED (Partition dropped at 2018-11-30 00:00:07 PST) from 
\{username='kudu'} at 10.93.87.15:47194}}
{{I1130 00:00:07.565929 29721 tablet_replica.cc:262] T 
1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: stopping 
tablet replica}}
{{I1130 00:00:07.565954 29721 maintenance_manager.cc:235] P 
97235196a93b41c29954ed8534aa2ddc: Unregistered op 
CompactRowSetsOp(1db7aa7e81474907ace3d493c24cdc94)}}
{{I1130 00:00:07.565997 29721 maintenance_manager.cc:235] P 
97235196a93b41c29954ed8534aa2ddc: Unregistered op 
MinorDeltaCompactionOp(1db7aa7e81474907ace3d493c24cdc94)}}
{{I1130 00:00:07.566010 29721 maintenance_manager.cc:235] P 
97235196a93b41c29954ed8534aa2ddc: Unregistered op 
MajorDeltaCompactionOp(1db7aa7e81474907ace3d493c24cdc94)}}
{{I1130 00:00:07.566020 29721 maintenance_manager.cc:235] P 
97235196a93b41c29954ed8534aa2ddc: Unregistered op 
UndoDeltaBlockGCOp(1db7aa7e81474907ace3d493c24cdc94)}}
{{I1130 00:00:07.566032 29721 maintenance_manager.cc:235] P 
97235196a93b41c29954ed8534aa2ddc: Unregistered op 
FlushMRSOp(1db7aa7e81474907ace3d493c24cdc94)}}
{{I1130 00:00:07.566040 29721 maintenance_manager.cc:235] P 
97235196a93b41c29954ed8534aa2ddc: Unregistered op 
FlushDeltaMemStoresOp(1db7aa7e81474907ace3d493c24cdc94)}}
{{I1130 00:00:07.566048 29721 maintenance_manager.cc:235] P 
97235196a93b41c29954ed8534aa2ddc: Unregistered op 
LogGCOp(1db7aa7e81474907ace3d493c24cdc94)}}
{{I1130 00:00:07.566056 29721 raft_consensus.cc:2012] T 
1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc [term 3 
FOLLOWER]: Raft consensus shutting down.}}
{{I1130 00:00:07.566074 29721 raft_consensus.cc:2039] T 
1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc [term 3 
FOLLOWER]: Raft consensus is shut down!}}
{{I1130 00:00:07.666061 29721 ts_tablet_manager.cc:1277] T 
1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: Deleting 
tablet data with delete state TABLET_DATA_DELETED}}
{{I1130 00:00:08.102607 29721 ts_tablet_manager.cc:1290] T 
1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: tablet 
deleted with delete type TABLET_DATA_DELETED: last-logged OpId 3.1166195}}
{{I1130 00:00:08.102629 29721 log.cc:981] T 1db7aa7e81474907ace3d493c24cdc94 P 
97235196a93b41c29954ed8534aa2ddc: Deleting WAL directory at 
/home/kudu/tablet/wal/wals/1db7aa7e81474907ace3d493c24cdc94}}
{{I1130 00:00:08.103217 29721 ts_tablet_manager.cc:1310] T 
1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: Deleting 
consensus metadata}}
{{F1130 00:00:08.155643 29721 ts_tablet_manager.cc:848] Failed to delete tablet 
data for 1db7aa7e81474907ace3d493c24cdc94: Invalid argument: Unable to delete 
on-disk data from tablet 1db7aa7e81474907ace3d493c24cdc94: The metadata for 
tablet 1db7aa7e81474907ace3d493c24cdc94 still references orphaned blocks. Call 
DeleteTabletData() first}}
{{I1130 00:02:09.460352 29725 tablet_service.cc:795] Processing DeleteTablet 
for tablet 1db7aa7e81474907ace3d493c24cdc94 with delete_type 
TABLET_DATA_DELETED (Partition dropped at 2018-11-30 00:00:07 PST) from 
\{username='kudu'} at 10.93.87.15:47194}}
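
The gap between the fatal message and the second DeleteTablet request can be 
measured from the log prefixes. The following is a rough sketch, assuming the 
usual glog prefix layout (severity letter, month and day, time with 
microseconds). glog lines carry no year, so the year 2018 is an assumption 
taken from the "Partition dropped at 2018-11-30" text in the messages. The 
helper name parse_glog_time is hypothetical.

```python
import re
from datetime import datetime

# Severity letter, mmdd, then hh:mm:ss.uuuuuu. A leading "{{" from the JIRA
# monospace markup is tolerated.
GLOG_RE = re.compile(
    r"^\{*([IWEF])(\d{2})(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{6}) "
)


def parse_glog_time(line, year=2018):
    """Return (severity, datetime) for a glog line, or None if it doesn't match."""
    m = GLOG_RE.match(line)
    if m is None:
        return None
    sev, month, day, hh, mm, ss, us = m.groups()
    return sev, datetime(year, int(month), int(day),
                         int(hh), int(mm), int(ss), int(us))


fatal = parse_glog_time(
    "{{F1130 00:00:08.155643 29721 ts_tablet_manager.cc:848] Failed to delete tablet"
)
retry = parse_glog_time(
    "{{I1130 00:02:09.460352 29725 tablet_service.cc:795] Processing DeleteTablet"
)
gap_seconds = (retry[1] - fatal[1]).total_seconds()
print(gap_seconds)  # roughly 121.3 seconds, i.e. about two minutes
```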



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
