[
https://issues.apache.org/jira/browse/KUDU-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869722#comment-16869722
]
xiaokai.wang edited comment on KUDU-2635 at 6/22/19 1:51 AM:
-------------------------------------------------------------
[~andrew.wong]
latest INFO:
{noformat}
I2019-06-22 00:31:35.494654 8319 ts_tablet_manager.cc:1406] T
75724232dc2b4c55a7b3f3dda71de421 P ee9ebbb99eef401ca89d6ca70beeca7b: Deleting
tablet data with delete state TABLET_DATA_DELETED
I2019-06-22 00:31:35.500349 8319 log_block_manager.cc:2342] Block
0000004771013963 is in a failed directory; not deleting
I2019-06-22 00:31:35.500373 8319 log_block_manager.cc:2342] Block
0000004770674741 is in a failed directory; not deleting
I2019-06-22 00:31:35.500394 8319 log_block_manager.cc:2342] Block
0000004770306047 is in a failed directory; not deleting
I2019-06-22 00:31:35.500412 8319 log_block_manager.cc:2342] Block
0000004770305778 is in a failed directory; not deleting
I2019-06-22 00:31:35.500428 8319 log_block_manager.cc:2342] Block
0000004770150575 is in a failed directory; not deleting
I2019-06-22 00:31:35.500440 8319 log_block_manager.cc:2342] Block
0000004767842514 is in a failed directory; not deleting
I2019-06-22 00:31:35.500457 8319 log_block_manager.cc:2342] Block
0000004767693603 is in a failed directory; not deleting
I2019-06-22 00:31:35.500475 8319 log_block_manager.cc:2342] Block
0000004768979091 is in a failed directory; not deleting
I2019-06-22 00:31:35.500494 8319 log_block_manager.cc:2342] Block
0000004769517719 is in a failed directory; not deleting
I2019-06-22 00:31:35.500526 8319 log_block_manager.cc:2342] Block
0000004770532796 is in a failed directory; not deleting
I2019-06-22 00:31:35.500545 8319 log_block_manager.cc:2342] Block
0000004767409367 is in a failed directory; not deleting
I2019-06-22 00:31:35.500558 8319 log_block_manager.cc:2342] Block
0000004767843036 is in a failed directory; not deleting
I2019-06-22 00:31:35.500574 8319 log_block_manager.cc:2342] Block
0000004767841625 is in a failed directory; not deleting
I2019-06-22 00:31:35.500588 8319 log_block_manager.cc:2342] Block
0000004766984801 is in a failed directory; not deleting
I2019-06-22 00:31:35.500605 8319 log_block_manager.cc:2342] Block
0000004766601709 is in a failed directory; not deleting
I2019-06-22 00:31:35.500622 8319 log_block_manager.cc:2342] Block
0000004770150226 is in a failed directory; not deleting
I2019-06-22 00:31:35.500638 8319 log_block_manager.cc:2342] Block
0000004768610830 is in a failed directory; not deleting
I2019-06-22 00:31:35.500653 8319 log_block_manager.cc:2342] Block
0000004769437711 is in a failed directory; not deleting
I2019-06-22 00:31:35.500663 8319 log_block_manager.cc:2342] Block
0000004767032135 is in a failed directory; not deleting
I2019-06-22 00:31:35.500674 8319 log_block_manager.cc:2342] Block
0000004769516808 is in a failed directory; not deleting
I2019-06-22 00:31:35.500684 8319 log_block_manager.cc:2342] Block
0000004767842087 is in a failed directory; not deleting
I2019-06-22 00:31:35.500694 8319 log_block_manager.cc:2342] Block
0000004768086357 is in a failed directory; not deleting
I2019-06-22 00:31:35.500705 8319 log_block_manager.cc:2342] Block
0000004768494227 is in a failed directory; not deleting
I2019-06-22 00:31:35.500715 8319 log_block_manager.cc:2342] Block
0000004768494236 is in a failed directory; not deleting
I2019-06-22 00:31:35.500730 8319 log_block_manager.cc:2342] Block
0000004769517402 is in a failed directory; not deleting
I2019-06-22 00:31:35.500741 8319 log_block_manager.cc:2342] Block
0000004769517409 is in a failed directory; not deleting
I2019-06-22 00:31:35.500751 8319 log_block_manager.cc:2342] Block
0000004768977680 is in a failed directory; not deleting
I2019-06-22 00:31:35.500766 8319 log_block_manager.cc:2342] Block
0000004768977902 is in a failed directory; not deleting
I2019-06-22 00:31:35.500774 8319 log_block_manager.cc:2342] Block
0000004769417178 is in a failed directory; not deleting
I2019-06-22 00:31:35.500792 8319 log_block_manager.cc:2342] Block
0000004767842978 is in a failed directory; not deleting
I2019-06-22 00:31:35.500803 8319 log_block_manager.cc:2342] Block
0000004769388741 is in a failed directory; not deleting
I2019-06-22 00:31:35.500825 8319 log_block_manager.cc:2342] Block
0000004769388928 is in a failed directory; not deleting
I2019-06-22 00:31:35.500839 8319 log_block_manager.cc:2342] Block
0000004769416193 is in a failed directory; not deleting
I2019-06-22 00:31:35.500854 8319 log_block_manager.cc:2342] Block
0000004769518153 is in a failed directory; not deleting
I2019-06-22 00:31:35.500871 8319 log_block_manager.cc:2342] Block
0000004771013746 is in a failed directory; not deleting
W2019-06-22 00:31:35.537019 8319 tablet_metadata.cc:512] not all orphaned
blocks were deleted: IO error: only deleted 3348 blocks, first failure: Block
is in a failed directory
I2019-06-22 00:31:35.543174 8319 log_block_manager.cc:2342] Block
0000004771013024 is in a failed directory; not deleting
I2019-06-22 00:31:35.543211 8319 log_block_manager.cc:2342] Block
0000004770533228 is in a failed directory; not deleting
I2019-06-22 00:31:35.543236 8319 log_block_manager.cc:2342] Block
0000004770306049 is in a failed directory; not deleting
I2019-06-22 00:31:35.543251 8319 log_block_manager.cc:2342] Block
0000004770305771 is in a failed directory; not deleting
I2019-06-22 00:31:35.543274 8319 log_block_manager.cc:2342] Block
0000004770150246 is in a failed directory; not deleting
I2019-06-22 00:31:35.543295 8319 log_block_manager.cc:2342] Block
0000004767842511 is in a failed directory; not deleting
I2019-06-22 00:31:35.543313 8319 log_block_manager.cc:2342] Block
0000004767693596 is in a failed directory; not deleting
I2019-06-22 00:31:35.543339 8319 log_block_manager.cc:2342] Block
0000004770675360 is in a failed directory; not deleting
I2019-06-22 00:31:35.543357 8319 log_block_manager.cc:2342] Block
0000004770532996 is in a failed directory; not deleting
I2019-06-22 00:31:35.543376 8319 log_block_manager.cc:2342] Block
0000004764355762 is in a failed directory; not deleting
I2019-06-22 00:31:35.543395 8319 log_block_manager.cc:2342] Block
0000004767841942 is in a failed directory; not deleting
I2019-06-22 00:31:35.543409 8319 log_block_manager.cc:2342] Block
0000004770532662 is in a failed directory; not deleting
I2019-06-22 00:31:35.543426 8319 log_block_manager.cc:2342] Block
0000004767841654 is in a failed directory; not deleting
I2019-06-22 00:31:35.543442 8319 log_block_manager.cc:2342] Block
0000004766984489 is in a failed directory; not deleting
I2019-06-22 00:31:35.543459 8319 log_block_manager.cc:2342] Block
0000004766601694 is in a failed directory; not deleting
I2019-06-22 00:31:35.543473 8319 log_block_manager.cc:2342] Block
0000004770150225 is in a failed directory; not deleting
I2019-06-22 00:31:35.543485 8319 log_block_manager.cc:2342] Block
0000004769388894 is in a failed directory; not deleting
I2019-06-22 00:31:35.543501 8319 log_block_manager.cc:2342] Block
0000004766984064 is in a failed directory; not deleting
I2019-06-22 00:31:35.543531 8319 log_block_manager.cc:2342] Block
0000004767693593 is in a failed directory; not deleting
I2019-06-22 00:31:35.543548 8319 log_block_manager.cc:2342] Block
0000004767409351 is in a failed directory; not deleting
I2019-06-22 00:31:35.543566 8319 log_block_manager.cc:2342] Block
0000004770674562 is in a failed directory; not deleting
I2019-06-22 00:31:35.543583 8319 log_block_manager.cc:2342] Block
0000004768086367 is in a failed directory; not deleting
I2019-06-22 00:31:35.543597 8319 log_block_manager.cc:2342] Block
0000004767843093 is in a failed directory; not deleting
I2019-06-22 00:31:35.543612 8319 log_block_manager.cc:2342] Block
0000004766984093 is in a failed directory; not deleting
I2019-06-22 00:31:35.543637 8319 log_block_manager.cc:2342] Block
0000004767370691 is in a failed directory; not deleting
I2019-06-22 00:31:35.543653 8319 log_block_manager.cc:2342] Block
0000004767842243 is in a failed directory; not deleting
I2019-06-22 00:31:35.543668 8319 log_block_manager.cc:2342] Block
0000004770274868 is in a failed directory; not deleting
I2019-06-22 00:31:35.543685 8319 log_block_manager.cc:2342] Block
0000004770331047 is in a failed directory; not deleting
I2019-06-22 00:31:35.543702 8319 log_block_manager.cc:2342] Block
0000004769388915 is in a failed directory; not deleting
I2019-06-22 00:31:35.543720 8319 log_block_manager.cc:2342] Block
0000004769388802 is in a failed directory; not deleting
I2019-06-22 00:31:35.543752 8319 log_block_manager.cc:2342] Block
0000004769417029 is in a failed directory; not deleting
I2019-06-22 00:31:35.543768 8319 log_block_manager.cc:2342] Block
0000004769388931 is in a failed directory; not deleting
I2019-06-22 00:31:35.543783 8319 log_block_manager.cc:2342] Block
0000004769416205 is in a failed directory; not deleting
I2019-06-22 00:31:35.543800 8319 log_block_manager.cc:2342] Block
0000004771013739 is in a failed directory; not deleting
I2019-06-22 00:31:35.543815 8319 log_block_manager.cc:2342] Block
0000004769518175 is in a failed directory; not deleting
W2019-06-22 00:31:35.543828 8319 tablet_metadata.cc:512] not all orphaned
blocks were deleted: IO error: only deleted 0 blocks, first failure: Block is
in a failed directory
I2019-06-22 00:31:35.544059 8319 ts_tablet_manager.cc:1419] T
75724232dc2b4c55a7b3f3dda71de421 P ee9ebbb99eef401ca89d6ca70beeca7b: tablet
deleted with delete type TABLET_DATA_DELETED: last-logged OpId 3.145792
I2019-06-22 00:31:35.544088 8319 log.cc:1002] T
75724232dc2b4c55a7b3f3dda71de421 P ee9ebbb99eef401ca89d6ca70beeca7b: Deleting
WAL directory at /home/admin/logs/kudu-wal/wals/75724232dc2b4c55a7b3f3dda71de421
I2019-06-22 00:31:35.544353 8319 ts_tablet_manager.cc:1440] T
75724232dc2b4c55a7b3f3dda71de421 P ee9ebbb99eef401ca89d6ca70beeca7b: Deleting
consensus metadata
F2019-06-22 00:31:35.544447 8319 ts_tablet_manager.cc:957] Failed to delete
tablet data for 75724232dc2b4c55a7b3f3dda71de421: Invalid argument: Unable to
delete on-disk data from tablet 75724232dc2b4c55a7b3f3dda71de421: The metadata
for tablet 75724232dc2b4c55a7b3f3dda71de421 still references orphaned blocks.
Call DeleteTabletData() first
{noformat}
Sorry, the log is right now. It looks like disk error? [~andrew.wong]
was (Author: xiaokai):
[~andrew.wong]
latest INFO:
I2019-06-22 00:31:39.572243 38060 maintenance_manager.cc:540] P
ee9ebbb99eef401ca89d6ca70beeca7b:
FlushDeltaMemStoresOp(349276d8ceed418ab3ead23130e6a8d1) complete. Timing: real
0.005s user 0.001s sys 0.001s Metrics:
\{"bytes_written":2669,"delete_count":0,"fdatasync":3,"fdatasync_us":3212,"lbm_write_time_us":40,"lbm_writes_lt_1ms":4,"reinsert_count":0,"update_count":575}
I2019-06-22 00:31:39.687263 38060 maintenance_manager.cc:540] P
ee9ebbb99eef401ca89d6ca70beeca7b:
MajorDeltaCompactionOp(349276d8ceed418ab3ead23130e6a8d1) complete. Timing: real
0.046s user 0.004s sys 0.004s Metrics:
\{"cfile_cache_miss":53,"cfile_cache_miss_bytes":5975,"cfile_init":27,"data
dirs.queue_time_us":177,"data dirs.run_cpu_time_us":46,"data
dirs.run_wall_time_us":47,"delete_count":0,"delta_blocks_compacted":1,"delta_iterators_relevant":1,"fdatasync":53,"fdatasync_us":34747,"lbm_read_time_us":3166,"lbm_reads_lt_1ms":187,"lbm_write_time_us":792,"lbm_writes_lt_1ms":129,"reinsert_count":0,"spinlock_wait_cycles":243712,"thread_start_us":66,"threads_started":2,"update_count":575}
I2019-06-22 00:31:39.831418 38060 maintenance_manager.cc:540] P
ee9ebbb99eef401ca89d6ca70beeca7b:
FlushDeltaMemStoresOp(a61a60ea564c4e2685f1801aa47f0e87) complete. Timing: real
0.074s user 0.001s sys 0.000s Metrics:
\{"bytes_written":518,"delete_count":0,"fdatasync":3,"fdatasync_us":72447,"lbm_write_time_us":30,"lbm_writes_lt_1ms":4,"reinsert_count":0,"update_count":20}
I2019-06-22 00:31:39.899981 38060 tablet.cc:1885] T
3510c5e81df847859d960a46dadce33d P ee9ebbb99eef401ca89d6ca70beeca7b: Creating
new rowset, rs_create_time_sec: 1561134699 I2019-06-22 00:31:39.906716 38055
maintenance_manager.cc:540] P ee9ebbb99eef401ca89d6ca70beeca7b:
CompactRowSetsOp(65e507f021c14e309e7fc30331173944) complete. Timing: real
6.092s user 1.952s sys 0.653s Metrics:
\{"bytes_written":212748092,"cfile_cache_hit":249,"cfile_cache_hit_bytes":9236687,"cfile_cache_miss":249,"cfile_cache_miss_bytes":9236687,"cfile_init":123,"data
dirs.queue_time_us":302,"data dirs.run_cpu_time_us":340,"data
dirs.run_wall_time_us":23388,"delta_iterators_relevant":6,"drs_written":3,"fdatasync":107,"fdatasync_us":145208,"lbm_read_time_us":658912,"lbm_reads_1-10_ms":135,"lbm_reads_lt_1ms":2934,"lbm_write_time_us":267622,"lbm_writes_1-10_ms":1,"lbm_writes_lt_1ms":4021,"num_input_rowsets":3,"rows_written":252084,"thread_start_us":156,"threads_started":3,"wal-append.queue_time_us":67}
I2019-06-22 00:31:39.977393 38055 maintenance_manager.cc:540] P
ee9ebbb99eef401ca89d6ca70beeca7b:
UndoDeltaBlockGCOp(65e507f021c14e309e7fc30331173944) complete. Timing: real
0.001s user 0.000s sys 0.000s Metrics:
\{"cfile_init":3,"lbm_read_time_us":42,"lbm_reads_lt_1ms":12} I2019-06-22
00:31:40.054874 38055 maintenance_manager.cc:540] P
ee9ebbb99eef401ca89d6ca70beeca7b:
UndoDeltaBlockGCOp(d981f2981df749f6975329efe87a13a6) complete. Timing: real
0.006s user 0.001s sys 0.004s Metrics: \{"data dirs.queue_time_us":2808,"data
dirs.run_cpu_time_us":602,"data
dirs.run_wall_time_us":602,"fdatasync":1,"fdatasync_us":205,"spinlock_wait_cycles":2176,"thread_start_us":53,"threads_started":1,"wal-append.queue_time_us":82}
I2019-06-22 00:31:40.124300 38055 log_reader.cc:552] T
d8215bf9d0d14dc19543c4fcaa2bb60a: removed 1 log segments from log reader
I2019-06-22 00:31:40.124320 38055 log.cc:875] T
d8215bf9d0d14dc19543c4fcaa2bb60a P ee9ebbb99eef401ca89d6ca70beeca7b: Deleting
log segment in path:
/home/admin/logs/kudu-wal/wals/d8215bf9d0d14dc19543c4fcaa2bb60a/wal-000000258
(ops 11718943-11772405) I2019-06-22 00:31:40.124456 38055
maintenance_manager.cc:540] P ee9ebbb99eef401ca89d6ca70beeca7b:
LogGCOp(d8215bf9d0d14dc19543c4fcaa2bb60a) complete. Timing: real 0.000s user
0.000s sys 0.000s Metrics: {}
> Tserver crash because some orphaned blocks are still listed when deleting
> metadata
> ----------------------------------------------------------------------------------
>
> Key: KUDU-2635
> URL: https://issues.apache.org/jira/browse/KUDU-2635
> Project: Kudu
> Issue Type: Bug
> Components: fs, tablet, tserver
> Affects Versions: 1.7.0
> Reporter: Andrew Wong
> Assignee: Andrew Wong
> Priority: Major
>
> In some cases, upon deleting a tablet, a tablet server may fail to delete
> some blocks, and then fail to delete the tablet metadata, leading to a crash
> since failure to delete metadata is a fatal error. That's what happened in
> the below logs, but it's unclear why the blocks failed to be deleted, and why
> the server stayed up for a couple minutes after before receiving a delete
> tablet request, and ultimately crashing. Following the crash, the server was
> able to start up successfully.
>
> {{I1130 00:00:07.565915 29721 tablet_service.cc:795] Processing DeleteTablet
> for tablet 1db7aa7e81474907ace3d493c24cdc94 with delete_type
> TABLET_DATA_DELETED (Partition dropped at 2018-11-30 00:00:07 PST) from
> \{username='kudu'} at 10.93.87.15:47194}}
> {{I1130 00:00:07.565929 29721 tablet_replica.cc:262] T
> 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: stopping
> tablet replica}}
> {{I1130 00:00:07.565954 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> CompactRowSetsOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.565997 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> MinorDeltaCompactionOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.566010 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> MajorDeltaCompactionOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.566020 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> UndoDeltaBlockGCOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.566032 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> FlushMRSOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.566040 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> FlushDeltaMemStoresOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.566048 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> LogGCOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.566056 29721 raft_consensus.cc:2012] T
> 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc [term 3
> FOLLOWER]: Raft consensus shutting down.}}
> {{I1130 00:00:07.566074 29721 raft_consensus.cc:2039] T
> 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc [term 3
> FOLLOWER]: Raft consensus is shut down!}}
> {{I1130 00:00:07.666061 29721 ts_tablet_manager.cc:1277] T
> 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: Deleting
> tablet data with delete state TABLET_DATA_DELETED}}
> {{I1130 00:00:08.102607 29721 ts_tablet_manager.cc:1290] T
> 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: tablet
> deleted with delete type TABLET_DATA_DELETED: last-logged OpId 3.1166195}}
> {{I1130 00:00:08.102629 29721 log.cc:981] T 1db7aa7e81474907ace3d493c24cdc94
> P 97235196a93b41c29954ed8534aa2ddc: Deleting WAL directory at
> /home/kudu/tablet/wal/wals/1db7aa7e81474907ace3d493c24cdc94}}
> {{I1130 00:00:08.103217 29721 ts_tablet_manager.cc:1310] T
> 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: Deleting
> consensus metadata}}
> {{F1130 00:00:08.155643 29721 ts_tablet_manager.cc:848] Failed to delete
> tablet data for 1db7aa7e81474907ace3d493c24cdc94: Invalid argument: Unable to
> delete on-disk data from tablet 1db7aa7e81474907ace3d493c24cdc94: The
> metadata for tablet 1db7aa7e81474907ace3d493c24cdc94 still references
> orphaned blocks. Call DeleteTabletData() first}}
> {{I1130 00:02:09.460352 29725 tablet_service.cc:795] Processing DeleteTablet
> for tablet 1db7aa7e81474907ace3d493c24cdc94 with delete_type
> TABLET_DATA_DELETED (Partition dropped at 2018-11-30 00:00:07 PST) from
> \{username='kudu'} at 10.93.87.15:47194}}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)