[
https://issues.apache.org/jira/browse/KUDU-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869722#comment-16869722
]
xiaokai.wang commented on KUDU-2635:
------------------------------------
[~andrew.wong]
latest INFO:
I2019-06-22 00:31:39.572243 38060 maintenance_manager.cc:540] P
ee9ebbb99eef401ca89d6ca70beeca7b:
FlushDeltaMemStoresOp(349276d8ceed418ab3ead23130e6a8d1) complete. Timing: real
0.005s user 0.001s sys 0.001s Metrics:
\{"bytes_written":2669,"delete_count":0,"fdatasync":3,"fdatasync_us":3212,"lbm_write_time_us":40,"lbm_writes_lt_1ms":4,"reinsert_count":0,"update_count":575}
I2019-06-22 00:31:39.687263 38060 maintenance_manager.cc:540] P
ee9ebbb99eef401ca89d6ca70beeca7b:
MajorDeltaCompactionOp(349276d8ceed418ab3ead23130e6a8d1) complete. Timing: real
0.046s user 0.004s sys 0.004s Metrics:
\{"cfile_cache_miss":53,"cfile_cache_miss_bytes":5975,"cfile_init":27,"data
dirs.queue_time_us":177,"data dirs.run_cpu_time_us":46,"data
dirs.run_wall_time_us":47,"delete_count":0,"delta_blocks_compacted":1,"delta_iterators_relevant":1,"fdatasync":53,"fdatasync_us":34747,"lbm_read_time_us":3166,"lbm_reads_lt_1ms":187,"lbm_write_time_us":792,"lbm_writes_lt_1ms":129,"reinsert_count":0,"spinlock_wait_cycles":243712,"thread_start_us":66,"threads_started":2,"update_count":575}
I2019-06-22 00:31:39.831418 38060 maintenance_manager.cc:540] P
ee9ebbb99eef401ca89d6ca70beeca7b:
FlushDeltaMemStoresOp(a61a60ea564c4e2685f1801aa47f0e87) complete. Timing: real
0.074s user 0.001s sys 0.000s Metrics:
\{"bytes_written":518,"delete_count":0,"fdatasync":3,"fdatasync_us":72447,"lbm_write_time_us":30,"lbm_writes_lt_1ms":4,"reinsert_count":0,"update_count":20}
I2019-06-22 00:31:39.899981 38060 tablet.cc:1885] T
3510c5e81df847859d960a46dadce33d P ee9ebbb99eef401ca89d6ca70beeca7b: Creating
new rowset, rs_create_time_sec: 1561134699 I2019-06-22 00:31:39.906716 38055
maintenance_manager.cc:540] P ee9ebbb99eef401ca89d6ca70beeca7b:
CompactRowSetsOp(65e507f021c14e309e7fc30331173944) complete. Timing: real
6.092s user 1.952s sys 0.653s Metrics:
\{"bytes_written":212748092,"cfile_cache_hit":249,"cfile_cache_hit_bytes":9236687,"cfile_cache_miss":249,"cfile_cache_miss_bytes":9236687,"cfile_init":123,"data
dirs.queue_time_us":302,"data dirs.run_cpu_time_us":340,"data
dirs.run_wall_time_us":23388,"delta_iterators_relevant":6,"drs_written":3,"fdatasync":107,"fdatasync_us":145208,"lbm_read_time_us":658912,"lbm_reads_1-10_ms":135,"lbm_reads_lt_1ms":2934,"lbm_write_time_us":267622,"lbm_writes_1-10_ms":1,"lbm_writes_lt_1ms":4021,"num_input_rowsets":3,"rows_written":252084,"thread_start_us":156,"threads_started":3,"wal-append.queue_time_us":67}
I2019-06-22 00:31:39.977393 38055 maintenance_manager.cc:540] P
ee9ebbb99eef401ca89d6ca70beeca7b:
UndoDeltaBlockGCOp(65e507f021c14e309e7fc30331173944) complete. Timing: real
0.001s user 0.000s sys 0.000s Metrics:
\{"cfile_init":3,"lbm_read_time_us":42,"lbm_reads_lt_1ms":12} I2019-06-22
00:31:40.054874 38055 maintenance_manager.cc:540] P
ee9ebbb99eef401ca89d6ca70beeca7b:
UndoDeltaBlockGCOp(d981f2981df749f6975329efe87a13a6) complete. Timing: real
0.006s user 0.001s sys 0.004s Metrics: \{"data dirs.queue_time_us":2808,"data
dirs.run_cpu_time_us":602,"data
dirs.run_wall_time_us":602,"fdatasync":1,"fdatasync_us":205,"spinlock_wait_cycles":2176,"thread_start_us":53,"threads_started":1,"wal-append.queue_time_us":82}
I2019-06-22 00:31:40.124300 38055 log_reader.cc:552] T
d8215bf9d0d14dc19543c4fcaa2bb60a: removed 1 log segments from log reader
I2019-06-22 00:31:40.124320 38055 log.cc:875] T
d8215bf9d0d14dc19543c4fcaa2bb60a P ee9ebbb99eef401ca89d6ca70beeca7b: Deleting
log segment in path:
/home/admin/logs/kudu-wal/wals/d8215bf9d0d14dc19543c4fcaa2bb60a/wal-000000258
(ops 11718943-11772405) I2019-06-22 00:31:40.124456 38055
maintenance_manager.cc:540] P ee9ebbb99eef401ca89d6ca70beeca7b:
LogGCOp(d8215bf9d0d14dc19543c4fcaa2bb60a) complete. Timing: real 0.000s user
0.000s sys 0.000s Metrics: {}
> Tserver crash because some orphaned blocks are still listed when deleting
> metadata
> ----------------------------------------------------------------------------------
>
> Key: KUDU-2635
> URL: https://issues.apache.org/jira/browse/KUDU-2635
> Project: Kudu
> Issue Type: Bug
> Components: fs, tablet, tserver
> Affects Versions: 1.7.0
> Reporter: Andrew Wong
> Assignee: Andrew Wong
> Priority: Major
>
> In some cases, upon deleting a tablet, a tablet server may fail to delete
> some blocks, and then fail to delete the tablet metadata, leading to a crash
> since failure to delete metadata is a fatal error. That's what happened in
> the below logs, but it's unclear why the blocks failed to be deleted, and why
> the server stayed up for a couple minutes after before receiving a delete
> tablet request, and ultimately crashing. Following the crash, the server was
> able to start up successfully.
>
> {{I1130 00:00:07.565915 29721 tablet_service.cc:795] Processing DeleteTablet
> for tablet 1db7aa7e81474907ace3d493c24cdc94 with delete_type
> TABLET_DATA_DELETED (Partition dropped at 2018-11-30 00:00:07 PST) from
> \{username='kudu'} at 10.93.87.15:47194}}
> {{I1130 00:00:07.565929 29721 tablet_replica.cc:262] T
> 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: stopping
> tablet replica}}
> {{I1130 00:00:07.565954 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> CompactRowSetsOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.565997 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> MinorDeltaCompactionOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.566010 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> MajorDeltaCompactionOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.566020 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> UndoDeltaBlockGCOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.566032 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> FlushMRSOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.566040 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> FlushDeltaMemStoresOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.566048 29721 maintenance_manager.cc:235] P
> 97235196a93b41c29954ed8534aa2ddc: Unregistered op
> LogGCOp(1db7aa7e81474907ace3d493c24cdc94)}}
> {{I1130 00:00:07.566056 29721 raft_consensus.cc:2012] T
> 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc [term 3
> FOLLOWER]: Raft consensus shutting down.}}
> {{I1130 00:00:07.566074 29721 raft_consensus.cc:2039] T
> 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc [term 3
> FOLLOWER]: Raft consensus is shut down!}}
> {{I1130 00:00:07.666061 29721 ts_tablet_manager.cc:1277] T
> 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: Deleting
> tablet data with delete state TABLET_DATA_DELETED}}
> {{I1130 00:00:08.102607 29721 ts_tablet_manager.cc:1290] T
> 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: tablet
> deleted with delete type TABLET_DATA_DELETED: last-logged OpId 3.1166195}}
> {{I1130 00:00:08.102629 29721 log.cc:981] T 1db7aa7e81474907ace3d493c24cdc94
> P 97235196a93b41c29954ed8534aa2ddc: Deleting WAL directory at
> /home/kudu/tablet/wal/wals/1db7aa7e81474907ace3d493c24cdc94}}
> {{I1130 00:00:08.103217 29721 ts_tablet_manager.cc:1310] T
> 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: Deleting
> consensus metadata}}
> {{F1130 00:00:08.155643 29721 ts_tablet_manager.cc:848] Failed to delete
> tablet data for 1db7aa7e81474907ace3d493c24cdc94: Invalid argument: Unable to
> delete on-disk data from tablet 1db7aa7e81474907ace3d493c24cdc94: The
> metadata for tablet 1db7aa7e81474907ace3d493c24cdc94 still references
> orphaned blocks. Call DeleteTabletData() first}}
> {{I1130 00:02:09.460352 29725 tablet_service.cc:795] Processing DeleteTablet
> for tablet 1db7aa7e81474907ace3d493c24cdc94 with delete_type
> TABLET_DATA_DELETED (Partition dropped at 2018-11-30 00:00:07 PST) from
> \{username='kudu'} at 10.93.87.15:47194}}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)