[ 
https://issues.apache.org/jira/browse/KUDU-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804909#comment-17804909
 ] 

Abhishek Chennaka commented on KUDU-3471:
-----------------------------------------

Looking into this further, I see that the current version of Kudu already 
forces an fsync when writing data to the tablet metadata files, as seen 
[here|https://github.com/apache/kudu/blob/master/src/kudu/tablet/tablet_metadata.cc#L710].
This would mean the tablet metadata file on disk was somehow corrupted, 
resulting in an invalid entry. The fact that this happened on multiple 
replicas across multiple servers is concerning. Digging further, I see that 
the last delta memstore id (dms_id) in the commit message was 0 [1], but the 
last durable delta memstore id (last_durable_dms_id) in the tablet metadata 
was -1 [2].

[1]
{code:java}
compression_codec: LZ4
COMMIT 17.1455
        op_type: WRITE_OP commited_op_id { term: 17 index: 1455 } result { ops 
{ skip_on_replay: true mutated_stores { rs_id: 6 dms_id: 0 }
{code}
[2]
{code:java}
TabletMetadata: table_id: "b9166fe1ec624efea24521269bc25125" tablet_id: 
"5d3f10a0427745c7abdc889dae6f62b0" last_durable_mrs_id: 17 rowsets { id: 6 
last_durable_dms_id: -1 
{code}
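To make the mismatch between [1] and [2] concrete, here is a minimal sketch of the comparison as I understand it. It is not Kudu's actual bootstrap code: the class and method names are hypothetical, and the rule that an edit needs replay when its dms_id is greater than the rowset's last_durable_dms_id is my reading of the bootstrap logic, stated here as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FlushedStores:
    """Hypothetical stand-in for the ids bootstrap reads from tablet metadata."""
    last_durable_mrs_id: int
    # Maps rowset id -> last_durable_dms_id for that rowset.
    last_durable_dms_id: dict = field(default_factory=dict)

    def dms_needs_replay(self, rs_id: int, dms_id: int) -> bool:
        # Assumption: a rowset that is absent from the metadata was compacted
        # away, so edits that targeted it do not need to be replayed.
        if rs_id not in self.last_durable_dms_id:
            return False
        # Assumption: an edit needs replay when it targeted a delta memstore
        # newer than the last one recorded as durable for that rowset.
        return dms_id > self.last_durable_dms_id[rs_id]


# Values taken from excerpts [1] and [2].
meta = FlushedStores(last_durable_mrs_id=17, last_durable_dms_id={6: -1})
needs_replay = meta.dms_needs_replay(rs_id=6, dms_id=0)
print(needs_replay)
```

Under these assumptions the commit for op 17.1455 still points at a store that needs replay, which matches the bootstrap error, since the WAL segment holding that op is already gone.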
We update the last durable delta memstore id in the tablet metadata during a 
delta memstore flush or delta store compaction. The only way this mismatch 
could happen is if, for some reason, the flush or compaction of the delta 
memstore failed or did not occur, but the WAL segment was still marked for GC. 
The limited amount of data available for reproducing the issue limits any 
meaningful progress on it, but if the issue recurs, collecting all of the 
tablet metadata, WALs and all the available logs would help.
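
For reference, the forced fsync mentioned at the start of this comment usually follows the generic write-to-temp, fsync, rename, fsync-directory pattern. The sketch below shows that pattern only. It is not a description of what tablet_metadata.cc does, the function name is hypothetical, and it assumes a POSIX filesystem where a directory can be opened and fsynced.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Hypothetical helper: replace `path` with `data` so it survives a crash."""
    dir_name = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Force the file contents to stable storage before the rename.
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Fsync the directory so the rename itself is durable (POSIX assumption).
    dir_fd = os.open(dir_name, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With this pattern a reader sees either the old file or the new one in full, never a partially written file, so a stale but well-formed metadata file would point to a skipped or failed update rather than a torn write.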

> Enforce flushing of tablet-meta
> -------------------------------
>
>                 Key: KUDU-3471
>                 URL: https://issues.apache.org/jira/browse/KUDU-3471
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: Abhishek Chennaka
>            Priority: Major
>
> We suspect tablet-meta was not updated, which led to the tablet not being 
> able to start up. Below is the log analysis:
> 1. There was a restart of the cluster which was done on Dec 2 and the tablet 
> 5d3f10a0427745c7abdc889dae6f62b0 bootstrapped successfully. The last known 
> committed index was logged as 1455:
> {code:java}
> Last known committed idx: 1455
> {code}
> 2. The WAL segment containing the ops 1411-1455 was GC'd, which indicates 
> that this data had been persisted to the data disk of the server.
> {code:java}
> I1202 12:15:03.523751  9026 log.cc:1068] T 5d3f10a0427745c7abdc889dae6f62b0 P 
> b395541607c54801955a6b5ed310e67c: Deleting log segment in path: 
> /data/kudu/0/wals/5d3f10a0427745c7abdc889dae6f62b0/wal-000000001 (ops 
> 1411-1455).
> {code}
> Flushes were scheduled between Dec 02 and Jan 27:
> {code:java}
> I0116 11:40:33.197038  9463 maintenance_manager.cc:382] P 
> b395541607c54801955a6b5ed310e67c: Scheduling 
> FlushMRSOp(5d3f10a0427745c7abdc889dae6f62b0): perf score=1.000000
> I0116 11:42:33.589488  9463 maintenance_manager.cc:382] P 
> b395541607c54801955a6b5ed310e67c: Scheduling 
> FlushMRSOp(5d3f10a0427745c7abdc889dae6f62b0): perf score=0.033403
> I0125 15:26:57.443202  9463 maintenance_manager.cc:382] P 
> b395541607c54801955a6b5ed310e67c: Scheduling 
> FlushMRSOp(5d3f10a0427745c7abdc889dae6f62b0): perf score=1.000000
> I0125 15:28:57.865049  9463 maintenance_manager.cc:382] P 
> b395541607c54801955a6b5ed310e67c: Scheduling 
> FlushMRSOp(5d3f10a0427745c7abdc889dae6f62b0): perf score=0.033412
> {code}
> 3. As a part of Tablet::DoMergeCompactionOrFlush() we update the 
> TabletMetadata during every flush.
> [https://github.com/apache/kudu/blob/master/src/kudu/tablet/tablet.cc#L2205]
> All of this is to say there were multiple attempts to update the Tablet 
> Metadata after the WAL segment was GC'd on Dec 2.
> 4. When the tablet server was restarted on Jan 27, as a part of the tablet 
> bootstrap, we refer to the Tablet Metadata to fetch the id of the last 
> rowset flushed to the data disk (last_durable_mrs_id) when replaying the WAL 
> segments. This seems to be referring to an mrs id with index less than 1455, 
> which should have been flushed and should not need to be replayed. Since the 
> WAL segment was GC'd, the tablet ended up in the stopped state.
> {code:java}
> CommitMsg was orphaned but it referred to stores which need replay. Commit: 
> op_type: WRITE_OP commited_op_id { term: 17 index: 1455 }
> {code}
> [https://github.com/apache/kudu/blob/master/src/kudu/tablet/tablet_bootstrap.cc#L1072]
> Unfortunately, the tablet-meta of the affected tablets could not be 
> collected, but the only possible explanation of the above is that the 
> metadata of the tablet was not updated.
> Having some sort of forced fsyncing on tablet-meta files, similar to cmeta, 
> should help prevent such scenarios.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
