[jira] [Commented] (KUDU-2195) Enforce durability happened before relationships on multiple disks

Adar Dembo (JIRA) Wed, 24 Jul 2019 17:58:52 -0700


    [ https://issues.apache.org/jira/browse/KUDU-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892320#comment-16892320 ]


Adar Dembo commented on KUDU-2195:
----------------------------------

bq. We've seen 1) manifest as the WAL having entries of a term that is higher 
than the term in consensus metadata. On boot, the tablet server verifies this 
invariant and causes the replica of the tablet to fail to boot. Even though the 
reported status is Status::Corruption, the underlying data was not corrupted, 
and this replica isn't going to be working anyway. If there are other replicas 
available, the tablet will be available and working.

Here's a concrete example of this error:
{noformat}
Committed config change op in WAL has opid index (11) greater than config 
persisted in the consensus metadata (10). Replicate message: {id { term: 6 
index: 11 } timestamp: 6400733184053735424 op_type: CHANGE_CONFIG_OP 
change_config_record { tablet_id: "8b783bfc25094a349d2275147173f9f5" old_config 
{ opid_index: 10 OBSOLETE_local: false peers { permanent_uuid: 
"40eb17f8d3c24198afc774cb6abb11e1" member_type: VOTER last_known_addr { host: 
"79.a.com" port: 7050 } } peers { permanent_uuid: 
"4f4c5c7c93a947fdaddf4dc8f716f0ee" member_type: VOTER last_known_addr { host: 
"78.a.com" port: 7050 } } } new_config { opid_index: 11 OBSOLETE_local: false 
peers { permanent_uuid: "40eb17f8d3c24198afc774cb6abb11e1" member_type: VOTER 
last_known_addr { host: "79.a.com" port: 7050 } } peers { permanent_uuid: 
"4f4c5c7c93a947fdaddf4dc8f716f0ee" member_type: VOTER last_known_addr { host: 
"78.a.com" port: 7050 } } peers { permanent_uuid: 
"aa666cff1b8342d3b10d5c53e4643758" member_type: NON_VOTER last_known_addr { 
host: "27.a.com" port: 7050 } attrs { promote: true } } } }}. Committed raft 
config in consensus metadata: {opid_index: 10 OBSOLETE_local: false peers { 
permanent_uuid: "40eb17f8d3c24198afc774cb6abb11e1" member_type: VOTER 
last_known_addr { host: "79.a.com" port: 7050 } } peers { permanent_uuid: 
"4f4c5c7c93a947fdaddf4dc8f716f0ee" member_type: VOTER last_known_addr { host: 
"78.a.com" port: 7050 } }}
{noformat}
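
The comparison behind this message can be sketched as follows. This is only an illustration in Python, not Kudu's actual code: the names check_config_invariant and ConfigInvariantError are hypothetical, and the exception merely stands in for the Status::Corruption mentioned above.

```python
class ConfigInvariantError(Exception):
    """Hypothetical stand-in for the Status::Corruption reported at boot."""


def check_config_invariant(wal_config_opid_index: int,
                           cmeta_opid_index: int) -> None:
    """Raise if a committed config change in the WAL is newer than cmeta.

    Both arguments are opid indexes: the first from the committed
    CHANGE_CONFIG_OP found in the WAL, the second from the committed raft
    config persisted in the consensus metadata.
    """
    if wal_config_opid_index > cmeta_opid_index:
        raise ConfigInvariantError(
            "Committed config change op in WAL has opid index "
            f"({wal_config_opid_index}) greater than config persisted in "
            f"the consensus metadata ({cmeta_opid_index})."
        )
```

With the values from the log above, check_config_invariant(11, 10) raises, while check_config_invariant(10, 10) passes.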

> Enforce durability happened before relationships on multiple disks
> ------------------------------------------------------------------
>
>                 Key: KUDU-2195
>                 URL: https://issues.apache.org/jira/browse/KUDU-2195
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus, tablet
>            Reporter: David Alves
>            Priority: Major
>             Fix For: 1.9.0
>
>
> When using weaker durability semantics (e.g. when log_force_fsync is off) we 
> should still enforce certain happened-before relationships, which are not 
> currently being enforced when using different disks for the wal and data.
> The two cases that come to mind where this is relevant are:
> 1) cmeta (c) -> wal (w): We flush cmeta before flushing the wal (for 
> instance on term change) with the intention that either {}, {c} or {c, w} 
> were made durable.
> 2) wal (w) -> tablet meta (t): We flush the wal before tablet metadata to 
> make sure that all commit messages that refer to on-disk row sets (and 
> deltas) are on disk before the row sets they point to, i.e. with the 
> intention that either {}, {w} or {w, t} were made durable.
> With strong durability semantics these are always made durable in the right 
> order. With weaker semantics that is not the case, though. If the wal and 
> data share a disk, the invariants are still preserved, as buffers will be 
> flushed in the right order. If they are on different disks (and because 
> cmeta is stored with the data), that is not always the case.
> Case 1) is actually safe on ext4, because we perform an (indirect) fsync 
> when flushing cmeta: rename() implies fsync in ext4. It is not safe on xfs.
> Case 2) is not safe on either filesystem.
> --- Possible solutions ---
> For 1): Store cmeta with the wal; actually always fsync cmeta.
> For 2): Store tablet meta with the wal; always fsync the wal before flushing 
> tablet meta.
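
One possible reading of "always fsync cmeta" from the issue description is sketched below. This is a hedged Python illustration under POSIX assumptions, not Kudu's implementation: durable_replace, fsync_dir and flush_cmeta_then_wal are hypothetical names, and the unsynced WAL append only models log_force_fsync being off.

```python
import os


def fsync_dir(path: str) -> None:
    """fsync a directory so a rename inside it is durable (POSIX only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_replace(path: str, data: bytes) -> None:
    """Write data to a temp file, fsync it, then atomically rename it."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # explicit fsync; do not rely on rename()
    os.replace(tmp, path)
    fsync_dir(os.path.dirname(os.path.abspath(path)))


def flush_cmeta_then_wal(cmeta_path: str, cmeta: bytes,
                         wal_path: str, wal_entry: bytes) -> None:
    """Enforce c -> w: cmeta is durable before the WAL entry is written."""
    durable_replace(cmeta_path, cmeta)
    with open(wal_path, "ab") as wal:
        wal.write(wal_entry)
        wal.flush()  # no fsync here: models log_force_fsync being off
```

Because cmeta is synced before the WAL append is even issued, a crash can leave {}, {c} or {c, w} durable, but never {w} alone, regardless of which disks the two files live on.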



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
