[ https://issues.apache.org/jira/browse/KUDU-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783767#comment-17783767 ]
ASF subversion and git services commented on KUDU-2195:
-------------------------------------------------------
Commit 13a66ea9b088eec1de74249b738cc74333eefc4a in kudu's branch
refs/heads/master from Attila Bukor
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=13a66ea9b ]
[tools] KUDU-3337 Add unsafe_create_cmeta tool
We've seen some cases where a power outage on XFS led to empty cmeta
files, causing some tablets to fail to start (KUDU-2195). There is a
flag to force fsync, but it's disabled by default except for XFS.
Fortunately, it's possible to reconstruct what a cmeta should look like
based on the information found in ksck (peers) and WAL dumps (term and
config index). Still, even when this information was available, the only
way to actually create a cmeta file was to copy an existing cmeta file
and run "kudu pbc edit" on it, which is very error-prone and hard to
automate.
This commit introduces a new unsafe_create_cmeta tool under
local_replica, which creates a new cmeta file based on the term, config
index and peers as provided in CLI arguments.
I manually tested this tool by using it to recover a tablet with three
empty cmeta files.
Change-Id: I136cc5b5797420a9ca9156f37c3e281da0c265d7
Reviewed-on: http://gerrit.cloudera.org:8080/18029
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
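For readers following along, here is a rough sketch of the idea in the
commit message: collect the term, config index and peer list, validate
them, and build one record from them. It is not the tool's actual
implementation. The class names, field names and the "uuid:host:port"
peer format are all hypothetical, and the record is not Kudu's on-disk
cmeta format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Peer:
    # Hypothetical stand-in for one replica; not Kudu's protobuf schema.
    uuid: str
    host: str
    port: int


@dataclass
class CMetaSketch:
    # Hypothetical container for the three inputs named in the commit message.
    current_term: int
    config_index: int
    peers: List[Peer] = field(default_factory=list)


def parse_peer(spec: str) -> Peer:
    """Parse a peer given as 'uuid:host:port' (an assumed, illustrative format)."""
    parts = spec.rsplit(":", 2)
    if len(parts) != 3 or not parts[0] or not parts[1]:
        raise ValueError(f"expected uuid:host:port, got {spec!r}")
    uuid, host, port = parts
    return Peer(uuid=uuid, host=host, port=int(port))


def build_cmeta(term: int, config_index: int, peer_specs: List[str]) -> CMetaSketch:
    """Validate the inputs and assemble them into one record."""
    if term < 0 or config_index < 0:
        raise ValueError("term and config index must be non-negative")
    peers = [parse_peer(s) for s in peer_specs]
    if not peers:
        raise ValueError("at least one peer is required")
    uuids = [p.uuid for p in peers]
    if len(set(uuids)) != len(uuids):
        raise ValueError("duplicate peer uuid")
    return CMetaSketch(current_term=term, config_index=config_index, peers=peers)
```

Rejecting duplicate or malformed peers up front is the kind of check that
hand-editing a copied file does not give you, which is part of why the
manual approach is described as error-prone.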
> Enforce durability happened before relationships on multiple disks
> ------------------------------------------------------------------
>
> Key: KUDU-2195
> URL: https://issues.apache.org/jira/browse/KUDU-2195
> Project: Kudu
> Issue Type: Bug
> Components: consensus, tablet
> Affects Versions: 1.9.0
> Reporter: David Alves
> Priority: Major
>
> When using weaker durability semantics (e.g. when log_force_fsync is off) we
> should still enforce certain happened-before relationships, which are not
> currently being enforced when using different disks for the wal and data.
> The two cases that come to mind where this is relevant are:
> 1) cmeta (c) -> wal (w) : We flush cmeta before flushing the wal (for
> instance on term change) with the intention that either {}, {c} or {c, w}
> were made durable.
> 2) wal (w) -> tablet meta (t): We flush the wal before tablet metadata to
> make sure that all commit messages that refer to on-disk row sets (and
> deltas) are on disk before the row sets they point to, i.e. with the
> intention that either {}, {w} or {w, t} were made durable.
> With strong durability semantics these are always made durable in the right
> order. With weaker semantics that is not the case, though. If using the same
> disk for both the wal and data, the invariants are still preserved, as
> buffers will be flushed in the right order. But if using different disks for
> the wal and data (and because cmeta is stored with the data), that is not
> always the case.
> Case 1) is actually safe on ext4, because we perform an fsync (indirectly:
> rename() implies fsync in ext4) when flushing cmeta. But it is not safe on xfs.
> Case 2) is not safe on either filesystem.
> --- Possible solutions ---
> For 1): Store cmeta with the wal; actually always fsync cmeta.
> For 2): Store tablet meta with the wal; always fsync the wal before flushing
> tablet meta.
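
To illustrate the "always fsync" options above, here is a minimal sketch
of enforcing a happened-before relationship between two files that may
live on different disks: the first write is fsynced before the second one
starts. This is not Kudu code. The helper names and file paths are
hypothetical, and fsyncing a directory through os.open assumes a POSIX
system.

```python
import os


def fsync_dir(path: str) -> None:
    # Persist directory entries (e.g. a rename) on POSIX systems.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def append_durably(path: str, data: bytes) -> None:
    # Append to a log-like file and fsync before returning.
    with open(path, "ab") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def durable_replace(path: str, data: bytes) -> None:
    # Write a temp file, fsync it explicitly, then rename it into place.
    # The explicit fsync avoids relying on rename() to flush the data.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
    fsync_dir(os.path.dirname(os.path.abspath(path)))


def flush_wal_then_meta(wal_path: str, wal_data: bytes,
                        meta_path: str, meta_data: bytes) -> None:
    # Hypothetical helper for case 2): the wal append is fsynced before
    # the metadata write begins, so the intended outcomes are {}, {w}
    # or {w, t}, regardless of which disk each file is on.
    append_durably(wal_path, wal_data)
    durable_replace(meta_path, meta_data)
```

The same pattern covers case 1) by calling durable_replace for cmeta
before append_durably for the wal. The cost is one extra fsync per
ordered pair of writes.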