[
https://issues.apache.org/jira/browse/FLINK-35217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefan Richter updated FLINK-35217:
-----------------------------------
Fix Version/s: 1.18.2
1.19.1
> Missing fsync in FileSystemCheckpointStorage
> --------------------------------------------
>
> Key: FLINK-35217
> URL: https://issues.apache.org/jira/browse/FLINK-35217
> Project: Flink
> Issue Type: Bug
> Components: FileSystems, Runtime / Checkpointing
> Affects Versions: 1.17.0, 1.18.0, 1.19.0
> Reporter: Marc Aurel Fritz
> Assignee: Stefan Richter
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> While running Flink on a system with unstable power supply checkpoints were
> regularly corrupted in the form of "_metadata" files with a file size of 0
> bytes. In all cases the previous checkpoint data had already been deleted,
> causing progress to be lost completely.
> Further investigation revealed that the "FileSystemCheckpointStorage" doesn't
> perform "fsync" when writing a new checkpoint to disk. This means the old
> checkpoint gets removed without making sure that the new one is durably
> persisted on disk. "strace" on the jobmanager's process confirms this
> behavior:
> # The checkpoint chk-60's in-progress metadata is written at "openat"
> # The checkpoint chk-60's in-progress metadata is atomically renamed at
> "rename"
> # The old checkpoint chk-59 is deleted at "unlink"
> For durable persistence an "fsync" call is missing before step 3.
> Full "strace" log:
> {code:java}
> [pid 51618] 11:44:30
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60",
> 0x7fd2ad5fc970) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60",
> 0x7fd2ad5fca00) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc",
> {st_mode=S_IFDIR|0755, st_size=42, ...}) = 0
> [pid 51618] 11:44:30
> mkdir("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 0777)
> = 0
> [pid 51618] 11:44:30
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata",
> 0x7fd2ad5fc860) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata",
> 0x7fd2ad5fc740) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
> 0x7fd2ad5fc7d0) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60",
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51618] 11:44:30
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60",
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51618] 11:44:30 openat(AT_FDCWD,
> "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
> O_WRONLY|O_CREAT|O_EXCL, 0666) = 168
> [pid 51618] 11:44:30
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
> {st_mode=S_IFREG|0644, st_size=23378, ...}) = 0
> [pid 51618] 11:44:30
> rename("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
> "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata") = > 0
> [pid 51644] 11:44:30
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59/_metadata",
> {st_mode=S_IFREG|0644, st_size=23378, ...}) = 0
> [pid 51644] 11:44:30
> unlink("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59/_metadata")
> = 0
> [pid 51644] 11:44:30
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59",
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51644] 11:44:30
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59",
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51644] 11:44:30 openat(AT_FDCWD,
> "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59",
> O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 168
> [pid 51644] 11:44:30 newfstatat(168, "", {st_mode=S_IFDIR|0755, st_size=0,
> ...}, AT_EMPTY_PATH) = 0
> [pid 51644] 11:44:30
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59",
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51644] 11:44:30 openat(AT_FDCWD,
> "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59",
> O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 168
> [pid 51644] 11:44:30 newfstatat(168, "", {st_mode=S_IFDIR|0755, st_size=0,
> ...}, AT_EMPTY_PATH) = 0
> [pid 51644] 11:44:30
> unlink("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59") = -1
> EISDIR (Is a directory)
> [pid 51644] 11:44:30
> rmdir("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59") = 0
> {code}
> To fix this I'm currently testing the following commit:
> [https://github.com/Planet-X/flink/commit/24196cc897533b654f44e2b612543ff023cdb123]
> "strace" can confirm that "fsync" is now called before the previous
> checkpoint is removed at "unlink":
> {code:java}
> [pid 40393] 11:30:17
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50",
> <unfinished ...>
> [pid 40393] 11:30:17 <... stat resumed>0x7fc887efc970) = -1 ENOENT (No such
> file or directory)
> [pid 40393] 11:30:17
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50",
> 0x7fc887efca00) = -1 ENOENT (No such file or directory)
> [pid 40393] 11:30:17
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e",
> {st_mode=S_IFDIR|0755, st_size=42, ...}) = 0
> [pid 40393] 11:30:17
> mkdir("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", 0777)
> = 0
> [pid 40393] 11:30:17
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/_metadata",
> 0x7fc887efc870) = -1 ENOENT (No such file or directory)
> [pid 40393] 11:30:17
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/_metadata",
> 0x7fc887efc750) = -1 ENOENT (No such file or directory)
> [pid 40393] 11:30:17
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90",
> 0x7fc887efc7e0) = -1 ENOENT (No such file or directory)
> [pid 40393] 11:30:17
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50",
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 40393] 11:30:17
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50",
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 40393] 11:30:17 openat(AT_FDCWD,
> "/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90",
> O_WRONLY|O_CREAT|O_EXCL, 0666) = 194
> [pid 40393] 11:30:17 fsync(194) = 0
> [pid 40393] 11:30:17
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90",
> {st_mode=S_IFREG|0644, st_size=23366, ...}) = 0
> [pid 40393] 11:30:17
> rename("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90",
> "/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/_metadata") = > 0
> [pid 39230] 11:30:17
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49/_metadata",
> {st_mode=S_IFREG|0644, st_size=23366, ...}) = 0
> [pid 39230] 11:30:17
> unlink("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49/_metadata")
> = 0
> [pid 39230] 11:30:17
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49",
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 39230] 11:30:17
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49",
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 39230] 11:30:17 openat(AT_FDCWD,
> "/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49",
> O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 194
> [pid 39230] 11:30:17 newfstatat(194, "", {st_mode=S_IFDIR|0755, st_size=0,
> ...}, AT_EMPTY_PATH) = 0
> [pid 39230] 11:30:17
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49",
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 39230] 11:30:17 openat(AT_FDCWD,
> "/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49",
> O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 194
> [pid 39230] 11:30:17 newfstatat(194, "", {st_mode=S_IFDIR|0755, st_size=0,
> ...}, AT_EMPTY_PATH) = 0
> [pid 39230] 11:30:17
> unlink("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49") = -1
> EISDIR (Is a directory)
> [pid 39230] 11:30:17
> rmdir("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49") = 0
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)