[ 
https://issues.apache.org/jira/browse/FLINK-35217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Richter reassigned FLINK-35217:
--------------------------------------

    Assignee: Stefan Richter

> Missing fsync in FileSystemCheckpointStorage
> --------------------------------------------
>
>                 Key: FLINK-35217
>                 URL: https://issues.apache.org/jira/browse/FLINK-35217
>             Project: Flink
>          Issue Type: Bug
>          Components: FileSystems, Runtime / Checkpointing
>    Affects Versions: 1.17.0, 1.18.0, 1.19.0
>            Reporter: Marc Aurel Fritz
>            Assignee: Stefan Richter
>            Priority: Critical
>
> While running Flink on a system with an unstable power supply, checkpoints 
> were regularly corrupted: their "_metadata" files had a file size of 0 
> bytes. In all cases the previous checkpoint data had already been deleted, 
> so all progress was lost.
> Further investigation revealed that the "FileSystemCheckpointStorage" doesn't 
> perform "fsync" when writing a new checkpoint to disk. This means the old 
> checkpoint gets removed without making sure that the new one is durably 
> persisted on disk. "strace" on the jobmanager's process confirms this 
> behavior:
>  # The checkpoint chk-60's in-progress metadata is written at "openat"
>  # The checkpoint chk-60's in-progress metadata is atomically renamed at 
> "rename"
>  # The old checkpoint chk-59 is deleted at "unlink"
> For durable persistence, an "fsync" call on the new metadata file is 
> missing before step 3.
> Full "strace" log:
> {code:java}
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
> 0x7fd2ad5fc970) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
> 0x7fd2ad5fca00) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc", 
> {st_mode=S_IFDIR|0755, st_size=42, ...}) = 0
> [pid 51618] 11:44:30 
> mkdir("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 0777) 
> = 0
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata",
>  0x7fd2ad5fc860) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata",
>  0x7fd2ad5fc740) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
>  0x7fd2ad5fc7d0) = -1 ENOENT (No such file or directory)
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51618] 11:44:30 openat(AT_FDCWD, 
> "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
>  O_WRONLY|O_CREAT|O_EXCL, 0666) = 168
> [pid 51618] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
>  {st_mode=S_IFREG|0644, st_size=23378, ...}) = 0
> [pid 51618] 11:44:30 
> rename("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
>  "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata") = 0
> [pid 51644] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59/_metadata",
>  {st_mode=S_IFREG|0644, st_size=23378, ...}) = 0
> [pid 51644] 11:44:30 
> unlink("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59/_metadata")
>  = 0
> [pid 51644] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51644] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51644] 11:44:30 openat(AT_FDCWD, 
> "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
> O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 168
> [pid 51644] 11:44:30 newfstatat(168, "", {st_mode=S_IFDIR|0755, st_size=0, 
> ...}, AT_EMPTY_PATH) = 0
> [pid 51644] 11:44:30 
> stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 51644] 11:44:30 openat(AT_FDCWD, 
> "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
> O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 168
> [pid 51644] 11:44:30 newfstatat(168, "", {st_mode=S_IFDIR|0755, st_size=0, 
> ...}, AT_EMPTY_PATH) = 0
> [pid 51644] 11:44:30 
> unlink("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59") = -1 
> EISDIR (Is a directory)
> [pid 51644] 11:44:30 
> rmdir("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59") = 0 
> {code}
> To fix this I'm currently testing the following commit: 
> [https://github.com/Planet-X/flink/commit/24196cc897533b654f44e2b612543ff023cdb123]
> With this change, "strace" confirms that "fsync" is now called before the 
> previous checkpoint is removed at "unlink":
> {code:java}
> [pid 40393] 11:30:17 
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50",  
> <unfinished ...>
> [pid 40393] 11:30:17 <... stat resumed>0x7fc887efc970) = -1 ENOENT (No such 
> file or directory)
> [pid 40393] 11:30:17 
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", 
> 0x7fc887efca00) = -1 ENOENT (No such file or directory)
> [pid 40393] 11:30:17 
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e", 
> {st_mode=S_IFDIR|0755, st_size=42, ...}) = 0
> [pid 40393] 11:30:17 
> mkdir("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", 0777) 
> = 0
> [pid 40393] 11:30:17 
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/_metadata",
>  0x7fc887efc870) = -1 ENOENT (No such file or directory)
> [pid 40393] 11:30:17 
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/_metadata",
>  0x7fc887efc750) = -1 ENOENT (No such file or directory)
> [pid 40393] 11:30:17 
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90",
>  0x7fc887efc7e0) = -1 ENOENT (No such file or directory)
> [pid 40393] 11:30:17 
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 40393] 11:30:17 
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 40393] 11:30:17 openat(AT_FDCWD, 
> "/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90",
>  O_WRONLY|O_CREAT|O_EXCL, 0666) = 194
> [pid 40393] 11:30:17 fsync(194)         = 0
> [pid 40393] 11:30:17 
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90",
>  {st_mode=S_IFREG|0644, st_size=23366, ...}) = 0
> [pid 40393] 11:30:17 
> rename("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/._metadata.inprogress.24b0ea02-a05c-4297-89ff-08340e8cfa90",
>  "/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-50/_metadata") = 0
> [pid 39230] 11:30:17 
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49/_metadata",
>  {st_mode=S_IFREG|0644, st_size=23366, ...}) = 0
> [pid 39230] 11:30:17 
> unlink("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49/_metadata")
>  = 0
> [pid 39230] 11:30:17 
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 39230] 11:30:17 
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 39230] 11:30:17 openat(AT_FDCWD, 
> "/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", 
> O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 194
> [pid 39230] 11:30:17 newfstatat(194, "", {st_mode=S_IFDIR|0755, st_size=0, 
> ...}, AT_EMPTY_PATH) = 0
> [pid 39230] 11:30:17 
> stat("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", 
> {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> [pid 39230] 11:30:17 openat(AT_FDCWD, 
> "/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49", 
> O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 194
> [pid 39230] 11:30:17 newfstatat(194, "", {st_mode=S_IFDIR|0755, st_size=0, 
> ...}, AT_EMPTY_PATH) = 0
> [pid 39230] 11:30:17 
> unlink("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49") = -1 
> EISDIR (Is a directory)
> [pid 39230] 11:30:17 
> rmdir("/opt/flink/statestore/28be342d7d6b7cfd8883799cab99576e/chk-49") = 0 
> {code}
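
The ordering described in the report (write the in-progress file, "fsync" it, rename it atomically, and only then delete the previous checkpoint) can be sketched as follows. This is a minimal illustration in plain "java.nio", not Flink's file system classes and not the code from the linked commit. The class and method names ("DurableCheckpointSketch", "writeMetadataDurably", "syncDirectory", "deleteCheckpoint") are hypothetical. The directory sync is an extra best-effort step that the report does not mention; it is an assumption about what a fully durable rename may need on some file systems.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

// Hypothetical sketch; names are illustrative and not part of Flink.
final class DurableCheckpointSketch {

    /** Writes data to checkpointDir/_metadata through an in-progress file. */
    static Path writeMetadataDurably(Path checkpointDir, byte[] data) throws IOException {
        Files.createDirectories(checkpointDir);
        Path inProgress =
                checkpointDir.resolve("._metadata.inprogress." + UUID.randomUUID());
        Path target = checkpointDir.resolve("_metadata");

        // Step 1: write the in-progress file (O_WRONLY|O_CREAT|O_EXCL in the trace).
        try (FileChannel channel = FileChannel.open(
                inProgress, StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // The missing call: force file content and metadata to the device.
            channel.force(true);
        }

        // Step 2: atomically rename the in-progress file to its final name.
        Files.move(inProgress, target, StandardCopyOption.ATOMIC_MOVE);

        // Assumption beyond the report: also sync the directory entry.
        syncDirectory(checkpointDir);
        return target;
    }

    /** Best-effort directory sync; some platforms cannot open a directory. */
    static void syncDirectory(Path dir) {
        try (FileChannel channel = FileChannel.open(dir, StandardOpenOption.READ)) {
            channel.force(true);
        } catch (IOException e) {
            // Ignored in this sketch: not every platform supports this.
        }
    }

    /** Step 3: remove the previous checkpoint (unlink, then rmdir in the trace). */
    static void deleteCheckpoint(Path checkpointDir) throws IOException {
        Files.deleteIfExists(checkpointDir.resolve("_metadata"));
        Files.deleteIfExists(checkpointDir);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("statestore");
        writeMetadataDurably(base.resolve("chk-59"), "old".getBytes(StandardCharsets.UTF_8));
        writeMetadataDurably(base.resolve("chk-60"), "new".getBytes(StandardCharsets.UTF_8));
        // Only delete chk-59 after chk-60 has been synced and renamed.
        deleteCheckpoint(base.resolve("chk-59"));
        System.out.println(Files.exists(base.resolve("chk-60").resolve("_metadata")));
    }
}
```

The point of the sketch is the ordering: "deleteCheckpoint" for the old directory is only called after "writeMetadataDurably" has returned for the new one, so a power loss at any moment should leave at least one complete "_metadata" file on disk.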



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
