curcur commented on a change in pull request #18431:
URL: https://github.com/apache/flink/pull/18431#discussion_r791311410
##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,126 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
{{< top >}}
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.
+
+Most commonly, checkpoint duration is affected by:
+
+1. Barrier travel time and alignment, addressed by
+   [Unaligned checkpoints]({{< ref "docs/ops/state/checkpointing_under_backpressure#unaligned-checkpoints" >}})
+   and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
+2. Snapshot creation time (so-called synchronous phase), addressed by Asynchronous snapshots
Review comment:
That's what I am not sure about.
Snapshot creation time: isn't that now just preparing the changelog to be uploaded?
And previously, snapshot creation time was flush + preparing SSTs to be uploaded?
Would these two parts be significantly different?
I think most of the time saved comes from the async phase (the changes are uploaded
ahead of time, so the async phase does not take very long).
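
For readers following this thread who want to try the feature being documented, here is a minimal, hedged sketch of switching the changelog state backend on for an existing job. The `state.backend.changelog.enabled` and `dstl.dfs.base-path` keys and the `enableChangelogStateBackend` setter are assumptions based on the changelog (DSTL) work, not taken from the hunk quoted above; the authoritative option names are defined elsewhere in the documentation under review.

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EnableChangelogSketch {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        // Assumed option names -- check the merged docs for the authoritative keys.
        config.setString("state.backend.changelog.enabled", "true");
        // Assumed location for the continuously uploaded state changes.
        config.setString("dstl.dfs.base-path", "s3://my-bucket/changelog");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(config);
        // Programmatic switch; with this enabled, the synchronous phase mostly
        // prepares the already-uploaded changelog, as discussed above.
        env.enableChangelogStateBackend(true);

        // ... define sources, keyed state, sinks, then env.execute(...);
    }
}
```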
##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,126 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
{{< top >}}
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.
+
+Most commonly, checkpoint duration is affected by:
+
+1. Barrier travel time and alignment, addressed by
+   [Unaligned checkpoints]({{< ref "docs/ops/state/checkpointing_under_backpressure#unaligned-checkpoints" >}})
+   and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
+2. Snapshot creation time (so-called synchronous phase), addressed by Asynchronous snapshots
+3. Snapshot upload time (asynchronous phase)
+
+The latter (upload time) can be decreased by [Incremental checkpoints]({{< ref "#incremental-checkpoints" >}}). However,
+even with Incremental checkpoints, large deployments tend to have at least one task in every checkpoint that uploads a
+lot of data (e.g. after compaction).
Review comment:
I am not asking you to explain all the different compaction algorithms, but since
you mentioned "at least one task in every checkpoint that uploads a lot of data
(e.g. after compaction)", you need to explain why.
That's what I mean by providing some context: what causes a lot of data to be
uploaded; that's compaction, and then why compaction causes more data to be
uploaded, etc.
I do not think people can infer why compaction causes more data to be uploaded
until you explain at least a little bit about the different levels of compaction, etc.
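
To make the compaction point concrete for readers of this thread, here is a small sketch of the incremental-checkpoint setup that the quoted paragraph compares against. Incremental checkpoints upload only SST files created since the last checkpoint; because a RocksDB compaction rewrites existing SSTs into new files, the first checkpoint after a large compaction can still have a lot to upload. The bucket path and intervals below are placeholders.

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IncrementalCheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 'true' enables incremental checkpoints: only newly created SST files
        // are uploaded. After a compaction merges old SSTs into new ones, those
        // new files all need uploading again, which causes the long-tail
        // checkpoints discussed in this comment.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));
        env.getCheckpointConfig().setCheckpointStorage("s3://my-bucket/checkpoints");
        env.enableCheckpointing(10_000L); // placeholder: checkpoint every 10 seconds

        // ... define the job, then env.execute(...);
    }
}
```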
##########
File path: docs/content/docs/ops/state/state_backends.md
##########
@@ -325,6 +325,126 @@ public class MyOptionsFactory implements ConfigurableRocksDBOptionsFactory {
{{< top >}}
+## Enabling Changelog
+
+// todo: Chinese version of all changed docs
+
+// todo: mention in [large state tuning]({{< ref "docs/ops/state/large_state_tuning" >}})? or 1.16?
+
+{{< hint warning >}} The feature is in experimental status. {{< /hint >}}
+
+{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
+
+### Introduction
+
+Changelog is a feature that aims to decrease checkpointing time, and therefore end-to-end latency in exactly-once mode.
+
+Most commonly, checkpoint duration is affected by:
+
+1. Barrier travel time and alignment, addressed by
+   [Unaligned checkpoints]({{< ref "docs/ops/state/checkpointing_under_backpressure#unaligned-checkpoints" >}})
+   and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
+2. Snapshot creation time (so-called synchronous phase), addressed by Asynchronous snapshots
+3. Snapshot upload time (asynchronous phase)
+
+The latter (upload time) can be decreased by [Incremental checkpoints]({{< ref "#incremental-checkpoints" >}}). However,
+even with Incremental checkpoints, large deployments tend to have at least one task in every checkpoint that uploads a
+lot of data (e.g. after compaction).
+
+With Changelog enabled, Flink uploads state changes continuously, forming a changelog. On checkpoint, only the relevant
+part of this changelog needs to be uploaded. Independently, configured state backend is checkpointed in the
+background periodically. Upon successful upload, changelog is truncated.
+
+As a result, asynchronous phase is reduced, as well as synchronous phase (in particular, long-tail).
+
+On the flip side, resource usage is higher:
+
+- more files are created on DFS
+- more IO bandwidth is used to upload
+- more CPU used to serialize state changes
+- more memory used by Task Managers to buffer state changes
+- todo: more details after testing, maybe link to blogpost
+
+Recovery time is another thing to consider. Depending on the `state.backend.changelog.periodic-materialize.interval`,
+changelog can become lengthy and replaying it may take more time. However, recovery time combined with checkpoint
+duration will likely be still lower than in non-changelog setup, providing lower end-to-end latency even in failover
+case.
Review comment:
Recovery time really depends on how much data has to be restored, and how fast
data can be uploaded to the DFS.
I have also seen a lot of cases where recovery time increased by tens of seconds
but checkpoint duration did not decrease that much.
What we showed in the graph is a job instance with 196GB of state, and there are
a lot of cases where the above statement does not hold.
I do not think we should mix recovery time with checkpoint duration here.
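
The trade-off discussed here is governed by the materialization interval named in the quoted paragraph: a shorter interval keeps the changelog, and therefore the replay time on recovery, short, at the cost of more frequent background snapshots of the underlying backend. A hedged sketch of setting it follows; the `10min` value and the use of `Configuration` are only an illustration, not a recommendation.

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MaterializationIntervalSketch {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        // Option named in the documentation under review; a smaller value bounds
        // how much changelog has to be replayed during recovery.
        config.setString("state.backend.changelog.periodic-materialize.interval", "10min");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(config);

        // ... define the job, then env.execute(...);
    }
}
```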
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]