davidradl commented on code in PR #26107:
URL: https://github.com/apache/flink/pull/26107#discussion_r1944919887
##########
docs/content/docs/ops/state/state_backends.md:
##########
@@ -95,12 +95,50 @@ Certain RocksDB native metrics are available but disabled by default, you can fi
The total memory amount of RocksDB instance(s) per slot can also be bounded, please refer to documentation [here]({{< ref "docs/ops/state/large_state_tuning" >}}#bounding-rocksdb-memory-usage) for details.
+## The ForStStateBackend
+
+The *ForStStateBackend* is a state backend that is based on [ForSt project](https://github.com/ververica/ForSt),
+which is also a LSM-tree structured key-value store and built on top of the RocksDB.
+It is designed to provide a more efficient way to store and access state in Flink applications.
+Most importantly, it can hold its sst files on remote file systems that Flink supports, such as HDFS, S3, etc.
+This allows Flink to scale the state size beyond the local disk capacity of the TaskManager.
+Moreover, by putting the sst files on remote file systems, it can also provide a more lightweight
+way to perform checkpoint and recovery.
+
+The ForStStateBackend is still in the experimental stage and is not fully available for production.
+It always performs asynchronous incremental snapshots.
+
+The ForStStateBackend is encouraged for:
+
+- Jobs with very large state, long windows, large key/value states. Local disk may not be enough to
+store the state.
+- All high-availability setups.
+- Asynchronous state access is preferred. Since the ForStStateBackend is the only one supporting
+asynchronous state access.
+- Jobs that require lightweight checkpoint and recovery, such as cloud-native applications.
+
+Limitations of the ForStStateBackend (for now):
+
+- Same as EmbeddedRocksDBStateBackend, the maximum supported size per key and per value is 2^31 bytes each.
+- Does not support canonical savepoint, full snapshot, changelog and file-merging checkpoints.
+Always perform incremental snapshots.
+
+Compared with EmbeddedRocksDBStateBackend, ForStStateBackend stores data on remote file system, thus
+the amount of state that you can keep is unlimited. The local disk of TaskManager is only used to
+store cache of file, to provide better performance. Note that when most of the active state is on
+remote file system, the performance of state access may be affected by the network latency. Flink
+introduces asynchronous state access to mitigate this issue. If you are using the asynchronous state
+methods in State API V2, you can benefit from the asynchronous state access. To get familiar with the
+State API V2, please refer to the [State API V2 documentation]({{< ref "docs/dev/fault-tolerance/state_v2" >}}).
+
## Choose The Right State Backend
When deciding between `HashMapStateBackend` and `RocksDB`, it is a choice between performance and scalability.
`HashMapStateBackend` is very fast as each state access and update operates on objects on the Java heap; however, state size is limited by available memory within the cluster.
On the other hand, `RocksDB` can scale based on available disk space and is the only state backend to support incremental snapshots.
However, each state access and update requires (de-)serialization and potentially reading from disk which leads to average performance that is an order of magnitude slower than the memory state backends.
+If you are handling very large state even exceed the local disk capacity of the TaskManager,
Review Comment:
nit: "even exceed" -> "exceeding"
I am not sure what `local disk capacity` of the TaskManager means. It
implies that each TaskManager has a disk capacity associated with it.