qinjunjerry commented on a change in pull request #402:
URL: https://github.com/apache/flink-web/pull/402#discussion_r555745655



##########
File path: _posts/2020-12-20-rocksdb.md
##########
@@ -0,0 +1,121 @@
+---
+layout: post
+title:  "Using RocksDB State Backend in Apache Flink: When and How"
+date:   2020-12-20 00:00:00
+authors:
+- Jun Qin:
+  name: "Jun Qin"
+excerpt: This blog post will guide you through the benefits of using RocksDB 
to manage your application’s state, explain when and how to use it and also 
clear up a few common misconceptions.  
+---
+
+[Stream processing](https://en.wikipedia.org/wiki/Stream_processing) 
applications are often stateful, “remembering” information from processed 
events and using it to influence further event processing. In Flink, the 
remembered information, i.e., state, is stored locally in the configured state 
backend. To prevent data loss in case of failures, the state backend 
periodically persists a snapshot of its contents to a pre-configured durable 
storage. The [RocksDB](https://rocksdb.org/) state backend (i.e., 
RocksDBStateBackend) is one of the three built-in state backends in Flink. This 
blog post will guide you through the benefits of using RocksDB to manage your 
application’s state, explain when and how to use it and also clear up a few 
common misconceptions. Having said that, this is **not** a blog post to explain 
how RocksDB works in-depth or how to do advanced troubleshooting and 
performance tuning; if you need help with any of those topics, you can reach 
out to the [Flink User M
 ailing List](https://flink.apache.org/community.html#mailing-lists). 
+
+# State in Flink
+
+To best understand state and state backends in Flink, it’s important to 
distinguish between **in-flight state** and **state snapshots**. In-flight 
state, also known as working state, is the state a Flink job is working on. It 
is always stored locally in memory (with the possibility to spill to disks) and 
can be lost when jobs fail without impacting job recoverability. State 
snapshots, i.e., 
[checkpoints](https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html)
 and 
[savepoints](https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#what-is-a-savepoint-how-is-a-savepoint-different-from-a-checkpoint),
 are stored in a remote durable storage, and are used to restore the local 
state in the case of job failures. The appropriate state backend for a 
production deployment depends on scalability, throughput, and latency 
requirements. 
+
+# What is RocksDB?
+
+Thinking of RocksDB as a distributed database that needs to run on a cluster 
and to be managed by specialized administrators is a common misconception.  
RocksDB is an embeddable persistent key-value store for fast storage. It 
interacts with Flink via the Java Native Interface (JNI). The picture below 
shows where RocksDB fits in a Flink cluster node. Details are explained in the 
following sections.
+
+
+<center>
+<img vspace="8" style="width:60%" 
src="{{site.baseurl}}/img/blog/2020-12-20-rocksdb/RocksDB-in-Flink.png" />
+</center>
+
+
+# RocksDB in Flink
+
+Everything you need to use RocksDB as a state backend is bundled in the Apache 
Flink distribution, including the native shared library:
+
+    $ jar -tvf lib/flink-dist_2.12-1.12.0.jar| grep librocksdbjni-linux64
+    8695334 Wed Nov 27 02:27:06 CET 2019 librocksdbjni-linux64.so
+
+At runtime, RocksDB is embedded in the TaskManager processes. It runs in 
native threads and works with local files. For example, if you have a job 
configured with RocksDBStateBackend running in your Flink cluster, you’ll see 
something similar to the following, where 32513 is the TaskManager process ID.
+
+    $ ps -T -p 32513 | grep -i rocksdb
+    32513 32633 ?        00:00:00 rocksdb:low0
+    32513 32634 ?        00:00:00 rocksdb:high0
+
+<div class="alert alert-info" markdown="1">
+<span class="label label-info" style="display: inline-block"><span 
class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span>
+The command is for Linux only. For other operating systems, please refer to 
their documentation.

Review comment:
       The command does not work on MacOS. It may work on other unix. But I 
never tried. I think it is safe to say "Linux".




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to