This is an automated email from the ASF dual-hosted git repository.
lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git
The following commit(s) were added to refs/heads/master by this push:
new 9c609e295 [doc] Add Concurrency Control
9c609e295 is described below
commit 9c609e2956a38595ec4a45f377324218df281441
Author: Jingsong <[email protected]>
AuthorDate: Sun Apr 7 16:43:16 2024 +0800
[doc] Add Concurrency Control
---
docs/content/concepts/concurrency-control.md | 66 +++++++++++++++++++++++++++
docs/static/img/files-conflict.png | Bin 0 -> 297501 bytes
docs/static/img/snapshot-conflict.png | Bin 0 -> 260524 bytes
3 files changed, 66 insertions(+)
diff --git a/docs/content/concepts/concurrency-control.md b/docs/content/concepts/concurrency-control.md
new file mode 100644
index 000000000..71a7c31dc
--- /dev/null
+++ b/docs/content/concepts/concurrency-control.md
@@ -0,0 +1,66 @@
+---
+title: "Concurrency Control"
+weight: 3
+type: docs
+aliases:
+- /concepts/concurrency-control.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Concurrency Control
+
+Paimon supports optimistic concurrency for multiple concurrent write jobs.
+
+Each job writes data at its own pace. At commit time, it generates a new snapshot from the current snapshot by applying
+its incremental file changes (added or deleted files).
+
+A commit can fail in two ways:
+1. Snapshot conflict: the snapshot ID has already been taken because another job generated a new snapshot for the table. The job simply commits again.
+2. Files conflict: a file that this job wants to delete has already been deleted by another job. At this point, the job can only fail. (A streaming job fails and restarts, which is an intentional one-time failover.)
+
+## Snapshot conflict
+
+Paimon's snapshot ID is unique, so as long as the job writes its snapshot file to the file system, the commit is considered successful.
+
+{{< img src="/img/snapshot-conflict.png">}}
+
+Paimon uses the file system's rename mechanism to commit snapshots. This is safe on HDFS, which guarantees
+transactional and atomic renames.
+
+But for object storage such as OSS and S3, `'RENAME'` does not have atomic semantics. You need to configure a Hive or
+JDBC metastore and enable the `'lock.enabled'` option for the catalog. Otherwise, a snapshot may be lost.
+
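The commit-and-retry behavior described in this section can be sketched as follows. This is a minimal illustration, not Paimon's implementation: it uses a local directory and an exclusive file create as a stand-in for a rename that fails when the target already exists, and the names `try_commit`, `latest_snapshot_id`, and `commit_with_retry` are hypothetical.

```python
import os


def try_commit(snapshot_dir: str, snapshot_id: int, payload: str) -> bool:
    """Try to claim snapshot-<id>; return False if another writer already did."""
    path = os.path.join(snapshot_dir, f"snapshot-{snapshot_id}")
    try:
        # O_CREAT | O_EXCL fails if the file exists, so exactly one writer wins.
        # (Assumption: stands in for an atomic, non-overwriting rename.)
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(payload)
    return True


def latest_snapshot_id(snapshot_dir: str) -> int:
    """Return the highest committed snapshot id, or 0 if there is none."""
    ids = [
        int(name.split("-", 1)[1])
        for name in os.listdir(snapshot_dir)
        if name.startswith("snapshot-")
    ]
    return max(ids, default=0)


def commit_with_retry(snapshot_dir: str, payload: str, max_attempts: int = 10) -> int:
    """Commit on top of the latest snapshot, retrying on snapshot conflicts."""
    for _ in range(max_attempts):
        next_id = latest_snapshot_id(snapshot_dir) + 1
        if try_commit(snapshot_dir, next_id, payload):
            return next_id
        # Snapshot conflict: another job took next_id first; re-read and retry.
    raise RuntimeError("gave up after repeated snapshot conflicts")
```

On a store without an atomic create-or-rename primitive, the exclusive step in `try_commit` has no direct equivalent, which is why the text above calls for a catalog lock in that case.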
+## Files conflict
+
+When Paimon commits a file deletion (which is only a logical deletion), it checks for conflicts with the latest snapshot.
+If there is a conflict (which means the file has already been logically deleted), the commit cannot continue on this commit node.
+The job intentionally triggers a failover to restart, and then retrieves the latest state from the filesystem
+in the hope of resolving the conflict.
+
+{{< img src="/img/files-conflict.png">}}
+
+Paimon ensures that there is no data loss or duplication here. However, if two streaming jobs write at the same
+time and conflict with each other, they will restart constantly, which is undesirable.
+
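The files-conflict check can be illustrated with plain sets of file names. This is a simplified sketch under stated assumptions, not Paimon's actual data structures: a snapshot is modeled as a set of live files, and `apply_commit` and `FilesConflictError` are hypothetical names.

```python
class FilesConflictError(Exception):
    """Raised when a commit tries to delete a file that is already gone."""


def apply_commit(latest_files: frozenset, added: set, deleted: set) -> frozenset:
    """Apply a commit's file changes on top of the latest snapshot's file set."""
    # Every file the commit wants to delete must still be live in the latest
    # snapshot; otherwise another job has already (logically) deleted it.
    missing = deleted - latest_files
    if missing:
        raise FilesConflictError(f"already deleted: {sorted(missing)}")
    return frozenset((latest_files - deleted) | added)


base = frozenset({"f1", "f2"})

# Job A compacts f1 and f2 into f3 and commits first.
after_a = apply_commit(base, added={"f3"}, deleted={"f1", "f2"})

# Job B planned the same compaction against the old snapshot and now conflicts.
try:
    apply_commit(after_a, added={"f4"}, deleted={"f1", "f2"})
    conflicted = False
except FilesConflictError:
    conflicted = True
```

In this model, a commit that only adds files has an empty `deleted` set and therefore always passes the check.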
+Conflicts arise from deleting files (logically), and file deletions come from compaction. So if you disable
+compaction in the writing jobs (set `'write-only'` to `true`) and start a separate job to do the compaction work,
+these conflicts are avoided.
+
+See [dedicated compaction job]({{< ref "maintenance/dedicated-compaction#dedicated-compaction-job" >}}) for more info.
diff --git a/docs/static/img/files-conflict.png b/docs/static/img/files-conflict.png
new file mode 100644
index 000000000..3ba4eab62
Binary files /dev/null and b/docs/static/img/files-conflict.png differ
diff --git a/docs/static/img/snapshot-conflict.png b/docs/static/img/snapshot-conflict.png
new file mode 100644
index 000000000..8867280ef
Binary files /dev/null and b/docs/static/img/snapshot-conflict.png differ