This is an automated email from the ASF dual-hosted git repository.
luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 95727c4bb1 [Doc](group commit) Modify some group commit doc (#565)
95727c4bb1 is described below
commit 95727c4bb11b2f282e9e3ea8647ade8a62dac6e9
Author: meiyi <[email protected]>
AuthorDate: Wed Apr 17 16:10:47 2024 +0800
[Doc](group commit) Modify some group commit doc (#565)
---
.../import/import-way/group-commit-manual.md | 37 ++++++++++----------
.../import/import-way/group-commit-manual.md | 37 ++++++++++----------
.../import/import-way/group-commit-manual.md | 35 +++++++++----------
.../import/import-way/group-commit-manual.md | 39 ++++++++++------------
4 files changed, 68 insertions(+), 80 deletions(-)
diff --git a/docs/data-operate/import/import-way/group-commit-manual.md b/docs/data-operate/import/import-way/group-commit-manual.md
index b2c677a8f6..680bddc7be 100644
--- a/docs/data-operate/import/import-way/group-commit-manual.md
+++ b/docs/data-operate/import/import-way/group-commit-manual.md
@@ -42,7 +42,7 @@ Doris groups multiple loads into one transaction commit based on the `group_comm
* `async_mode`
-Doris writes data to the Write Ahead Log (WAL) firstly, then the load is returned. Doris groups multiple loads into one transaction commit based on the `group_commit_interval` table property, and the data is visible after the commit. To prevent excessive disk space usage by the WAL, it automatically switches to `sync_mode`. This is suitable for latency-sensitive and high-frequency writing.
+Doris first writes data to the Write Ahead Log (WAL), then the load returns immediately. Doris groups multiple loads into one transaction commit based on the group commit conditions configured as table properties, and the data is visible after the commit. To prevent the WAL from using excessive disk space, Doris automatically switches to `sync_mode` when a single load is large. This is suitable for latency-sensitive and high-frequency writing.
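As a quick illustration, here is a minimal sketch of driving async mode from a session; the table `dt` and its three columns are assumptions borrowed from the examples later in this doc:
```sql
-- enable group commit in async mode for the current session
SET group_commit = async_mode;
-- the statement returns once the row is written to the WAL;
-- the data becomes visible after the grouped transaction commits
INSERT INTO dt VALUES (1, 'a', 10);
```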
## Basic operations
@@ -62,20 +62,22 @@ PROPERTIES (
### Use `JDBC`
-To reduce the CPU cost of SQL parsing and query planning, we provide the `PreparedStatement` in the FE. When using `PreparedStatement`, the SQL and its plan will be cached in the session level memory cache and will be reused later on, which reduces the CPU cost of FE. The following is an example of using PreparedStatement in JDBC:
+To reduce the CPU cost of SQL parsing and query planning, we provide `PreparedStatement` in the FE. When using `PreparedStatement`, the SQL and its plan are cached in a session-level memory cache and reused later, which reduces the CPU cost on the FE.
+
+The following is an example of using PreparedStatement in JDBC:
1. Set up the JDBC URL and enable server-side prepared statements
```
-url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true
+url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true&rewriteBatchedStatements=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=100&useLocalSessionState=true
```
-2. Set `group_commit` session variable, there are two ways to do it:
+2. If users also need to enable `group_commit`, set the `group_commit` session variable. There are two ways to do it:
* Add `sessionVariables=group_commit=async_mode` in JDBC url
```
-url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true&sessionVariables=group_commit=async_mode
+url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true&rewriteBatchedStatements=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=100&useLocalSessionState=true&sessionVariables=group_commit=async_mode
```
* Use `SET group_commit = async_mode;` command
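For instance, a minimal sketch of this second approach, assuming an already-open JDBC `Connection` named `conn` as in the surrounding example:
```java
// set the group_commit session variable on an open connection
try (Statement statement = conn.createStatement()) {
    statement.execute("SET group_commit = async_mode;");
}
```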
@@ -90,7 +92,7 @@ try (Statement statement = conn.createStatement()) {
```java
private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
-private static final String URL_PATTERN = "jdbc:mysql://%s:%d/%s?useServerPrepStmts=true";
+private static final String URL_PATTERN = "jdbc:mysql://%s:%d/%s?useServerPrepStmts=true&rewriteBatchedStatements=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=100&useLocalSessionState=true";
private static final String HOST = "127.0.0.1";
private static final int PORT = 9087;
private static final String DB = "db";
@@ -124,10 +126,9 @@ private static void groupCommitInsert() throws Exception {
private static void groupCommitInsertBatch() throws Exception {
Class.forName(JDBC_DRIVER);
- // add rewriteBatchedStatements=true and cachePrepStmts=true in JDBC url
        // set session variables by sessionVariables=group_commit=async_mode in JDBC url
        try (Connection conn = DriverManager.getConnection(
-                String.format(URL_PATTERN + "&rewriteBatchedStatements=true&cachePrepStmts=true&sessionVariables=group_commit=async_mode", HOST, PORT, DB), USER, PASSWD)) {
+                String.format(URL_PATTERN + "&sessionVariables=group_commit=async_mode", HOST, PORT, DB), USER, PASSWD)) {
String query = "insert into " + TBL + " values(?, ?, ?)";
try (PreparedStatement stmt = conn.prepareStatement(query)) {
@@ -148,6 +149,8 @@ private static void groupCommitInsertBatch() throws Exception {
}
```
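Pieced together from the fragments above, a hedged, self-contained sketch of the batched insert might look as follows; the host, port, credentials, and the three-column `dt` table are illustrative assumptions, not part of this commit:
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class GroupCommitBatchSketch {
    // server-side prepared statements plus client-side batching and statement caching
    private static final String URL_PATTERN = "jdbc:mysql://%s:%d/%s?useServerPrepStmts=true"
            + "&rewriteBatchedStatements=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000"
            + "&prepStmtCacheSize=100&useLocalSessionState=true";

    public static void main(String[] args) throws Exception {
        Class.forName("com.mysql.jdbc.Driver");
        String url = String.format(URL_PATTERN + "&sessionVariables=group_commit=async_mode",
                "127.0.0.1", 9030, "db");
        try (Connection conn = DriverManager.getConnection(url, "root", "")) {
            String query = "insert into dt values(?, ?, ?)";
            try (PreparedStatement stmt = conn.prepareStatement(query)) {
                for (int i = 0; i < 5; i++) {
                    stmt.setInt(1, i);
                    stmt.setString(2, "name" + i);
                    stmt.setInt(3, i * 10);
                    stmt.addBatch(); // rewriteBatchedStatements folds the batch into one multi-row insert
                }
                stmt.executeBatch(); // in async_mode this returns once the rows reach the WAL
            }
        }
    }
}
```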
+Insert statements with group commit enabled can be executed on the Master FE or on any other FE, so users can add more FE nodes to scale out write performance.
+
See [Synchronize Data Using Insert Method](../import-scenes/jdbc-load.md) for more details about **JDBC**.
### INSERT INTO VALUES
@@ -350,11 +353,11 @@ ALTER TABLE dt SET ("group_commit_interval_ms" = "2000");
### Modify the data size condition
-The default group commit data size is 64 MB. Users can modify the configuration of the table:
+The default group commit data size is 128 MB. Users can modify the configuration of the table:
```sql
-# Modify the group commit data size to 128MB
-ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
+# Modify the group commit data size to 256MB
+ALTER TABLE dt SET ("group_commit_data_bytes" = "268435456");
```
## Limitations
@@ -365,12 +368,12 @@ ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
* Specify the label, such as `INSERT INTO dt WITH LABEL {label} VALUES`
- * Expressions within VALUES, such as `INSERT INTO dt VALUES (1 + 100)`
-
* Column update
* Tables that do not support light schema changes
+ * Expressions within VALUES, such as `INSERT INTO dt VALUES (1 + 100)`
+
* When the group commit is enabled, some `Stream Load` and `Http Stream` are not executed in the group commit way if they meet the following conditions:
* Two phase commit
@@ -399,13 +402,7 @@ ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
* When decommissioning a BE node, please use the [`DECOMMISSION`](../../../sql-manual/sql-reference/Cluster-Management-Statements/ALTER-SYSTEM-DECOMMISSION-BACKEND.md) command to safely decommission the node. This prevents potential data loss if the WAL files are not processed before the node is taken offline.
- * For async_mode group commit writes, to protect disk space, it switches to sync_mode under the following conditions:
-
-   * For an import with large amount of data: exceeding 80% of the disk space of a WAL directory.
-
-   * Chunked stream loads with an unknown data amount.
-
-   * Insufficient disk space, even with it is an import with small amount of data.
+ * For async_mode group commit writes, to protect disk space, Doris may return the error `will not write wal because wal disk space usage reach max limit` or switch to `sync_mode`. Users can check the WAL data size under the `group_commit_wal_path` directory and search the BE logs to find the reason the load failed.
* During hard weight schema changes (adding or dropping columns, modifying varchar length, and renaming columns are lightweight schema changes; others are hard weight), to ensure the WAL file is compatible with the table's schema, the final stage of metadata modification in FE will reject group commit writes. Clients get an `insert table ${table_name} is blocked on schema change` exception and can retry the import.
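A client can react to that exception with a short retry loop; the following sketch assumes an open `Connection` named `conn`, and matching on the exception message is only one possible way to detect the condition:
```java
// retry a group commit insert rejected during a hard weight schema change
int maxRetries = 3;
for (int attempt = 1; attempt <= maxRetries; attempt++) {
    try (PreparedStatement stmt = conn.prepareStatement("insert into dt values(?, ?, ?)")) {
        stmt.setInt(1, 1);
        stmt.setString(2, "a");
        stmt.setInt(3, 10);
        stmt.execute();
        break; // success
    } catch (java.sql.SQLException e) {
        if (!e.getMessage().contains("is blocked on schema change") || attempt == maxRetries) {
            throw e; // unrelated failure, or out of retries
        }
        Thread.sleep(1000L * attempt); // back off, then retry the load
    }
}
```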
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/group-commit-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/group-commit-manual.md
index e7dfd96d83..614716ed04 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/group-commit-manual.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/group-commit-manual.md
@@ -42,7 +42,7 @@ Doris groups multiple loads into one, based on the load and the table's `group_commit_interval` property
* Asynchronous mode (`async_mode`)
-Doris first writes the data to the WAL (`Write Ahead Log`), then the load returns immediately. Doris asynchronously commits the data based on the load and the table's `group_commit_interval` property, and the data is visible after the commit. To prevent the WAL from occupying too much disk space, it automatically switches to `sync_mode` when a single load is large. This is suitable for latency-sensitive and high-frequency write scenarios.
+Doris first writes the data to the WAL (`Write Ahead Log`), then the load returns immediately. Doris asynchronously commits the data based on the commit conditions configured on the table, and the data is visible after the commit. To prevent the WAL from occupying too much disk space, it automatically switches to `sync_mode` when a single load is large. This is suitable for latency-sensitive and high-frequency write scenarios.
## Group Commit Usage
@@ -62,20 +62,22 @@ PROPERTIES (
### Using `JDBC`
-When users write with the JDBC `insert into values` method, to reduce the overhead of SQL parsing and plan generation, we support the MySQL protocol's `PreparedStatement` feature on the FE side. When `PreparedStatement` is used, the SQL and its load plan are cached in a session-level memory cache and subsequent loads reuse the cached objects directly, lowering CPU pressure on the FE. The following is an example of using PreparedStatement in JDBC:
+When users write with the JDBC `insert into values` method, to reduce the overhead of SQL parsing and plan generation, we support the MySQL protocol's `PreparedStatement` feature on the FE side. When `PreparedStatement` is used, the SQL and its load plan are cached in a session-level memory cache and subsequent loads reuse the cached objects directly, lowering CPU pressure on the FE. `PreparedStatement` can be used independently of `GroupCommit`.
+
+The following is an example of using PreparedStatement in JDBC:
1. Set up the JDBC URL and enable server-side prepared statements
```
-url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true
+url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true&rewriteBatchedStatements=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=100&useLocalSessionState=true
```
-2. Configure the `group_commit` session variable. There are two ways to do it:
+2. If users also need to enable `group_commit`, configure the `group_commit` session variable. There are two ways to do it:
* Set it in the JDBC URL by adding `sessionVariables=group_commit=async_mode`
```
-url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true&sessionVariables=group_commit=async_mode
+url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true&rewriteBatchedStatements=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=100&useLocalSessionState=true&sessionVariables=group_commit=async_mode
```
* Set it by executing SQL
@@ -90,7 +92,7 @@ try (Statement statement = conn.createStatement()) {
```java
private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
-private static final String URL_PATTERN = "jdbc:mysql://%s:%d/%s?useServerPrepStmts=true";
+private static final String URL_PATTERN = "jdbc:mysql://%s:%d/%s?useServerPrepStmts=true&rewriteBatchedStatements=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=100&useLocalSessionState=true";
private static final String HOST = "127.0.0.1";
private static final int PORT = 9087;
private static final String DB = "db";
@@ -124,10 +126,9 @@ private static void groupCommitInsert() throws Exception {
private static void groupCommitInsertBatch() throws Exception {
Class.forName(JDBC_DRIVER);
- // add rewriteBatchedStatements=true and cachePrepStmts=true in JDBC url
        // set session variables by sessionVariables=group_commit=async_mode in JDBC url
        try (Connection conn = DriverManager.getConnection(
-                String.format(URL_PATTERN + "&rewriteBatchedStatements=true&cachePrepStmts=true&sessionVariables=group_commit=async_mode", HOST, PORT, DB), USER, PASSWD)) {
+                String.format(URL_PATTERN + "&sessionVariables=group_commit=async_mode", HOST, PORT, DB), USER, PASSWD)) {
String query = "insert into " + TBL + " values(?, ?, ?)";
try (PreparedStatement stmt = conn.prepareStatement(query)) {
@@ -148,6 +149,8 @@ private static void groupCommitInsertBatch() throws Exception {
}
```
+In addition, `insert` statements with `group_commit` enabled can be executed not only on the Master FE but also on other FEs. Therefore, when the FE becomes the write bottleneck, more FE nodes can be added horizontally to improve write performance.
+
For more about **JDBC**, see [Synchronize Data Using Insert Method](../import-scenes/jdbc-load.md).
### INSERT INTO VALUES
@@ -349,11 +352,11 @@ ALTER TABLE dt SET ("group_commit_interval_ms" = "2000");
### Modify the commit data size
-The default group commit data size is 64 MB. Users can adjust it by modifying the table configuration:
+The default group commit data size is 128 MB. Users can adjust it by modifying the table configuration:
```sql
-# Modify the group commit data size to 128MB
-ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
+# Modify the group commit data size to 256MB
+ALTER TABLE dt SET ("group_commit_data_bytes" = "268435456");
```
## Limitations
@@ -364,12 +367,12 @@ ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
+ Specify the label, such as `INSERT INTO dt WITH LABEL {label} VALUES`
- + Expressions within VALUES, such as `INSERT INTO dt VALUES (1 + 100)`
-
+ Column update writes
+ Tables that do not support light schema change
+ + Expressions within VALUES, such as `INSERT INTO dt VALUES (1 + 100)`
+
* When group commit mode is enabled, the system checks whether a user-initiated `Stream Load` or `Http Stream` meets the group commit conditions; if it does, the load is executed as a group commit write. Loads meeting the following conditions automatically fall back to the non group commit way:
+ Two phase commit
@@ -398,13 +401,7 @@ ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
* When decommissioning a BE node, use the [`DECOMMISSION`](../../../sql-manual/sql-reference/Cluster-Management-Statements/ALTER-SYSTEM-DECOMMISSION-BACKEND.md) command to take the node offline safely, preventing data loss caused by WAL files on that node not being fully processed before it goes offline
- * For `async_mode` group commit writes, to protect disk space, it switches to `sync_mode` in the following cases:
-
-   * The load data volume is too large, i.e. exceeding 80% of the space of a single WAL directory
-
-   * Chunked stream loads with an unknown data volume
-
-   * The load data volume is not large, but available disk space is insufficient
+ * For `async_mode` group commit writes, to protect disk space, the server may reject the write with the error `will not write wal because wal disk space usage reach max limit`, or switch to `sync_mode`. In this case, users can check whether the WAL data under the `group_commit_wal_path` directory is too large, and inspect the BE logs for the reason the load failed
* When a heavyweight schema change occurs (currently adding or dropping columns, modifying varchar length, and renaming columns are lightweight schema changes; others are heavyweight), to ensure the WAL can match the table's schema, group commit writes are rejected during the final FE metadata modification stage of the schema change. Clients receive an `insert table ${table_name} is blocked on schema change` exception and can simply retry
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/import-way/group-commit-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/import-way/group-commit-manual.md
index e7dfd96d83..6a6681822d 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/import-way/group-commit-manual.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/import-way/group-commit-manual.md
@@ -42,7 +42,7 @@ Doris groups multiple loads into one, based on the load and the table's `group_commit_interval` property
* Asynchronous mode (`async_mode`)
-Doris first writes the data to the WAL (`Write Ahead Log`), then the load returns immediately. Doris asynchronously commits the data based on the load and the table's `group_commit_interval` property, and the data is visible after the commit. To prevent the WAL from occupying too much disk space, it automatically switches to `sync_mode` when a single load is large. This is suitable for latency-sensitive and high-frequency write scenarios.
+Doris first writes the data to the WAL (`Write Ahead Log`), then the load returns immediately. Doris asynchronously commits the data based on the commit conditions configured on the table, and the data is visible after the commit. To prevent the WAL from occupying too much disk space, it automatically switches to `sync_mode` when a single load is large. This is suitable for latency-sensitive and high-frequency write scenarios.
## Group Commit Usage
@@ -62,15 +62,17 @@ PROPERTIES (
### Using `JDBC`
-When users write with the JDBC `insert into values` method, to reduce the overhead of SQL parsing and plan generation, we support the MySQL protocol's `PreparedStatement` feature on the FE side. When `PreparedStatement` is used, the SQL and its load plan are cached in a session-level memory cache and subsequent loads reuse the cached objects directly, lowering CPU pressure on the FE. The following is an example of using PreparedStatement in JDBC:
+When users write with the JDBC `insert into values` method, to reduce the overhead of SQL parsing and plan generation, we support the MySQL protocol's `PreparedStatement` feature on the FE side. When `PreparedStatement` is used, the SQL and its load plan are cached in a session-level memory cache and subsequent loads reuse the cached objects directly, lowering CPU pressure on the FE. `PreparedStatement` can be used independently of `GroupCommit`.
+
+The following is an example of using PreparedStatement in JDBC:
1. Set up the JDBC URL and enable server-side prepared statements
```
-url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true
+url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true&rewriteBatchedStatements=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=100&useLocalSessionState=true
```
-2. Configure the `group_commit` session variable. There are two ways to do it:
+2. If users also need to enable `group_commit`, configure the `group_commit` session variable. There are two ways to do it:
* Set it in the JDBC URL by adding `sessionVariables=group_commit=async_mode`
@@ -90,7 +92,7 @@ try (Statement statement = conn.createStatement()) {
```java
private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
-private static final String URL_PATTERN = "jdbc:mysql://%s:%d/%s?useServerPrepStmts=true";
+private static final String URL_PATTERN = "jdbc:mysql://%s:%d/%s?useServerPrepStmts=true&rewriteBatchedStatements=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=100&useLocalSessionState=true";
private static final String HOST = "127.0.0.1";
private static final int PORT = 9087;
private static final String DB = "db";
@@ -124,10 +126,9 @@ private static void groupCommitInsert() throws Exception {
private static void groupCommitInsertBatch() throws Exception {
Class.forName(JDBC_DRIVER);
- // add rewriteBatchedStatements=true and cachePrepStmts=true in JDBC url
        // set session variables by sessionVariables=group_commit=async_mode in JDBC url
        try (Connection conn = DriverManager.getConnection(
-                String.format(URL_PATTERN + "&rewriteBatchedStatements=true&cachePrepStmts=true&sessionVariables=group_commit=async_mode", HOST, PORT, DB), USER, PASSWD)) {
+                String.format(URL_PATTERN + "&sessionVariables=group_commit=async_mode", HOST, PORT, DB), USER, PASSWD)) {
String query = "insert into " + TBL + " values(?, ?, ?)";
try (PreparedStatement stmt = conn.prepareStatement(query)) {
@@ -148,6 +149,8 @@ private static void groupCommitInsertBatch() throws Exception {
}
```
+In addition, `insert` statements with `group_commit` enabled can be executed not only on the Master FE but also on other FEs. Therefore, when the FE becomes the write bottleneck, more FE nodes can be added horizontally to improve write performance.
+
For more about **JDBC**, see [Synchronize Data Using Insert Method](../import-scenes/jdbc-load.md).
### INSERT INTO VALUES
@@ -349,11 +352,11 @@ ALTER TABLE dt SET ("group_commit_interval_ms" = "2000");
### Modify the commit data size
-The default group commit data size is 64 MB. Users can adjust it by modifying the table configuration:
+The default group commit data size is 128 MB. Users can adjust it by modifying the table configuration:
```sql
-# Modify the group commit data size to 128MB
-ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
+# Modify the group commit data size to 256MB
+ALTER TABLE dt SET ("group_commit_data_bytes" = "268435456");
```
## Limitations
@@ -364,12 +367,12 @@ ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
+ Specify the label, such as `INSERT INTO dt WITH LABEL {label} VALUES`
- + Expressions within VALUES, such as `INSERT INTO dt VALUES (1 + 100)`
-
+ Column update writes
+ Tables that do not support light schema change
+ + Expressions within VALUES, such as `INSERT INTO dt VALUES (1 + 100)`
+
* When group commit mode is enabled, the system checks whether a user-initiated `Stream Load` or `Http Stream` meets the group commit conditions; if it does, the load is executed as a group commit write. Loads meeting the following conditions automatically fall back to the non group commit way:
+ Two phase commit
@@ -398,13 +401,7 @@ ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
* When decommissioning a BE node, use the [`DECOMMISSION`](../../../sql-manual/sql-reference/Cluster-Management-Statements/ALTER-SYSTEM-DECOMMISSION-BACKEND.md) command to take the node offline safely, preventing data loss caused by WAL files on that node not being fully processed before it goes offline
- * For `async_mode` group commit writes, to protect disk space, it switches to `sync_mode` in the following cases:
-
-   * The load data volume is too large, i.e. exceeding 80% of the space of a single WAL directory
-
-   * Chunked stream loads with an unknown data volume
-
-   * The load data volume is not large, but available disk space is insufficient
+ * For `async_mode` group commit writes, to protect disk space, the server may reject the write with the error `will not write wal because wal disk space usage reach max limit`, or switch to `sync_mode`. In this case, users can check whether the WAL data under the `group_commit_wal_path` directory is too large, and inspect the BE logs for the reason the load failed
* When a heavyweight schema change occurs (currently adding or dropping columns, modifying varchar length, and renaming columns are lightweight schema changes; others are heavyweight), to ensure the WAL can match the table's schema, group commit writes are rejected during the final FE metadata modification stage of the schema change. Clients receive an `insert table ${table_name} is blocked on schema change` exception and can simply retry
diff --git a/versioned_docs/version-2.1/data-operate/import/import-way/group-commit-manual.md b/versioned_docs/version-2.1/data-operate/import/import-way/group-commit-manual.md
index b2c677a8f6..b0214dc511 100644
--- a/versioned_docs/version-2.1/data-operate/import/import-way/group-commit-manual.md
+++ b/versioned_docs/version-2.1/data-operate/import/import-way/group-commit-manual.md
@@ -26,7 +26,7 @@ under the License.
# Group Commit
-Group commit load does not introduce a new data import method, but an extension of `INSERT INTO tbl VALUES(...)`, `Stream Load` and `Http Stream`. It is a way to improve the write performance of Doris with high-concurrency and small-data writes. Your application can directly use JDBC to do high-concurrency insert operation into Doris, at the same time, combining PreparedStatement can get even higher performance. In logging scenarios, you can also do high-concurrency Stream Load or Http St [...]
+Group commit load does not introduce a new data import method, but an extension of `INSERT INTO tbl VALUES(...)`, `Stream Load` and `Http Stream`. It is a way to improve the write performance of Doris with high-concurrency and small-data writes. Your application can directly use JDBC to do high-concurrency insert operation into Doris, at the same time, combining PreparedStatement can get even higher performance. In logging scenarios, you can also do high-concurrency Stream Load or Http St [...]
## Group Commit Mode
@@ -42,7 +42,7 @@ Doris groups multiple loads into one transaction commit based on the `group_comm
* `async_mode`
-Doris writes data to the Write Ahead Log (WAL) firstly, then the load is returned. Doris groups multiple loads into one transaction commit based on the `group_commit_interval` table property, and the data is visible after the commit. To prevent excessive disk space usage by the WAL, it automatically switches to `sync_mode`. This is suitable for latency-sensitive and high-frequency writing.
+Doris first writes data to the Write Ahead Log (WAL), then the load returns immediately. Doris groups multiple loads into one transaction commit based on the group commit conditions configured as table properties, and the data is visible after the commit. To prevent the WAL from using excessive disk space, Doris automatically switches to `sync_mode` when a single load is large. This is suitable for latency-sensitive and high-frequency writing.
## Basic operations
@@ -62,20 +62,22 @@ PROPERTIES (
### Use `JDBC`
-To reduce the CPU cost of SQL parsing and query planning, we provide the `PreparedStatement` in the FE. When using `PreparedStatement`, the SQL and its plan will be cached in the session level memory cache and will be reused later on, which reduces the CPU cost of FE. The following is an example of using PreparedStatement in JDBC:
+To reduce the CPU cost of SQL parsing and query planning, we provide `PreparedStatement` in the FE. When using `PreparedStatement`, the SQL and its plan are cached in a session-level memory cache and reused later, which reduces the CPU cost on the FE.
+
+The following is an example of using PreparedStatement in JDBC:
1. Set up the JDBC URL and enable server-side prepared statements
```
-url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true
+url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true&rewriteBatchedStatements=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=100&useLocalSessionState=true
```
-2. Set `group_commit` session variable, there are two ways to do it:
+2. If users also need to enable `group_commit`, set the `group_commit` session variable. There are two ways to do it:
* Add `sessionVariables=group_commit=async_mode` in JDBC url
```
-url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true&sessionVariables=group_commit=async_mode
+url = jdbc:mysql://127.0.0.1:9030/db?useServerPrepStmts=true&rewriteBatchedStatements=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=100&useLocalSessionState=true&sessionVariables=group_commit=async_mode
```
* Use `SET group_commit = async_mode;` command
@@ -90,7 +92,7 @@ try (Statement statement = conn.createStatement()) {
```java
private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
-private static final String URL_PATTERN = "jdbc:mysql://%s:%d/%s?useServerPrepStmts=true";
+private static final String URL_PATTERN = "jdbc:mysql://%s:%d/%s?useServerPrepStmts=true&rewriteBatchedStatements=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=100&useLocalSessionState=true";
private static final String HOST = "127.0.0.1";
private static final int PORT = 9087;
private static final String DB = "db";
@@ -124,10 +126,9 @@ private static void groupCommitInsert() throws Exception {
private static void groupCommitInsertBatch() throws Exception {
Class.forName(JDBC_DRIVER);
- // add rewriteBatchedStatements=true and cachePrepStmts=true in JDBC url
        // set session variables by sessionVariables=group_commit=async_mode in JDBC url
        try (Connection conn = DriverManager.getConnection(
-                String.format(URL_PATTERN + "&rewriteBatchedStatements=true&cachePrepStmts=true&sessionVariables=group_commit=async_mode", HOST, PORT, DB), USER, PASSWD)) {
+                String.format(URL_PATTERN + "&sessionVariables=group_commit=async_mode", HOST, PORT, DB), USER, PASSWD)) {
String query = "insert into " + TBL + " values(?, ?, ?)";
try (PreparedStatement stmt = conn.prepareStatement(query)) {
@@ -148,6 +149,8 @@ private static void groupCommitInsertBatch() throws Exception {
}
```
+Insert statements with group commit enabled can be executed on the Master FE or on any other FE, so users can add more FE nodes to scale out write performance.
+
See [Synchronize Data Using Insert Method](../import-scenes/jdbc-load.md) for more details about **JDBC**.
### INSERT INTO VALUES
@@ -350,11 +353,11 @@ ALTER TABLE dt SET ("group_commit_interval_ms" = "2000");
### Modify the data size condition
-The default group commit data size is 64 MB. Users can modify the configuration of the table:
+The default group commit data size is 128 MB. Users can modify the configuration of the table:
```sql
-# Modify the group commit data size to 128MB
-ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
+# Modify the group commit data size to 256MB
+ALTER TABLE dt SET ("group_commit_data_bytes" = "268435456");
```
## Limitations
@@ -365,12 +368,12 @@ ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
* Specify the label, such as `INSERT INTO dt WITH LABEL {label} VALUES`
- * Expressions within VALUES, such as `INSERT INTO dt VALUES (1 + 100)`
-
* Column update
* Tables that do not support light schema changes
+ * Expressions within VALUES, such as `INSERT INTO dt VALUES (1 + 100)`
+
* When the group commit is enabled, some `Stream Load` and `Http Stream` are not executed in the group commit way if they meet the following conditions:
* Two phase commit
@@ -399,13 +402,7 @@ ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
* When decommissioning a BE node, please use the [`DECOMMISSION`](../../../sql-manual/sql-reference/Cluster-Management-Statements/ALTER-SYSTEM-DECOMMISSION-BACKEND.md) command to safely decommission the node. This prevents potential data loss if the WAL files are not processed before the node is taken offline.
- * For async_mode group commit writes, to protect disk space, it switches to sync_mode under the following conditions:
-
-   * For an import with large amount of data: exceeding 80% of the disk space of a WAL directory.
-
-   * Chunked stream loads with an unknown data amount.
-
-   * Insufficient disk space, even with it is an import with small amount of data.
+ * For async_mode group commit writes, to protect disk space, Doris may return the error `will not write wal because wal disk space usage reach max limit` or switch to `sync_mode`. Users can check the WAL data size under the `group_commit_wal_path` directory and search the BE logs to find the reason the load failed.
* During hard weight schema changes (adding or dropping columns, modifying varchar length, and renaming columns are lightweight schema changes; others are hard weight), to ensure the WAL file is compatible with the table's schema, the final stage of metadata modification in FE will reject group commit writes. Clients get an `insert table ${table_name} is blocked on schema change` exception and can retry the import.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]