This is an automated email from the ASF dual-hosted git repository.
luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 6496d221391e [docs](update) EN Docs Update (#499)
6496d221391e is described below
commit 6496d221391e9521e586cbd6f81abc7d063f3b17
Author: KassieZ <[email protected]>
AuthorDate: Mon Apr 1 16:21:03 2024 +0800
[docs](update) EN Docs Update (#499)
---
.../version-2.0.json | 2 +-
.../admin-manual/cluster-management/upgrade.md | 30 --
.../admin-manual/privilege-ldap/apache-ranger.md | 2 +-
.../admin-manual/privilege-ldap/certificate.md | 4 +-
.../cluster-management/load-balancing.md | 18 +-
.../admin-manual/cluster-management/upgrade.md | 104 ++---
.../admin-manual/privilege-ldap/apache-ranger.md | 2 +-
.../admin-manual/privilege-ldap/certificate.md | 12 +-
.../admin-manual/privilege-ldap/fe-certificate.md | 2 +-
.../admin-manual/resource-admin/multi-tenant.md | 57 +--
.../data-operate/delete/batch-delete-manual.md | 428 ++++++++++++---------
.../data-operate/delete/delete-manual.md | 285 ++++++++++----
.../data-operate/delete/table-temp-partition.md | 297 +++++++-------
.../data-operate/export/export-manual.md | 67 +++-
.../data-operate/export/export-with-mysql-dump.md | 18 +-
.../version-2.0/data-operate/export/outfile.md | 27 +-
.../source-install/compilation-with-docker.md | 3 +-
.../compilation-with-ldb-toolchain.md | 4 +-
versioned_docs/version-2.0/lakehouse/file.md | 17 +-
.../query/query-data/encryption-function.md | 2 +-
.../query/udf/remote-user-defined-function.md | 136 ++++---
versioned_sidebars/version-2.0-sidebars.json | 10 +-
22 files changed, 868 insertions(+), 659 deletions(-)
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0.json
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0.json
index 144b26708a9a..9592934d8d59 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0.json
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0.json
@@ -95,7 +95,7 @@
"message": "查询缓存",
"description": "The label for category Pipeline in sidebar docs"
},
- "sidebar.docs.category.Create View and Materialize View": {
+ "sidebar.docs.category.View and Materialize View": {
"message": "视图与物化视图",
"description": "The label for category View and Materialize View in
sidebar docs"
},
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/cluster-management/upgrade.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/cluster-management/upgrade.md
index 17298773f0d2..16876407143b 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/cluster-management/upgrade.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/cluster-management/upgrade.md
@@ -38,36 +38,6 @@ Doris 升级请遵守**不要跨两个及以上关键节点版本升级**的原
关键节点版本:升级时必须要经历的版本,可能是单独一个版本,也可能是一个版本区间,如 `1.1.3 -
1.1.5`,则表示升级至该区间任意一版本即可继续后续升级。
:::
-
-| 版本号 | 关键节点版本 | LTS 版本 |
-| ------------------------ | ------------ | -------- |
-| 0.12.x | 是 | 否 |
-| 0.13.x | 是 | 否 |
-| 0.14.x | 是 | 否 |
-| 0.15.x | 是 | 否 |
-| 1.0.0 - 1.1.2 | 否 | 否 |
-| 1.1.3 - 1.1.5 | 是 | 1.1-LTS |
-| 1.2.0 - 1.2.5 | 是 | 1.2-LTS |
-| 2.0.0-alpha - 2.0.0-beta | 是 | 2.0-LTS |
-
-示例:
-
-当前版本为 `0.12`,升级到 `2.0.0-beta` 版本的升级路线
-
-`0.12` -> `0.13` -> `0.14` -> `0.15` -> `1.1.3 - 1.1.5` 任意版本 -> `1.2.0 -
1.2.5` 任意版本 -> `2.0.0-beta`
-
-:::caution
-
-- LTS 版本:Long-time Support,LTS 版本提供长期支持,会持续维护六个月以上,通常而言,**版本号第三位数越大的版本,稳定性越好**。
-
-- Alpha 版本:内部测试版本,功能还未完全确定,或许存在重大 BUG,只推荐上测试集群做测试,**不推荐上生产集群!**
-
-- Beta 版本:公开测试版本,功能已基本确定,或许存在非重大 BUG,只推荐上测试集群做测试,**不推荐上生产集群!**
-
-- Release 版本:公开发行版,已完成基本重要 BUG 的修复和功能性缺陷修复验证,推荐上生产集群。
-
-:::
-
## 升级步骤
### 升级说明
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/privilege-ldap/apache-ranger.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/privilege-ldap/apache-ranger.md
index d641c5767038..a0cb8b48852b 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/privilege-ldap/apache-ranger.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/privilege-ldap/apache-ranger.md
@@ -1,6 +1,6 @@
---
{
- "title": "Apache Ranger",
+ "title": "基于 Apache Ranger 的鉴权管理",
"language": "zh-CN"
}
---
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/privilege-ldap/certificate.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/privilege-ldap/certificate.md
index 685706655c05..ff662e5ac304 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/privilege-ldap/certificate.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/privilege-ldap/certificate.md
@@ -1,6 +1,6 @@
---
{
- "title": "SSL 密钥证书配置",
+ "title": "MySQL 安全传输",
"language": "zh-CN"
}
---
@@ -80,6 +80,6 @@ openssl pkcs12 -inkey ca-key.pem -in ca.pem -export -out
ca_certificate.p12
openssl pkcs12 -inkey server-key.pem -in server.pem -export -out
server_certificate.p12
```
-:::note
+:::info Note
[参考文档](https://www.ibm.com/docs/en/api-connect/2018.x?topic=overview-generating-self-signed-certificate-using-openssl)
:::
diff --git
a/versioned_docs/version-2.0/admin-manual/cluster-management/load-balancing.md
b/versioned_docs/version-2.0/admin-manual/cluster-management/load-balancing.md
index b655d57e0da3..161253307b62 100644
---
a/versioned_docs/version-2.0/admin-manual/cluster-management/load-balancing.md
+++
b/versioned_docs/version-2.0/admin-manual/cluster-management/load-balancing.md
@@ -1,6 +1,6 @@
---
{
- "title": "load balancing",
+ "title": "Load Balancing",
"language": "en"
}
@@ -25,7 +25,7 @@ specific language governing permissions and limitations
under the License.
-->
-# load balancing
+
When deploying multiple FE nodes, users can deploy a load balancing layer on
top of multiple FEs to achieve high availability of Doris.
@@ -83,7 +83,7 @@ tcp 0 0 0.0.0.0:6033 0.0.0.0:*
LISTEN
ProxySQL has a configuration file `/etc/proxysql.cnf` and a configuration database file `/var/lib/proxysql/proxysql.db`. **Special attention is needed here**: if a `proxysql.db` file already exists (under the `/var/lib/proxysql` directory), the ProxySQL service reads and parses the `proxysql.cnf` file only on its first startup; after that, the `proxysql.cnf` file will not be read again. If you want the configuration in the proxysql.cnf file to take effect after restart [...]
-#### View and modify configuration files
+**View and modify configuration files**
The main parameters are shown below (commented out); modify them according to your needs.
@@ -138,7 +138,7 @@ mysql_replication_hostgroups=
)
```
-#### Connect to the ProxySQL management port test
+**Connect to the ProxySQL management port test**
```sql
# mysql -uadmin -padmin -P6032 -hdoris01
@@ -187,7 +187,7 @@ MySQL [main]> show tables;
20 rows in set (0.000 sec)
```
-#### ProxySQL configuration backend Doris FE
+**ProxySQL configuration backend Doris FE**
Use the insert statement to add the hosts to the mysql_servers table, where hostgroup_id 10 is the write group and 20 is the read group. We do not need read/write splitting here, so it does not matter which group is used; either can be set.
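A minimal sketch of such inserts in the ProxySQL admin interface (the FE hostnames and the default Doris FE query port 9030 are placeholders, not values taken from this commit):

```sql
-- Add two Doris FE nodes to the write hostgroup (10); hostnames and ports are placeholders
insert into mysql_servers(hostgroup_id, hostname, port) values(10, 'doris-fe-01', 9030);
insert into mysql_servers(hostgroup_id, hostname, port) values(10, 'doris-fe-02', 9030);
-- Apply the change to the running configuration and persist it
load mysql servers to runtime;
save mysql servers to disk;
```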
@@ -261,7 +261,7 @@ MySQL [(none)]> save mysql servers to disk;
Query OK, 0 rows affected (0.348 sec)
```
-#### Monitor Doris FE node configuration
+**Monitor Doris FE node configuration**
After adding the Doris FE nodes, you also need to monitor these backend nodes. This is necessary for a backend with multiple FEs in a high-availability load balancing environment, because ProxySQL adjusts routing automatically based on the `read_only` value of each node.
@@ -349,7 +349,7 @@ MySQL [(none)]> select
hostgroup_id,hostname,port,status,weight from mysql_serve
3 rows in set (0.000 sec)
```
-#### Configure Doris users
+**Configure Doris users**
All the above configurations are about the back-end Doris FE node. Now you can
configure the SQL statements, including: the user who sends the SQL statement,
the routing rules of the SQL statement, the cache of the SQL query, the
rewriting of the SQL statement, and so on.
@@ -426,7 +426,7 @@ Query OK, 0 rows affected (0.123 sec)
In this way, you can use the doris username and password to connect to
ProxySQL through the sql client
```
-#### Connect to Doris through ProxySQL for testing
+**Connect to Doris through ProxySQL for testing**
Next, use the root user and the doris user to test whether they are routed to the default hostgroup_id=10 (the write group) to read data. The following connects through forwarding port 6033; the connection is forwarded to the real backend database.
@@ -543,7 +543,7 @@ cd /usr/local/nginx
/usr/local/nginx/sbin/nginx -c conf.d/default.conf
```
-### verify
+### Verify
```
mysql -uroot -P6030 -h172.31.7.119
diff --git
a/versioned_docs/version-2.0/admin-manual/cluster-management/upgrade.md
b/versioned_docs/version-2.0/admin-manual/cluster-management/upgrade.md
index dc42b4c8ba63..584eaf3c9b8d 100644
--- a/versioned_docs/version-2.0/admin-manual/cluster-management/upgrade.md
+++ b/versioned_docs/version-2.0/admin-manual/cluster-management/upgrade.md
@@ -1,6 +1,6 @@
---
{
- "title": "Cluster upgrade",
+ "title": "Cluster Upgrade",
"language": "en"
}
---
@@ -24,7 +24,6 @@ specific language governing permissions and limitations
under the License.
-->
-# Cluster upgrade
## Overview
@@ -33,68 +32,45 @@ To upgrade, please use the steps recommended in this
chapter to upgrade the clus
## Doris Release Notes
:::tip
+When upgrading Doris, please follow the principle of **not skipping two minor
versions** and upgrade sequentially.
-For Doris upgrade, please follow the principle of **Do not upgrade across two
or more key node versions**. If you want to upgrade across multiple key node
versions, first upgrade to the nearest key node version, and then upgrade in
turn. If it is not critical node version, it can be ignored and skipped.
-
-Key node version: the version that must be experienced when upgrading, it may
be a single version, or a version range, such as `1.1.3 - 1.1.5`, it means that
you can continue to upgrade after upgrading to any version in this range .
-
+For example, if you are upgrading from version 0.15.x to 2.0.x, it is
recommended to first upgrade to the latest version of 1.1, then upgrade to the
latest version of 1.2, and finally upgrade to the latest version of 2.0.
:::
-| Version number | Key node version | LTS version |
-| ------------------------ | ------------ | -------- |
-| 0.12.x | Yes | No |
-| 0.13.x | Yes | No |
-| 0.14.x | Yes | No |
-| 0.15.x | Yes | No |
-| 1.0.0 - 1.1.2 | No | No |
-| 1.1.3 - 1.1.5 | Yes | 1.1-LTS |
-| 1.2.0 - 1.2.5 | Yes | 1.2-LTS |
-| 2.0.0-alpha - 2.0.0-beta | Yes | 2.0-LTS |
-
-Example:
-
-The current version is `0.12`, upgrade route to `2.0.0-beta` version
-
-`0.12` -> `0.13` -> `0.14` -> `0.15` -> `1.1.3 - 1.1.5` any version -> `1.2.0
- 1.2.5` any version -> `2.0.0 -beta`
-
-:::tip
-
-LTS version: Long-time Support, LTS version provides long-term support and
will be maintained for more than six months. Generally speaking, the version
with the larger third digit of the version number is more stable**.
-Alpha version: an internal test version, the function has not been fully
determined, and there may be major bugs. It is only recommended to use the test
cluster for testing, ** it is not recommended to use the production cluster! **
-
-Beta version: public test version, the function has been basically confirmed,
there may be non-major bugs, it is only recommended to use the test cluster for
testing, ** it is not recommended to use the production cluster! **
-
-Release version: a public release version, which has completed the repair of
basic important bugs and verification of functional defect fixes, and is
recommended for production clusters.
-
-:::
-
-## Upgrade steps
+## Upgrade Steps
### Upgrade Instructions
1. During the upgrade process, since Doris's RoutineLoad,
Flink-Doris-Connector, and Spark-Doris-Connector have implemented a retry
mechanism in the code, in a multi-BE node cluster, the rolling upgrade will not
cause the task to fail .
+
2. The StreamLoad task requires you to implement a retry mechanism in your own
code, otherwise the task will fail.
+
3. The cluster copy repair and balance function must be closed before and
opened after the completion of a single upgrade task, regardless of whether all
your cluster nodes have been upgraded.
-### Overview of the upgrade process
+### Overview of the Upgrade Process
1. Metadata backup
+
2. Turn off the cluster copy repair and balance function
+
3. Compatibility testing
+
4. Upgrade BE
+
5. Upgrade FE
+
6. Turn on the cluster replica repair and balance function
-### Upgrade pre-work
+### Upgrade Pre-work
Please perform the upgrade in sequence according to the upgrade process
-#### metadata backup (important)
+**01 Metadata Backup (Important)**
-** Make a full backup of the `doris-meta` directory of the FE-Master node! **
+**Make a full backup of the `doris-meta` directory of the FE-Master node!**
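One possible way to take that backup (a sketch; `${DORIS_OLD_HOME}` and the destination path are placeholders for your environment):

```shell
# Copy the FE-Master metadata directory to a safe location before upgrading
cp -r ${DORIS_OLD_HOME}/fe/doris-meta /your/backup/path/doris-meta-backup
```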
-#### Turn off the cluster replica repair and balance function
+**02 Turn off the cluster replica repair and balance function**
Nodes will restart during the upgrade process, which may trigger unnecessary cluster balancing and replica repair logic, so first disable them with the following command:
@@ -104,34 +80,32 @@ admin set frontend config("disable_colocate_balance" =
"true");
admin set frontend config("disable_tablet_scheduler" = "true");
```
-#### Compatibility testing
+**03 Compatibility Testing**
-:::tip
+:::caution Warning
-**Metadata compatibility is very important, if the upgrade fails due to
incompatible metadata, it may lead to data loss! It is recommended to perform a
metadata compatibility test before each upgrade! **
+**Metadata compatibility is very important. If the upgrade fails due to incompatible metadata, it may lead to data loss! It is recommended to perform a metadata compatibility test before each upgrade!**
:::
-##### FE Compatibility Test
-
-:::tip
-
-**important**
+1. FE Compatibility Test
+:::tip Important
1. It is recommended to do FE compatibility test on your local development
machine or BE node.
2. It is not recommended to test on Follower or Observer nodes to avoid link
exceptions
-3. If it must be on the Follower or Observer node, the started FE process
needs to be stopped
+3. If it must be on the Follower or Observer node, the started FE process
needs to be stopped
:::
-1. Use the new version alone to deploy a test FE process
+
+a. Use the new version alone to deploy a test FE process
```shell
sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon
```
-2. Modify the FE configuration file fe.conf for testing
+b. Modify the FE configuration file fe.conf for testing
```shell
vi ${DORIS_NEW_HOME}/conf/fe.conf
@@ -149,9 +123,9 @@ admin set frontend config("disable_tablet_scheduler" =
"true");
...
```
- save and exit
+ Save and exit
-3. Modify fe.conf
+c. Modify fe.conf
- Add ClusterID configuration in fe.conf
@@ -164,50 +138,51 @@ admin set frontend config("disable_tablet_scheduler" =
"true");
echo "metadata_failure_recovery=true" >> ${DORIS_NEW_HOME}/conf/fe.conf
```
-4. Copy the metadata directory doris-meta of the online environment Master FE
to the test environment
+d. Copy the metadata directory doris-meta of the online environment Master FE
to the test environment
```shell
cp ${DORIS_OLD_HOME}/fe/doris-meta/* ${DORIS_NEW_HOME}/fe/doris-meta
```
-5. Change the cluster_id in the VERSION file copied to the test environment to
123456 (that is, the same as in step 3)
+e. Change the cluster_id in the VERSION file copied to the test environment to
123456 (that is, the same as in step 3)
```shell
vi ${DORIS_NEW_HOME}/fe/doris-meta/image/VERSION
clusterId=123456
```
-6. In the test environment, run the startup FE
+f. Start the FE in the test environment
- If the version is greater than or equal to 2.0.2, run the following command
```shell
sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon --metadata_failure_recovery
```
+
- If the version is less than 2.0.2, run the following command
```shell
sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon
```
-7. Observe whether the startup is successful through the FE log fe.log
+g. Observe whether the startup is successful through the FE log fe.log
```shell
tail -f ${DORIS_NEW_HOME}/log/fe.log
```
-8. If the startup is successful, it means that there is no problem with the
compatibility, stop the FE process of the test environment, and prepare for the
upgrade
+h. If the startup is successful, the metadata is compatible. Stop the test-environment FE process and prepare for the upgrade.
```
sh ${DORIS_NEW_HOME}/bin/stop_fe.sh
```
-##### BE Compatibility Test
+2. BE Compatibility Test
You can use the grayscale upgrade scheme to upgrade a single BE first. If
there is no exception or error, the compatibility is considered normal, and
subsequent upgrade actions can be performed
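For a single BE node, the grayscale step is roughly: stop the old BE, switch to the new version's directory, and start it again. A sketch under those assumptions (paths are placeholders; adapt to your deployment):

```shell
# Stop the BE node selected for the grayscale upgrade
sh ${DORIS_OLD_HOME}/bin/stop_be.sh

# Start the BE from the new version's installation directory
sh ${DORIS_NEW_HOME}/bin/start_be.sh --daemon
```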
### Upgrade process
-:::tip
+:::tip Tip
Upgrade BE first, then FE
@@ -223,10 +198,9 @@ However, when a major version is upgraded, new features
may be added or old func
#### Upgrade BE
-:::tip
+:::tip Tip
In order to ensure the safety of your data, please use 3 copies to store your
data to avoid data loss caused by misoperation or failure of the upgrade
-
:::
1. Under the premise of multiple copies, select a BE node to stop running and
perform grayscale upgrade
@@ -265,9 +239,9 @@ In order to ensure the safety of your data, please use 3
copies to store your da
6. Complete the upgrade of other BE nodes in sequence
-#### Upgrade FE
+**05 Upgrade FE**
-:::tip
+:::tip Tip
Upgrade the non-Master nodes first, and then upgrade the Master nodes.
@@ -309,7 +283,7 @@ Upgrade the non-Master nodes first, and then upgrade the
Master nodes.
6. Complete the upgrade of other FE nodes in turn, **finally complete the
upgrade of the Master node**
-#### Turn on the cluster replica repair and balance function
+**06 Turn on the cluster replica repair and balance function**
After the upgrade is complete and all BE nodes become `Alive`, enable the
cluster copy repair and balance function:
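These are expected to mirror the disable commands shown in the pre-work step, with the flags set back to `false`; a sketch:

```sql
admin set frontend config("disable_colocate_balance" = "false");
admin set frontend config("disable_tablet_scheduler" = "false");
```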
diff --git
a/versioned_docs/version-2.0/admin-manual/privilege-ldap/apache-ranger.md
b/versioned_docs/version-2.0/admin-manual/privilege-ldap/apache-ranger.md
index 911e9625b66b..747ed37b8e6a 100644
--- a/versioned_docs/version-2.0/admin-manual/privilege-ldap/apache-ranger.md
+++ b/versioned_docs/version-2.0/admin-manual/privilege-ldap/apache-ranger.md
@@ -1,6 +1,6 @@
---
{
- "title": "Apache Ranger",
+ "title": "Authority Management based on Apache Ranger",
"language": "en"
}
---
diff --git
a/versioned_docs/version-2.0/admin-manual/privilege-ldap/certificate.md
b/versioned_docs/version-2.0/admin-manual/privilege-ldap/certificate.md
index eccbc2d293d0..07f792c96b1b 100644
--- a/versioned_docs/version-2.0/admin-manual/privilege-ldap/certificate.md
+++ b/versioned_docs/version-2.0/admin-manual/privilege-ldap/certificate.md
@@ -1,6 +1,6 @@
---
{
- "title": "TLS certificate",
+ "title": "MySQL Client Certificate",
"language": "en"
}
---
@@ -24,12 +24,13 @@ specific language governing permissions and limitations
under the License.
-->
-# Key Certificate Configuration
Enabling SSL functionality in Doris requires configuring both a CA key
certificate and a server-side key certificate. To enable mutual authentication,
a client-side key certificate must also be generated:
* The default CA key certificate file is located at
`Doris/fe/mysql_ssl_default_certificate/ca_certificate.p12`, with a default
password of `doris`. You can modify the FE configuration file `conf/fe.conf` to
add `mysql_ssl_default_ca_certificate = /path/to/your/certificate` to change
the CA key certificate file. You can also add
`mysql_ssl_default_ca_certificate_password = your_password` to specify the
password for your custom key certificate file.
+
* The default server-side key certificate file is located at
`Doris/fe/mysql_ssl_default_certificate/server_certificate.p12`, with a default
password of `doris`. You can modify the FE configuration file `conf/fe.conf` to
add `mysql_ssl_default_server_certificate = /path/to/your/certificate` to
change the server-side key certificate file. You can also add
`mysql_ssl_default_server_certificate_password = your_password` to specify the
password for your custom key certificate file.
+
* By default, a client-side key certificate is also generated and stored in
`Doris/fe/mysql_ssl_default_certificate/client-key.pem` and
`Doris/fe/mysql_ssl_default_certificate/client_certificate/`.
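As a sketch, overriding these defaults in `conf/fe.conf` might look like the following (paths and the password are placeholders):

```
mysql_ssl_default_ca_certificate = /path/to/your/ca_certificate.p12
mysql_ssl_default_ca_certificate_password = your_password
mysql_ssl_default_server_certificate = /path/to/your/server_certificate.p12
mysql_ssl_default_server_certificate_password = your_password
```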
## Custom key certificate file
@@ -37,6 +38,7 @@ Enabling SSL functionality in Doris requires configuring both
a CA key certifica
In addition to the Doris default certificate file, you can also generate a
custom certificate file through `openssl`. Here are the steps (refer to
[Creating SSL Certificates and Keys Using
OpenSSL](https://dev.mysql.com/doc/refman/8.0/en/creating-ssl-files-using-openssl.html)):
1. Generate the CA, server-side, and client-side keys and certificates:
+
```bash
# Generate the CA certificate
openssl genrsa 2048 > ca-key.pem
@@ -61,11 +63,13 @@ openssl x509 -req -in client-req.pem -days 3600 \
```
2. Verify the created certificates:
+
```bash
openssl verify -CAfile ca.pem server-cert.pem client-cert.pem
```
3. Combine your key and certificate in a PKCS#12 (P12) bundle. You can also
specify a certificate format (PKCS12 by default). You can modify the
conf/fe.conf configuration file and add parameter ssl_trust_store_type to
specify the certificate format.
+
```bash
# Package the CA key and certificate
openssl pkcs12 -inkey ca-key.pem -in ca.pem -export -out ca_certificate.p12
@@ -74,4 +78,6 @@ openssl pkcs12 -inkey ca-key.pem -in ca.pem -export -out
ca_certificate.p12
openssl pkcs12 -inkey server-key.pem -in server.pem -export -out
server_certificate.p12
```
->[reference
documents](https://www.ibm.com/docs/en/api-connect/2018.x?topic=overview-generating-self-signed-certificate-using-openssl)
+:::info Note
+[reference
documents](https://www.ibm.com/docs/en/api-connect/2018.x?topic=overview-generating-self-signed-certificate-using-openssl)
+:::
\ No newline at end of file
diff --git
a/versioned_docs/version-2.0/admin-manual/privilege-ldap/fe-certificate.md
b/versioned_docs/version-2.0/admin-manual/privilege-ldap/fe-certificate.md
index 2a471ab39ddd..8293743f2cb2 100644
--- a/versioned_docs/version-2.0/admin-manual/privilege-ldap/fe-certificate.md
+++ b/versioned_docs/version-2.0/admin-manual/privilege-ldap/fe-certificate.md
@@ -1,6 +1,6 @@
---
{
- "title": "FE SSL certificate",
+ "title": "FE SSL Certificate",
"language": "en"
}
---
diff --git
a/versioned_docs/version-2.0/admin-manual/resource-admin/multi-tenant.md
b/versioned_docs/version-2.0/admin-manual/resource-admin/multi-tenant.md
index a5fb83a6c726..09209abfd189 100644
--- a/versioned_docs/version-2.0/admin-manual/resource-admin/multi-tenant.md
+++ b/versioned_docs/version-2.0/admin-manual/resource-admin/multi-tenant.md
@@ -1,6 +1,6 @@
---
{
- "title": "Multi-tenancy",
+ "title": "Resource Group",
"language": "en"
}
---
@@ -24,7 +24,7 @@ specific language governing permissions and limitations
under the License.
-->
-# Multi-tenancy
+
The main purpose of Doris's multi-tenant and resource isolation solution is to
reduce interference between multiple users when performing data operations in
the same Doris cluster, and to allocate cluster resources to each user more
reasonably.
@@ -138,7 +138,10 @@ Node resource division refers to setting tags for BE nodes
in a Doris cluster, a
After the setting is complete, when user1 initiates a query on the
UserTable table, it will only access the data copy on the nodes in the
`group_a` resource group, and the query will only use the node computing
resources in the `group_a` resource group. The query of user3 can use copies
and computing resources in any resource group.
- > Note: By default, the user's `resource_tags.location` attribute is
empty. In versions prior to 2.0.2 (inclusive), by default, users are not
restricted by tags and can use any resource group. After version 2.0.3, normal
users can only use the `default` resource group by default. Root and Admin user
can use any resource group.
+ :::tip Tip
+
+ By default, the user's `resource_tags.location` attribute is empty. In
versions prior to 2.0.2 (inclusive), by default, users are not restricted by
tags and can use any resource group. After version 2.0.3, normal users can only
use the `default` resource group by default. Root and Admin user can use any
resource group.
+ :::
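A sketch of how a node tag and a user's resource-group restriction are typically assigned (the backend address, heartbeat port 9050, group name, and user name are placeholders):

```sql
-- Assign a BE node to resource group group_a by tag
alter system modify backend "host1:9050" set ("tag.location" = "group_a");

-- Restrict user1 to the group_a resource group
set property for 'user1' 'resource_tags.location' = 'group_a';
```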
In this way, we have achieved physical resource isolation for different user queries by dividing nodes and restricting user resource usage. Furthermore, we can create different users for different business departments and restrict each user to a different resource group, so that different business units do not interfere with each other's resources. For example, there is a business table in the cluster that needs to be shared by all 9 business departments, but it is hoped [...]
@@ -147,11 +150,15 @@ Node resource division refers to setting tags for BE
nodes in a Doris cluster, a
4. Resource group assignments for load job
The resource usage of load jobs (including insert, broker load, routine
load, stream load, etc.) can be divided into two parts:
+
1. Computing resources: responsible for reading data sources, data
transformation and distribution.
+
2. Write resource: responsible for data encoding, compression and writing
to disk.
The write resource must be the node where the replica is located, and the
computing resource can theoretically select any node to complete. Therefore,
the allocation of resource groups for load jobs is divided into two steps:
+
1. Use user-level resource tags to limit the resource groups that
computing resources can use.
+
2. Use the resource tag of the replica to limit the resource group that
the write resource can use.
So if you want all the resources used by the load operation to be limited
to the resource group where the data is located, you only need to set the
resource tag of the user level to the same as the resource tag of the replica.
@@ -200,9 +207,13 @@ Through memory and CPU resource limits. We can divide user
queries into more fin
### Tag division and CPU limitation are new features in version 0.15. In order
to ensure a smooth upgrade from the old version, Doris has made the following
forward compatibility:
1. Each BE node will have a default Tag: `"tag.location": "default"`.
+
2. The BE node added through the `alter system add backend` statement will
also set Tag: `"tag.location": "default"` by default.
+
3. The replica distribution of all tables is modified by default to `"tag.location.default:xx"`, where xx is the original number of replicas.
+
4. Users can still specify the number of replicas in the table creation statement with `"replication_num" = "xx"`; this attribute is automatically converted to `"tag.location.default:xx"`. This ensures that the original table creation statements do not need to be modified.
+
5. By default, the memory limit for a single query is 2GB per node and CPU resources are unlimited, which is consistent with the original behavior. The user's `resource_tags.location` attribute is empty, meaning that by default the user can access BEs with any tag, also consistent with the original behavior.
Here we give an example of the steps to start using the resource division
function after upgrading from the original cluster to version 0.15:
@@ -245,27 +256,27 @@ Through the above 4 steps, we can smoothly use the
resource division function af
### How to conveniently set replica distribution strategies when there are
many tables
- For example, there is a db1 with four tables under it, and the replica
distribution strategy required for table1 is `group_a:1,group_b:2`, the replica
distribution strategy required for tables 2, 3, and 4 is `group_c:1,group_b:2`
+For example, database db1 contains four tables. The replica distribution strategy required for table1 is `group_a:1,group_b:2`, and the strategy required for tables 2, 3, and 4 is `group_c:1,group_b:2`.
- Then you can use the following statement to create db1:
-
- ```sql
- CREATE DATABASE db1 PROPERTIES (
- "replication_allocation" = "tag.location.group_c:1, tag.location.group_b:2"
- )
- ```
+Then you can use the following statement to create db1:
- Create table1 using the following statement:
-
- ```sql
- CREATE TABLE table1
- (k1 int, k2 int)
- distributed by hash(k1) buckets 1
- properties(
- "replication_allocation"="tag.location.group_a:1, tag.location.group_b:2"
- )
- ```
+```sql
+CREATE DATABASE db1 PROPERTIES (
+"replication_allocation" = "tag.location.group_c:1, tag.location.group_b:2"
+)
+```
+
+Create table1 using the following statement:
+
+```sql
+CREATE TABLE table1
+(k1 int, k2 int)
+distributed by hash(k1) buckets 1
+properties(
+"replication_allocation"="tag.location.group_a:1, tag.location.group_b:2"
+)
+```
- The table creation statements for table2, table3, and table4 do not need to
specify `replication_allocation` again.
+The table creation statements for table2, table3, and table4 do not need to
specify `replication_allocation` again.
- Note: Changing the replica distribution policy of the database will not
affect existing tables.
+Note: Changing the replica distribution policy of the database will not affect
existing tables.
diff --git
a/versioned_docs/version-2.0/data-operate/delete/batch-delete-manual.md
b/versioned_docs/version-2.0/data-operate/delete/batch-delete-manual.md
index 659247edae71..1cb7328b0377 100644
--- a/versioned_docs/version-2.0/data-operate/delete/batch-delete-manual.md
+++ b/versioned_docs/version-2.0/data-operate/delete/batch-delete-manual.md
@@ -24,58 +24,71 @@ specific language governing permissions and limitations
under the License.
-->
-# Batch Delete
-Currently, Doris supports multiple import methods such as [broker
load](../import/import-way/broker-load-manual), [routine
load](../import/import-way/routine-load-manual.md), [stream
load](../import/import-way/stream-load-manual), etc. The data can only be
deleted through the delete statement at present. When the delete statement is
used to delete, a new data version will be generated every time delete is
executed. Frequent deletion will seriously affect the query performance, and
when u [...]
+Why do we need to introduce import-based Batch Delete when we have the Delete
operation?
+
+- **Limitations of Delete operation**
+
+When you delete by Delete statement, each execution of Delete generates an
empty rowset to record the deletion conditions and a new version of the data.
Each time you read, you have to filter the deletion conditions. If you delete
too often or have too many deletion conditions, it will seriously affect the
query performance.
+
+- **Insert data interspersed with Delete data**
+
+For scenarios like importing data from a transactional database via CDC, Insert and Delete are usually interspersed in the data. In this case, the Delete operation alone cannot meet the requirement.
+
+Based on imported data, there are three ways the data can be merged:
+
+1. APPEND: Append all data to existing data.
+
+2. DELETE: Delete all rows that have the same value as the key columns of the imported data (when a `sequence` column exists in the table, the same primary key and the ordering logic of the sequence column must both be satisfied for a row to be deleted correctly; see Use Case 4 below for details).
-For scenarios similar to the import of cdc data, insert and delete in the data
data generally appear interspersed. In this scenario, our current import method
is not enough, even if we can separate insert and delete, it can solve the
import problem , But still cannot solve the problem of deletion. Use the batch
delete function to solve the needs of these scenarios.
-There are three ways to merge data import:
-1. APPEND: All data are appended to existing data
-2. DELETE: delete all rows with the same key column value as the imported
data(When there is a
[`sequence`](../data-operate/update-delete/sequence-column-manual) column in
the table, the same primary key and the logic of the size of the sequence
column must be satisfied at the same time to delete it correctly, see use case
4 below for details.)
3. MERGE: APPEND or DELETE according to DELETE ON decision
+:::caution Warning
+Batch Delete only works on Unique models.
+:::
+
## Fundamental
-This is achieved by adding a hidden column `__DORIS_DELETE_SIGN__`, because we
are only doing batch deletion on the unique model, so we only need to add a
hidden column whose type is bool and the aggregate function is replace. In be,
the various aggregation write processes are the same as normal columns, and
there are two read schemes:
-Remove `__DORIS_DELETE_SIGN__` when fe encounters extensions such as *, and
add the condition of `__DORIS_DELETE_SIGN__ != true` by default When be reads,
a column is added for judgment, and the condition is used to determine whether
to delete.
+This is achieved by adding a hidden column `__DORIS_DELETE_SIGN__` to the Unique table.
+
+When FE parses a query, `__DORIS_DELETE_SIGN__` is removed when it encounters expansions such as `*`, and the condition `__DORIS_DELETE_SIGN__ != true` is added by default. When reading, BE adds a column for this judgement and uses the condition to determine whether a row has been deleted.
-### Import
+- Import
-When importing, set the value of the hidden column to the value of the `DELETE
ON` expression during fe parsing. The other aggregation behaviors are the same
as the replace aggregation column.
+ On import, the value of the hidden column is set to the value of the
`DELETE ON` expression during the FE parsing stage.
-### Read
+- Read
-When reading, add the condition of `__DORIS_DELETE_SIGN__ != true` to all
olapScanNodes with hidden columns, be does not perceive this process and
executes normally.
+    On read, the condition `__DORIS_DELETE_SIGN__ != true` is added to all olapScanNodes with the hidden column; BE is unaware of this process and executes normally (see the sketch after this list).
-### Cumulative Compaction
+- Cumulative Compaction
-In Cumulative Compaction, hidden columns are treated as normal columns, and
the compaction logic remains unchanged.
+ In Cumulative Compaction, hidden columns are treated as normal columns and
the Compaction logic remains unchanged.
-### Base Compaction
+- Base Compaction
-In Base Compaction, delete the rows marked for deletion to reduce the space
occupied by data.
+ When Base Compaction is performed, the rows marked for deletion are
deleted to reduce the space occupied by the data.
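As an illustration of the read path described above, a user query and the condition it is effectively expanded to might look like this (table and column names are placeholders):

```sql
-- What the user writes
SELECT * FROM example_tbl;

-- What is effectively executed after FE strips the hidden column from * and
-- appends the hidden-column condition
SELECT k1, k2, v1 FROM example_tbl WHERE __DORIS_DELETE_SIGN__ != true;
```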
-## Enable Bulk Delete Support
+## Enable Batch Delete Support
-There are two ways of enabling batch delete support:
+There are two forms of enabling Batch Delete support:
-1. By adding `enable_batch_delete_by_default=true` in the fe configuration
file, all newly created tables after restarting fe support batch deletion, this
option defaults to true
+1. Batch Delete is supported by adding `enable_batch_delete_by_default=true`
in the FE configuration file for all new tables created after restarting FE;
-2. For tables that have not changed the above fe configuration or for existing
tables that do not support the bulk delete function, you can use the following
statement:
- `ALTER TABLE tablename ENABLE FEATURE "BATCH_DELETE"` to enable the batch
delete.
+2. For tables that do not have the above FE configuration changed, or for existing tables that do not support Batch Delete, the following statement can be used to enable it: `ALTER TABLE tablename ENABLE FEATURE "BATCH_DELETE"`. This is essentially a schema change operation; it returns immediately, and its progress can be confirmed with `SHOW ALTER TABLE COLUMN`.
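For example (`example_tbl` is a placeholder table name):

```sql
ALTER TABLE example_tbl ENABLE FEATURE "BATCH_DELETE";
-- The operation is a schema change; check its progress with:
SHOW ALTER TABLE COLUMN;
```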
-If you want to determine whether a table supports batch delete, you can set a
session variable to display the hidden columns `SET show_hidden_columns=true`,
and then use `desc tablename`, if there is a `__DORIS_DELETE_SIGN__` column in
the output, it is supported, if not, it is not supported
+To determine whether a table supports Batch Delete, set the session variable to show hidden columns with `SET show_hidden_columns=true` and then run `desc tablename`; if the output contains a `__DORIS_DELETE_SIGN__` column, Batch Delete is supported, otherwise it is not.
+
+## Syntax Description
-### Syntax Description
The syntax design of the import is mainly to add a column mapping that
specifies the field of the delete marker column, and it is necessary to add a
column to the imported data. The syntax of various import methods is as follows:
-#### Stream Load
+### Stream Load
-The writing method of `Stream Load` adds a field to set the delete label
column in the columns field in the header. Example
-`-H "columns: k1, k2, label_c3" -H "merge_type: [MERGE|APPEND|DELETE]" -H
"delete: label_c3=1"`
+The writing method of `Stream Load` adds a field to set the delete label
column in the columns field in the header. Example: `-H "columns: k1, k2,
label_c3" -H "merge_type: [MERGE|APPEND|DELETE]" -H "delete: label_c3=1"`
-#### Broker Load
+### Broker Load
-The writing method of `Broker Load` sets the field of the delete marker column
at `PROPERTIES`, the syntax is as follows:
+The writing method of `Broker Load` sets the field of the delete marker column
at `PROPERTIES`. The syntax is as follows:
```sql
LOAD LABEL db1.label1
@@ -91,7 +104,7 @@ LOAD LABEL db1.label1
)
[DELETE ON label_c3=true]
)
-WITH BROKER'broker'
+WITH BROKER 'broker'
(
"username"="user",
"password"="pass"
@@ -100,57 +113,61 @@ PROPERTIES
(
"timeout" = "3600"
);
-
```
-#### Routine Load
+### Routine Load
The writing method of `Routine Load` adds a mapping to the `columns` field.
The mapping method is the same as above. The syntax is as follows:
```sql
- CREATE ROUTINE LOAD example_db.test1 ON example_tbl
- [WITH MERGE|APPEND|DELETE]
- COLUMNS(k1, k2, k3, v1, v2, label),
- WHERE k1> 100 and k2 like "%doris%"
- [DELETE ON label=true]
- PROPERTIES
- (
- "desired_concurrent_number"="3",
- "max_batch_interval" = "20",
- "max_batch_rows" = "300000",
- "max_batch_size" = "209715200",
- "strict_mode" = "false"
- )
- FROM KAFKA
- (
- "kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
- "kafka_topic" = "my_topic",
- "kafka_partitions" = "0,1,2,3",
- "kafka_offsets" = "101,0,0,200"
- );
+CREATE ROUTINE LOAD example_db.test1 ON example_tbl
+ [WITH MERGE|APPEND|DELETE]
+ COLUMNS(k1, k2, k3, v1, v2, label),
+    WHERE k1 > 100 and k2 like "%doris%"
+ [DELETE ON label=true]
+ PROPERTIES
+ (
+ "desired_concurrent_number"="3",
+ "max_batch_interval" = "20",
+ "max_batch_rows" = "300000",
+ "max_batch_size" = "209715200",
+ "strict_mode" = "false"
+ )
+ FROM KAFKA
+ (
+ "kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
+ "kafka_topic" = "my_topic",
+ "kafka_partitions" = "0,1,2,3",
+ "kafka_offsets" = "101,0,0,200"
+ );
```
## Note
-1. Since import operations other than stream load may be executed out of order
inside doris, if it is not stream load when importing using the `MERGE` method,
it needs to be used with load sequence. For the specific syntax, please refer
to the [sequence](sequence-column-manual) column related documents
+
+1. Since import operations other than Stream Load may be executed out of order inside Doris, imports that use the `MERGE` method via a method other than Stream Load must be used together with load sequence. For the specific syntax, please refer to the `sequence` column related documents.
+
2. `DELETE ON` condition can only be used with MERGE.
-3. if session variable `SET show_hidden_columns = true` was executed before
running import task to show whether table support batch delete feature, then
execute `select count(*) from xxx` statement in the same session after
finishing `DELETE/MERGE` import task, it will result in a unexpected result
that the statement result set will include the deleted results. To avoid this
problem you should execute `SET show_hidden_columns = false` before select
statement or open a new session to run [...]
+
+:::tip Tip
+If the session variable `SET show_hidden_columns = true` was executed before running an import task to check whether the table supports the batch delete feature, then executing a `select count(*) from xxx` statement in the same session after finishing a `DELETE/MERGE` import task will produce an unexpected result: the result set will include the deleted rows. To avoid this problem, you should execute `SET show_hidden_columns = false` before the select statement, or open a new session to run [...]
+:::
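A sketch of the workaround described in this tip (`example_tbl` is a placeholder):

```sql
SET show_hidden_columns = false;
SELECT count(*) FROM example_tbl;
```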
## Usage Examples
-### Check if Bulk Delete Support is Enabled
+### Check if Batch Delete Support is Enabled
```sql
-mysql> SET show_hidden_columns=true;
+mysql> SET show_hidden_columns=true;
Query OK, 0 rows affected (0.00 sec)
-mysql> DESC test;
+mysql> DESC test;
+-----------------------+--------------+------+-------+---------+---------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+--------------+------+-------+---------+---------+
| name | VARCHAR(100) | No | true | NULL | |
| gender | VARCHAR(10) | Yes | false | NULL | REPLACE |
| age | INT | Yes | false | NULL | REPLACE |
-| __DORIS_DELETE_SIGN__ | TINYINT | No | false | 0 | REPLACE |
+| __DORIS_DELETE_SIGN__ | TINYINT      | No   | false | 0       | REPLACE |
+-----------------------+--------------+------+-------+---------+---------+
4 rows in set (0.00 sec)
```
@@ -158,150 +175,179 @@ mysql> DESC test;
### Stream Load Usage Examples
1. Import data normally:
-```shell
-curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid,
citycode, username, pv" -H "merge_type: APPEND" -T ~/table1_data
http://127.0.0.1: 8130/api/test/table1/_stream_load
-```
-The APPEND condition can be omitted, which has the same effect as the
following statement:
-```shell
-curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid,
citycode, username, pv" -T ~/table1_data http://127.0.0.1:8130/api/test/table1
/_stream_load
-```
-2. Delete all data with the same key as the imported data
-```shell
-curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid,
citycode, username, pv" -H "merge_type: DELETE" -T ~/table1_data
http://127.0.0.1: 8130/api/test/table1/_stream_load
-```
-Before load:
-```sql
-+--------+----------+----------+------+
-| siteid | citycode | username | pv |
-+--------+----------+----------+------+
-| 3 | 2 | tom | 2 |
-| 4 | 3 | bush | 3 |
-| 5 | 3 | helen | 3 |
-+--------+----------+----------+------+
-```
-Load data:
-```
-3,2,tom,0
-```
-After load:
-```sql
-+--------+----------+----------+------+
-| siteid | citycode | username | pv |
-+--------+----------+----------+------+
-| 4 | 3 | bush | 3 |
-| 5 | 3 | helen | 3 |
-+--------+----------+----------+------+
-```
-3. Import the same row as the key column of the row with `site_id=1`
-```shell
-curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid,
citycode, username, pv" -H "merge_type: MERGE" -H "delete: siteid=1" -T ~/
table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
-```
-Before load:
-```sql
-+--------+----------+----------+------+
-| siteid | citycode | username | pv |
-+--------+----------+----------+------+
-| 4 | 3 | bush | 3 |
-| 5 | 3 | helen | 3 |
-| 1 | 1 | jim | 2 |
-+--------+----------+----------+------+
-```
-Load data:
-```
-2,1,grace,2
-3,2,tom,2
-1,1,jim,2
-```
-After load:
-```sql
-+--------+----------+----------+------+
-| siteid | citycode | username | pv |
-+--------+----------+----------+------+
-| 4 | 3 | bush | 3 |
-| 2 | 1 | grace | 2 |
-| 3 | 2 | tom | 2 |
-| 5 | 3 | helen | 3 |
-+--------+----------+----------+------+
-```
-4. When the table has the sequence column, delete all data with the same key
as the imported data
+ ```shell
+    curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv" -H "merge_type: APPEND" -T ~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
+ ```
-```bash
-curl --location-trusted -u root: -H "column_separator:," -H "columns: name,
gender, age" -H "function_column.sequence_col: age" -H "merge_type: DELETE" -T
~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
-```
+ The APPEND condition can be omitted, which has the same effect as the
following statement:
-When the unique table has the sequence column, sequence column is used as the
basis for the replacement order of the REPLACE aggregate function under the
same key column, and the larger value can replace the smaller value.
-If you want delete some data, the imported data must have the same key and the
sequence column must be larger or equal than before.
+ ```shell
+ curl --location-trusted -u root: -H "column_separator:," -H "columns:
siteid, citycode, username, pv" -T ~/table1_data
http://127.0.0.1:8130/api/test/table1 /_stream_load
+ ```
-for example, one table like this:
-```sql
-mysql> SET show_hidden_columns=true;
-Query OK, 0 rows affected (0.00 sec)
+2. Delete all data with the same key as the imported data
-mysql> DESC table1;
-+------------------------+--------------+------+-------+---------+---------+
-| Field | Type | Null | Key | Default | Extra |
-+------------------------+--------------+------+-------+---------+---------+
-| name | VARCHAR(100) | No | true | NULL | |
-| gender | VARCHAR(10) | Yes | false | NULL | REPLACE |
-| age | INT | Yes | false | NULL | REPLACE |
-| __DORIS_DELETE_SIGN__ | TINYINT | No | false | 0 | REPLACE |
-| __DORIS_SEQUENCE_COL__ | INT | Yes | false | NULL | REPLACE |
-+------------------------+--------------+------+-------+---------+---------+
-4 rows in set (0.00 sec)
-```
+ ```Shell
+    curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv" -H "merge_type: DELETE" -T ~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
+ ```
-Before load:
-```text
-+-------+--------+------+
-| name | gender | age |
-+-------+--------+------+
-| li | male | 10 |
-| wang | male | 14 |
-| zhang | male | 12 |
-+-------+--------+------+
-```
+ Before load:
-If you load data like this:
-```text
-li,male,10
-```
+ ```sql
+ +--------+----------+----------+------+
+ | siteid | citycode | username | pv |
+ +--------+----------+----------+------+
+ | 3 | 2 | tom | 2 |
+ | 4 | 3 | bush | 3 |
+ | 5 | 3 | helen | 3 |
+ +--------+----------+----------+------+
+ ```
-After load:
-```text
-+-------+--------+------+
-| name | gender | age |
-+-------+--------+------+
-| wang | male | 14 |
-| zhang | male | 12 |
-+-------+--------+------+
-```
+ Load data:
-You will find that the data is deleted.
-```text
-li,male,10
-```
+ ```Plain
+ 3,2,tom,0
+ ```
-But if you load data like this:
-```text
-li,male,9
-```
+ After load:
-After load:
-```text
-+-------+--------+------+
-| name | gender | age |
-+-------+--------+------+
-| li | male | 10 |
-| wang | male | 14 |
-| zhang | male | 12 |
-+-------+--------+------+
-```
+ ```sql
+ +--------+----------+----------+------+
+ | siteid | citycode | username | pv |
+ +--------+----------+----------+------+
+ | 4 | 3 | bush | 3 |
+ | 5 | 3 | helen | 3 |
+ +--------+----------+----------+------+
+ ```
-You will find that the data is not deleted.
-```text
-li,male,10
-```
-This is because in the underlying dependencies, it will first judge the case
of the same key, display the row data with a large value in the sequence
column, and then check whether the `__DORIS_DELETE_SIGN__` value of the row is
1. If it is 1, it will not be displayed. If it is 0, it will still be read out.
+3. Import the same row as the key column of the row with `site_id=1`
+
+ ```shell
+    curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv" -H "merge_type: MERGE" -H "delete: siteid=1" -T ~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
+ ```
+
+ Before load:
+
+ ```sql
+ +--------+----------+----------+------+
+ | siteid | citycode | username | pv |
+ +--------+----------+----------+------+
+ | 4 | 3 | bush | 3 |
+ | 5 | 3 | helen | 3 |
+ | 1 | 1 | jim | 2 |
+ +--------+----------+----------+------+
+ ```
+
+ Load data:
+
+ ```Plain
+ 2,1,grace,2
+ 3,2,tom,2
+ 1,1,jim,2
+ ```
+
+ After load:
+
+ ```sql
+ +--------+----------+----------+------+
+ | siteid | citycode | username | pv |
+ +--------+----------+----------+------+
+ | 4 | 3 | bush | 3 |
+ | 2 | 1 | grace | 2 |
+ | 3 | 2 | tom | 2 |
+ | 5 | 3 | helen | 3 |
+ +--------+----------+----------+------+
+ ```
+
+4. When the table has the sequence column, delete all data with the same key
as the imported data
-**When data is written and deleted at the same time in the imported data (such
as in the Flink CDC scenario), using the sequence column can effectively ensure
the consistency when the data arrives out of order, avoiding the deletion
operation of an old version that arrives later, and accidentally deleting The
new version of the data that arrives first.**
\ No newline at end of file
+ ```bash
+ curl --location-trusted -u root: -H "column_separator:," -H "columns:
name, gender, age" -H "function_column.sequence_col: age" -H "merge_type:
DELETE" -T ~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
+ ```
+
+    When the Unique table has a sequence column, the sequence column is used as the basis for the replacement order of the REPLACE aggregate function under the same key column, and the larger value replaces the smaller value. If you want to delete some data, the imported data must have the same key, and its sequence column value must be larger than or equal to the existing one.
+
+ For example, one table like this:
+
+ ```sql
+    mysql> SET show_hidden_columns=true;
+    Query OK, 0 rows affected (0.00 sec)
+
+    mysql> DESC table1;
+    +------------------------+--------------+------+-------+---------+---------+
+    | Field                  | Type         | Null | Key   | Default | Extra   |
+    +------------------------+--------------+------+-------+---------+---------+
+    | name                   | VARCHAR(100) | No   | true  | NULL    |         |
+    | gender                 | VARCHAR(10)  | Yes  | false | NULL    | REPLACE |
+    | age                    | INT          | Yes  | false | NULL    | REPLACE |
+    | __DORIS_DELETE_SIGN__  | TINYINT      | No   | false | 0       | REPLACE |
+    | __DORIS_SEQUENCE_COL__ | INT          | Yes  | false | NULL    | REPLACE |
+    +------------------------+--------------+------+-------+---------+---------+
+    5 rows in set (0.00 sec)
+ ```
+
+ Before load:
+
+ ```sql
+ +-------+--------+------+
+ | name | gender | age |
+ +-------+--------+------+
+ | li | male | 10 |
+ | wang | male | 14 |
+ | zhang | male | 12 |
+ +-------+--------+------+
+ ```
+
+ If you load data like this:
+
+ ```Plain
+ li,male,10
+ ```
+
+ After load:
+
+ ```sql
+ +-------+--------+------+
+ | name | gender | age |
+ +-------+--------+------+
+ | wang | male | 14 |
+ | zhang | male | 12 |
+ +-------+--------+------+
+ ```
+
+ You will find that the data is deleted.
+
+ ```Plain
+ li,male,10
+ ```
+
+ But if you load data like this:
+
+ ```Plain
+ li,male,9
+ ```
+
+ After load:
+
+ ```sql
+ +-------+--------+------+
+ | name | gender | age |
+ +-------+--------+------+
+ | li | male | 10 |
+ | wang | male | 14 |
+ | zhang | male | 12 |
+ +-------+--------+------+
+ ```
+
+ You will find that the data is not deleted.
+
+ ```Plain
+ li,male,10
+ ```
+
+    This is because, in the underlying implementation, Doris first resolves rows with the same key by keeping the row with the larger sequence column value, and then checks whether the `__DORIS_DELETE_SIGN__` value of that row is 1. If it is 1, the row is not displayed; if it is 0, it is still read out.
+
+:::tip Tip
+When data is written and deleted at the same time in the imported data (e.g.,
in the Flink CDC scenario), using the sequence column can effectively ensure
consistency when the data arrives out of order, avoiding the deletion operation
of an old version that arrives later, and accidentally deleting the new version
of the data that arrives first.
+:::
\ No newline at end of file
diff --git a/versioned_docs/version-2.0/data-operate/delete/delete-manual.md
b/versioned_docs/version-2.0/data-operate/delete/delete-manual.md
index 29dd7c3f446b..9bd5dabdf3fb 100644
--- a/versioned_docs/version-2.0/data-operate/delete/delete-manual.md
+++ b/versioned_docs/version-2.0/data-operate/delete/delete-manual.md
@@ -25,124 +25,273 @@ specific language governing permissions and limitations
under the License.
-->
-# Delete
-Delete is different from other import methods. It is a synchronization
process, similar to Insert into. All Delete operations are an independent
import job in Doris. Generally, the Delete statement needs to specify the table
and partition and delete conditions to filter the data to be deleted. , and
will delete the data of the base table and the rollup table at the same time.
+The DELETE statement conditionally deletes data from a specified table or partition using the MySQL protocol. The Delete operation differs from import-based batch deletion in that it is similar to the INSERT INTO statement and is a synchronous process. Every Delete operation is a separate import job in Doris.
+The DELETE statement generally requires the specification of tables and
partitions as well as deletion conditions to filter the data to be deleted, and
will delete data from both the base and rollup tables.
-## Syntax
+The syntax of the DELETE statement is detailed in the [DELETE](../../sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/DELETE) syntax reference. Unlike the Insert into command, Delete cannot specify a `label` manually. For the concept of `label`, refer to the [Insert Into](../../data-operate/import/import-way/insert-into-manual) documentation.
-Please refer to the official website for the
[DELETE](../../sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/DELETE.md)
syntax of the delete operation.
+### Delete by Specifying a Filter Predicate
-## Delete Result
+```sql
+DELETE FROM table_name [table_alias]
+ [PARTITION partition_name | PARTITIONS (partition_name [, partition_name])]
+ WHERE column_name op { value | value_list } [ AND column_name op { value |
value_list } ...];
+```
+
+### Required Parameters
+
+- table_name: Specify the table from which the data should be deleted;
+
+- column_name: Columns belonging to table_name
+
+- op: Logical comparison operators, optional types include: =, >, <, >=, <=,
!=, in, not in
+
+- value | value_list: Values or lists of values for logical comparisons
+
+
+### Optional Parameters
-The delete command is an SQL command, and the returned results are
synchronous. It can be divided into the following types:
+- PARTITION partition_name | PARTITIONS (partition_name [, partition_name]):
Specify the name of the partition in which the deletion is to be performed. If
the partition does not exist in the table, an error will be reported.
-1. Successful visible
+- table_alias: Aliases of the Table
- If delete completes successfully and is visible, the following results will
be returned, `query OK` indicates success.
+### Note
+
+- When using the AGGREGATE or UNIQUE table model, conditions can only be specified on key columns.
+
+- If the selected key column does not exist in a rollup, it cannot be deleted.
+
+- Conditions can only be related to each other by "and". If you want an "or"
relationship, you need to write the conditions in two separate DELETE
statements;
+
+- If the table is partitioned, you need to specify the partition. If you do not, Doris will infer the partition from the condition. In two cases, Doris cannot infer the partition from the condition:
- ```sql
- mysql> delete from test_tbl PARTITION p1 where k1 = 1;
- Query OK, 0 rows affected (0.04 sec)
- {'label':'delete_e7830c72-eb14-4cb9-bbb6-eebd4511d251',
'status':'VISIBLE', 'txnId':'4005'}
- ```
+ - The condition does not contain a partition column
-2. Submitted successfully, but not visible
+ - The op for the partition column is "not in". When a partitioned table does not specify a partition, or a partition cannot be inferred from the condition, you need to set the session variable `delete_without_partition` to true (see the sketch after this list); in that case, the delete is applied to all partitions.
- The transaction submission of Doris is divided into two steps: submission
and publish version. Only after the publish version step is completed, the
result will be visible to the user. If it has been submitted successfully, then
it can be considered that the publish version step will eventually success.
Doris will try to wait for publishing for a period of time after submitting. If
it has timed out, even if the publishing version has not been completed, it
will return to the user in p [...]
+- This statement may reduce query efficiency for a period of time after execution. The extent of the impact depends on the number of delete conditions specified in the statement: the more conditions, the greater the impact.
+
+### Examples
+
+**1. Delete the row in my_table partition p1 where column k1 is 3.**
+
+```sql
+DELETE FROM my_table PARTITION p1
+ WHERE k1 = 3;
+```
- ```sql
- mysql> delete from test_tbl PARTITION p1 where k1 = 1;
- Query OK, 0 rows affected (0.04 sec)
- {'label':'delete_e7830c72-eb14-4cb9-bbb6-eebd4511d251',
'status':'COMMITTED', 'txnId':'4005', 'err':'delete job is committed but may be
taking effect later' }
- ```
+**2. Delete rows in my_table partition p1 where column k1 is greater than or
equal to 3 and column k2 is "abc".**
- The result will return a JSON string at the same time:
+```sql
+DELETE FROM my_table PARTITION p1
+WHERE k1 >= 3 AND k2 = "abc";
+```
- `affected rows`: Indicates the row affected by this deletion. Since the
deletion of Doris is currently a logical deletion, the value is always 0.
+**3. Delete rows in my_table partition (p1, p2) where column k1 is greater
than or equal to 3 and column k2 is "abc".**
- `label`: The label generated automatically to be the signature of the
delete jobs. Each job has a unique label within a single database.
+```sql
+DELETE FROM my_table PARTITIONS (p1, p2)
+WHERE k1 >= 3 AND k2 = "abc";
+```
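+
+The following two additional examples are illustrative sketches (not part of the original example set), using the same hypothetical `my_table`:
+
+**4. Delete rows in my_table partition p1 where column k1 is 3 or column k2 is "abc".**
+
+Since conditions can only be combined with "and", the "or" relationship is written as two separate DELETE statements:
+
+```sql
+DELETE FROM my_table PARTITION p1
+WHERE k1 = 3;
+
+DELETE FROM my_table PARTITION p1
+WHERE k2 = "abc";
+```
+
+**5. Delete across all partitions when no partition can be inferred from the condition.**
+
+Assuming `my_table` is partitioned and the condition does not contain the partition column, set the session variable `delete_without_partition` described above so that the delete applies to all partitions:
+
+```sql
+SET delete_without_partition = true;
+
+DELETE FROM my_table
+WHERE k2 = "abc";
+```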
- `status`: Indicates whether the data deletion is visible. If it is visible,
`visible` will be displayed. If it is not visible, `committed` will be
displayed.
+## Delete via the USING Clause
- `txnId`: The transaction ID corresponding to the delete job
+```sql
+DELETE FROM table_name [table_alias]
+ [PARTITION partition_name | PARTITIONS (partition_name [, partition_name])]
+ [USING additional_tables]
+ WHERE condition
+```
- `err`: Field will display some details of this deletion
+### Required Parameters
-3. Commit failed, transaction cancelled
+- table_name: Specify the table from which the data should be deleted;
- If the delete statement is not submitted successfully, it will be
automatically aborted by Doris and the following results will be returned
+- WHERE condition: Specify a condition for selecting rows for deletion;
+### Optional Parameters
- ```sql
- mysql> delete from test_tbl partition p1 where k1 > 80;
- ERROR 1064 (HY000): errCode = 2, detailMessage = {错误原因}
- ```
+- PARTITION partition_name | PARTITIONS (partition_name [, partition_name]):
Specify the name of the partition in which the deletion is to be performed. If
the partition does not exist in the table, an error will be reported.
- example:
+- table_alias: Alias of the table
- A timeout deletion will return the timeout and unfinished replicas
displayed as ` (tablet = replica)`
+### Note
+- Only conditions on the key column can be specified when using the UNIQUE
model.
- ```sql
- mysql> delete from test_tbl partition p1 where k1 > 80;
- ERROR 1064 (HY000): errCode = 2, detailMessage = failed to delete replicas
from job: 4005, Unfinished replicas:10000=60000, 10001=60000, 10002=60000
- ```
+### Example
- **The correct processing logic for the returned results of the delete
operation is as follows:**
+Use the result of joining the `t2` and `t3` tables to delete data from `t1`. Only tables using the UNIQUE model are supported as the target of this deletion.
- 1. If `Error 1064 (HY000)` is returned, deletion fails
+```sql
+-- Create t1, t2, t3 tables
+CREATE TABLE t1
+ (id INT, c1 BIGINT, c2 STRING, c3 DOUBLE, c4 DATE)
+UNIQUE KEY (id)
+DISTRIBUTED BY HASH (id)
+PROPERTIES('replication_num'='1', "function_column.sequence_col" = "c4");
+
+CREATE TABLE t2
+ (id INT, c1 BIGINT, c2 STRING, c3 DOUBLE, c4 DATE)
+DISTRIBUTED BY HASH (id)
+PROPERTIES('replication_num'='1');
+
+CREATE TABLE t3
+ (id INT)
+DISTRIBUTED BY HASH (id)
+PROPERTIES('replication_num'='1');
+
+-- insert data
+INSERT INTO t1 VALUES
+ (1, 1, '1', 1.0, '2000-01-01'),
+ (2, 2, '2', 2.0, '2000-01-02'),
+ (3, 3, '3', 3.0, '2000-01-03');
+
+INSERT INTO t2 VALUES
+ (1, 10, '10', 10.0, '2000-01-10'),
+ (2, 20, '20', 20.0, '2000-01-20'),
+ (3, 30, '30', 30.0, '2000-01-30'),
+ (4, 4, '4', 4.0, '2000-01-04'),
+ (5, 5, '5', 5.0, '2000-01-05');
+
+INSERT INTO t3 VALUES
+ (1),
+ (4),
+ (5);
+
+-- remove rows from t1
+DELETE FROM t1
+ USING t2 INNER JOIN t3 ON t2.id = t3.id
+ WHERE t1.id = t2.id;
+```
- 2. If the returned result is `Query OK`, the deletion is successful
+The expected result is that the row with `id=1` in table `t1` is deleted.
- 1. If `status` is `committed`, the data deletion is committed and will
be eventually invisible. Users can wait for a while and then use the `show
delete` command to view the results.
- 2. If `status` is `visible`, the data have been deleted successfully.
+```Plain
++----+----+----+--------+------------+
+| id | c1 | c2 | c3 | c4 |
++----+----+----+--------+------------+
+| 2 | 2 | 2 | 2.0 | 2000-01-02 |
+| 3 | 3 | 3 | 3.0 | 2000-01-03 |
++----+----+----+--------+------------+
+```
-## Delete Operation Related FE Configuration
+## Returned Results
-**TIMEOUT Configuration**
+The Delete command is a SQL command that returns results synchronously. The results are classified as follows:
-In general, Doris's deletion timeout is limited from 30 seconds to 5 minutes.
The specific time can be adjusted through the following configuration items
+### Implementation Success
-* `tablet_delete_timeout_second`
+If the Delete completes successfully and is visible, the following results are returned. `Query OK` indicates success.
- The timeout of delete itself can be elastically changed by the number of
tablets in the specified partition. This configuration represents the average
timeout contributed by a tablet. The default value is 2.
+```sql
+mysql> delete from test_tbl PARTITION p1 where k1 = 1;
+Query OK, 0 rows affected (0.04 sec)
+{'label':'delete_e7830c72-eb14-4cb9-bbb6-eebd4511d251', 'status':'VISIBLE',
'txnId':'4005'}
+```
- Assuming that there are 5 tablets under the specified partition for this
deletion, the timeout time available for the deletion is 10 seconds. Because
the minimum timeout is 30 seconds which is higher than former timeout time, the
final timeout is 30 seconds.
+### Submitted Successfully but Invisible
-* `load_straggler_wait_second`
+A Doris transaction commit is divided into two steps: commit and publish version. Only after the publish version step is completed will the results be visible to the user.
- If the user estimates a large amount of data, so that the upper limit of 5
minutes is insufficient, the user can adjust the upper limit of timeout through
this item, and the default value is 300.
+If the commit succeeds, it can be assumed that the publish will eventually succeed as well. Doris waits for a period of time after the commit completes; if the wait times out before the publish version has finished, it will return to the user anyway, indicating that the commit has completed.
- **The specific calculation rule of timeout(seconds)**
+ If the Delete has been submitted and executed, but the publish version has not completed and the result is not yet visible, the following result will be returned:
- `TIMEOUT = MIN(load_straggler_wait_second, MAX(30,
tablet_delete_timeout_second * tablet_num))`
+```sql
+mysql> delete from test_tbl PARTITION p1 where k1 = 1;
+Query OK, 0 rows affected (0.04 sec)
+{'label':'delete_e7830c72-eb14-4cb9-bbb6-eebd4511d251', 'status':'COMMITTED',
'txnId':'4005', 'err':'delete job is committed but may be taking effect later' }
+```
-* `query_timeout`
+The result also returns a JSON string:
- Because delete itself is an SQL command, the deletion statement is also
limited by the session variables, and the timeout is also affected by the
session value `query'timeout`. You can increase the value by `set query'timeout
= xxx`.
+- `affected rows`: Indicates the rows affected by this deletion. Since Doris deletion is currently a logical deletion, this value is always 0;
-**InPredicate configuration**
+- `label`: The automatically generated label that identifies the delete job. Each job has a label that is unique within a single database;
-* `max_allowed_in_element_num_of_delete`
+- `status`: Indicates whether the data deletion is visible. If it is visible, the result displays `VISIBLE`; if it is not, the result displays `COMMITTED`;
- If the user needs to take a lot of elements when using the in predicate, the
user can adjust the upper limit of the allowed in elements number, and the
default value is 1024.
+- `txnId`: The transaction ID corresponding to the Delete;
-## Show Delete History
+- `err`: This field displays the details of the Delete.
-The user can view the deletion completed in history through the show delete
statement.
+### Commit Failed, Transaction Cancelled
-Syntax
+If the Delete statement fails to commit, the transaction will be automatically
aborted by Doris and the following result will be returned:
+
+```sql
+mysql> delete from test_tbl partition p1 where k1 > 80;
+ERROR 1064 (HY000): errCode = 2, detailMessage = {Cause of error}
+```
+
+For example, a timed-out deletion will return the timeout and the unfinished replicas, displayed as `(tablet=replica)`:
+
+```sql
+mysql> delete from test_tbl partition p1 where k1 > 80;
+ERROR 1064 (HY000): errCode = 2, detailMessage = failed to delete replicas
from job: 4005, Unfinished replicas:10000=60000, 10001=60000, 10002=60000
+```
+
+### Summary
+
+The correct logic for handling the results returned by Delete is:
+
+- If `ERROR 1064 (HY000)` is returned, the deletion failed;
+
+- If `Query OK` is returned, the deletion was successful;
+
+  - If `status` is `COMMITTED`, the deletion is not yet visible; users can wait for a while and then check the result with `show delete`;
+
+  - If `status` is `VISIBLE`, the deletion was successful.
+
+## FE Configurations
+
+**TIMEOUT Configurations**
+
+Overall, the timeout calculation rules for Doris Delete jobs are as follows
(in seconds):
+
+```Plain
+TIMEOUT = MIN(load_straggler_wait_second, MAX(30, tablet_delete_timeout_second
* tablet_num))
+```
+
+- `tablet_delete_timeout_second`
+
+The delete timeout scales elastically with the number of tablets under the specified partition. This configuration item represents the average timeout contributed by each tablet; the default value is 2 (seconds).
+
+Assuming that there are 5 tablets under the partition specified for this deletion, the timeout available for the delete would be 10 seconds. Since this is less than the minimum timeout of 30 seconds, the final timeout is 30 seconds.
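+
+Plugging this example into the formula above (a worked illustration, assuming the default values `tablet_delete_timeout_second = 2` and `load_straggler_wait_second = 300`):
+
+```Plain
+TIMEOUT = MIN(300, MAX(30, 2 * 5)) = MIN(300, 30) = 30 seconds
+```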
+
+- `load_straggler_wait_second`
+
+If the user expects a large amount of data, making the 5-minute upper limit insufficient, the timeout upper limit can be adjusted via `load_straggler_wait_second`. The default value is 300.
+
+- `query_timeout`
+
+Because delete itself is a SQL command, the delete statement is also subject
to session limitations. Timeout is also affected by the `query_timeout` value
in the session, which can be increased in seconds by `SET query_timeout = xxx`.
+
+**IN Predicate Configuration**
+
+- `max_allowed_in_element_num_of_delete`
+
+If the user needs more elements in the IN predicate, the maximum number of elements allowed can be adjusted via `max_allowed_in_element_num_of_delete`. The default value is 1024.
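+
+For example, a sketch of raising this limit before running a DELETE whose IN list would exceed the default (the first statement assumes this FE configuration is modifiable at runtime via ADMIN SET FRONTEND CONFIG; it can also be set in fe.conf):
+
+```sql
+-- Allow up to 2048 elements in the IN predicate of a DELETE
+ADMIN SET FRONTEND CONFIG ("max_allowed_in_element_num_of_delete" = "2048");
+
+-- Hypothetical table; the IN list may now contain up to 2048 values
+DELETE FROM my_table PARTITION p1
+WHERE k1 IN (1, 2, 3, 4, 5);
+```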
+
+## View History
+
+Users can view the history of completed deletions by using the `SHOW DELETE` statement.
+
+### Syntax
```sql
SHOW DELETE [FROM db_name]
```
-example
+### Example
```sql
-mysql> show delete from test_db;
+mysql> show delete from test_db;
+-----------+---------------+---------------------+-----------------+----------+
| TableName | PartitionName | CreateTime | DeleteCondition | State
|
+-----------+---------------+---------------------+-----------------+----------+
@@ -150,12 +299,4 @@ mysql> show delete from test_db;
| test_tbl | p4 | 2020-04-15 23:09:53 | k1 GT "80" | FINISHED
|
+-----------+---------------+---------------------+-----------------+----------+
2 rows in set (0.00 sec)
-```
-
-### Note
-
-Unlike the Insert into command, delete cannot specify `label` manually. For
the concept of label, see the [Insert
Into](../import/import-way/insert-into-manual.md) documentation.
-
-## More Help
-
-For more detailed syntax used by **delete**, see the
[delete](../../sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/DELETE.md)
command manual, You can also enter `HELP DELETE` in the Mysql client command
line to get more help information
+```
\ No newline at end of file
diff --git
a/versioned_docs/version-2.0/data-operate/delete/table-temp-partition.md
b/versioned_docs/version-2.0/data-operate/delete/table-temp-partition.md
index 59c326c0de4f..0a7aad869da4 100644
--- a/versioned_docs/version-2.0/data-operate/delete/table-temp-partition.md
+++ b/versioned_docs/version-2.0/data-operate/delete/table-temp-partition.md
@@ -24,34 +24,40 @@ specific language governing permissions and limitations
under the License.
-->
-# Temporary Partition
+Doris supports adding temporary partitions to partitioned tables. Temporary partitions differ from formal partitions in that ordinary queries do not read them; they are only read by queries that specify them explicitly.
-Since version 0.12, Doris supports temporary partitioning.
+- The partition columns of the temporary partition are the same as the formal
partition and cannot be modified.
-A temporary partition belongs to a partitioned table. Only partitioned tables
can create temporary partitions.
+- The partition ranges of all temporary partitions of a table cannot overlap,
but the ranges of temporary partitions and formal partitions can overlap.
-## Rules
+- The partition name of the temporary partition cannot be the same as the
formal partitions and other temporary partitions.
-* The partition columns of the temporary partition is the same as the formal
partition and cannot be modified.
-* The partition ranges of all temporary partitions of a table cannot overlap,
but the ranges of temporary partitions and formal partitions can overlap.
-* The partition name of the temporary partition cannot be the same as the
formal partitions and other temporary partitions.
+Temporary Partitions are mainly used in the following scenarios:
-## Supported operations
+- Atomic overwrite operations
-The temporary partition supports add, delete, and replace operations.
+In some cases, users want to rewrite the data of a certain partition, but if
they do it by deleting and then importing, there will be a period of time in
the middle when they cannot view the data. In this case, the user can create a
corresponding Temporary Partition first, and after importing the new data into
the Temporary Partition, replace the original partition atomically through the
Replace operation to achieve the purpose. For the atomic overwrite write
operation of non-partitioned [...]
-### Add temporary partition
+- Modifying the number of buckets
+
+In some cases, users may have used an inappropriate number of buckets when creating a partition. The user can first create a temporary partition covering the same partition range and specify a new number of buckets, then import the data of the formal partition into the temporary partition with the `INSERT INTO` command, and finally replace the original partition atomically through the Replace operation.
+
+- Merging or splitting partitions
+
+In some cases, users want to modify the range of partitions, such as merging two partitions or splitting a large partition into several smaller ones. The user can first create temporary partitions covering the range of the merged or split partitions, then import the data of the formal partitions into the temporary partitions with the `INSERT INTO` command, and finally replace the original partitions atomically through the Replace operation (an end-to-end sketch of this workflow follows this list).
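+
+As an end-to-end sketch of the workflow shared by these scenarios (a minimal illustration; `tbl1`, partition `p1`, and `staging_tbl` are hypothetical names, and the individual statements are described in the sections below):
+
+```sql
+-- 1. Create a temporary partition covering the same range as the formal partition p1
+ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp1 VALUES LESS THAN ("2020-02-01");
+
+-- 2. Load the new or rewritten data into the temporary partition
+INSERT INTO tbl1 TEMPORARY PARTITION (tp1) SELECT * FROM staging_tbl;
+
+-- 3. Atomically replace the formal partition with the temporary partition
+ALTER TABLE tbl1 REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);
+```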
+
+## Add Temporary Partition
You can add temporary partitions to a table with the `ALTER TABLE ADD
TEMPORARY PARTITION` statement:
-```
-ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp1 VALUES LESS THAN ("2020-02-01");
+```sql
+ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp1 VALUES LESS THAN("2020-02-01");
ALTER TABLE tbl2 ADD TEMPORARY PARTITION tp1 VALUES [("2020-01-01"),
("2020-02-01"));
-ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp1 VALUES LESS THAN ("2020-02-01")
+ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp1 VALUES LESS THAN("2020-02-01")
("replication_num" = "1")
-DISTRIBUTED BY HASH (k1) BUCKETS 5;
+DISTRIBUTED BY HASH(k1) BUCKETS 5;
ALTER TABLE tbl3 ADD TEMPORARY PARTITION tp1 VALUES IN ("Beijing", "Shanghai");
@@ -60,43 +66,43 @@ ALTER TABLE tbl4 ADD TEMPORARY PARTITION tp1 VALUES IN ((1,
"Beijing"), (1, "Sha
ALTER TABLE tbl3 ADD TEMPORARY PARTITION tp1 VALUES IN ("Beijing", "Shanghai")
("replication_num" = "1")
DISTRIBUTED BY HASH(k1) BUCKETS 5;
-
```
See `HELP ALTER TABLE;` for more help and examples.
Some instructions for adding operations:
-* Adding a temporary partition is similar to adding a formal partition. The
partition range of the temporary partition is independent of the formal
partition.
-* Temporary partition can independently specify some attributes. Includes
information such as the number of buckets, the number of replicas, or the
storage medium.
+- Adding a temporary partition is similar to adding a formal partition. The
partition range of the temporary partition is independent of the formal
partition.
+
+- A temporary partition can independently specify some attributes, such as the number of buckets, the number of replicas, or the storage medium.
-### Delete temporary partition
+### Delete Temporary Partition
A table's temporary partition can be dropped with the `ALTER TABLE DROP
TEMPORARY PARTITION` statement:
-```
+```sql
ALTER TABLE tbl1 DROP TEMPORARY PARTITION tp1;
```
See `HELP ALTER TABLE;` for more help and examples.
-Some instructions for the delete operation:
+Some instructions for the drop operation:
-* Deleting the temporary partition will not affect the data of the formal
partition.
+- Deleting the temporary partition will not affect the data of the formal
partition.
-### Replace partition
+### Replace Partition
You can replace formal partitions of a table with temporary partitions with
the `ALTER TABLE REPLACE PARTITION` statement.
-```
+```sql
ALTER TABLE tbl1 REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);
ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1,
tp2, tp3);
ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1, tp2)
PROPERTIES (
- "strict_range" = "false",
- "use_temp_partition_name" = "true"
+ "strict_range" = "false",
+ "use_temp_partition_name" = "true"
);
```
@@ -104,185 +110,148 @@ See `HELP ALTER TABLE;` for more help and examples.
The replace operation has two special optional parameters:
-1. `strict_range`
-
- The default is true.
-
- For Range partition, When this parameter is true, the range union of all
formal partitions to be replaced needs to be the same as the range union of the
temporary partitions to be replaced. When set to false, you only need to ensure
that the range between the new formal partitions does not overlap after
replacement.
-
- For List partition, this parameter is always true, and the enumeration
values of all full partitions to be replaced must be identical to the
enumeration values of the temporary partitions to be replaced.
-
- Here are some examples:
-
- * Example 1
-
- Range of partitions p1, p2, p3 to be replaced (=> union):
-
- ```
- (10, 20), [20, 30), [40, 50) => [10, 30), [40, 50)
- ```
-
- Replace the range of partitions tp1, tp2 (=> union):
-
- ```
- (10, 30), [40, 45), [45, 50) => [10, 30), [40, 50)
- ```
-
- The union of ranges is the same, so you can use tp1 and tp2 to replace
p1, p2, p3.
-
- * Example 2
-
- Range of partition p1 to be replaced (=> union):
-
- ```
- (10, 50) => [10, 50)
- ```
+**1. `strict_range`**
- Replace the range of partitions tp1, tp2 (=> union):
+The default is true.
- ```
- (10, 30), [40, 50) => [10, 30), [40, 50)
- ```
+For Range partitions, when this parameter is true, the range union of all formal partitions to be replaced needs to be the same as the range union of the replacing temporary partitions. When set to false, you only need to ensure that the ranges of the new formal partitions do not overlap with other formal partitions after replacement.
- The union of ranges is not the same. If `strict_range` is true, you
cannot use tp1 and tp2 to replace p1. If false, and the two partition ranges
`[10, 30), [40, 50)` and the other formal partitions do not overlap, they can
be replaced.
+For List partitions, this parameter is always true. The enumeration values of all formal partitions to be replaced must be identical to the enumeration values of the replacing temporary partitions.
- * Example 3
+**Example 1**
- Enumerated values of partitions p1, p2 to be replaced (=> union).
+```sql
+-- Range of partitions p1, p2, p3 to be replaced (=> union):
+[10, 20), [20, 30), [40, 50) => [10, 30), [40, 50)
- ```
- (1, 2, 3), (4, 5, 6) => (1, 2, 3, 4, 5, 6)
- ```
+-- Replace the range of partitions tp1, tp2 (=> union):
+[10, 30), [40, 45), [45, 50) => [10, 30), [40, 50)
- Replace the enumerated values of partitions tp1, tp2, tp3 (=> union).
-
- ```
- (1, 2, 3), (4), (5, 6) => (1, 2, 3, 4, 5, 6)
- ```
-
- The enumeration values are the same, you can use tp1, tp2, tp3 to
replace p1, p2
-
- * Example 4
-
- Enumerated values of partitions p1, p2, p3 to be replaced (=> union).
-
- ```
- (("1", "beijing"), ("1", "shanghai")), (("2", "beijing"), ("2",
"shanghai")), (("3", "beijing"), ("3", "shanghai")) => (("1", "beijing"), ("3",
"shanghai")) "), ("1", "shanghai"), ("2", "beijing"), ("2", "shanghai"), ("3",
"beijing"), ("3", "shanghai"))
- ```
-
- Replace the enumerated values of partitions tp1, tp2 (=> union).
+--The union of ranges is the same, so you can use tp1 and tp2 to replace p1,
p2, p3.
+```
- ```
- (("1", "beijing"), ("1", "shanghai")), (("2", "beijing"), ("2",
"shanghai"), ("3", "beijing"), ("3", "shanghai")) => (("1", "beijing") , ("1",
"shanghai"), ("2", "beijing"), ("2", "shanghai"), ("3", "beijing"), ("3",
"shanghai"))
- ```
+**Example 2**
- The enumeration values are the same, you can use tp1, tp2 to replace
p1, p2, p3
+```sql
+--Range of partition p1 to be replaced (=> union):
+[10, 50) => [10, 50)
-2. `use_temp_partition_name`
+--Replace the range of partitions tp1, tp2 (=> union):
+[10, 30), [40, 50) => [10, 30), [40, 50)
- The default is false. When this parameter is false, and the number of
partitions to be replaced is the same as the number of replacement partitions,
the name of the formal partition after the replacement remains unchanged. If
true, after replacement, the name of the formal partition is the name of the
replacement partition. Here are some examples:
+-- The union of ranges is not the same. If strict_range is true, tp1 and tp2 cannot be used to replace p1. If it is false, and the two partition ranges [10, 30) and [40, 50) do not overlap with other formal partitions, they can be replaced.
+```
- * Example 1
+**Example 3**
- ```
- ALTER TABLE tbl1 REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);
- ```
+```sql
+--Enumerated values of partitions p1, p2 to be replaced (=> union).
+(1, 2, 3), (4, 5, 6) => (1, 2, 3, 4, 5, 6)
+--Replace the enumerated values of partitions tp1, tp2, tp3 (=> union).
+(1, 2, 3), (4), (5, 6) => (1, 2, 3, 4, 5, 6)
+--The enumeration values are the same, you can use tp1, tp2, tp3 to replace
p1, p2
+```
- `use_temp_partition_name` is false by default. After replacement, the
partition name is still p1, but the related data and attributes are replaced
with tp1.
+**Example 4**
- If `use_temp_partition_name` is true by default, the name of the
partition is tp1 after replacement. The p1 partition no longer exists.
+```sql
+--Enumerated values of partitions p1, p2, p3 to be replaced (=> union).
+(("1","beijing"), ("1", "shanghai")), (("2","beijing"), ("2", "shanghai")),
(("3","beijing"), ("3", "shanghai")) => (("1","beijing"), ("1", "shanghai"),
("2","beijing"), ("2", "shanghai"), ("3","beijing"), ("3", "shanghai"))
- * Example 2
+--Replace the enumerated values of partitions tp1, tp2 (=> union).
+(("1","beijing"), ("1", "shanghai")), (("2","beijing"), ("2", "shanghai"),
("3","beijing"), ("3", "shanghai")) => (("1","beijing"), ("1", "shanghai"),
("2","beijing"), ("2", "shanghai"), ("3","beijing"), ("3", "shanghai"))
- ```
- ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION
(tp1);
- ```
+--The enumeration values are the same, you can use tp1, tp2 to replace p1, p2,
p3
+```
- `use_temp_partition_name` is false by default, but this parameter is
invalid because the number of partitions to be replaced and the number of
replacement partitions are different. After the replacement, the partition name
is tp1, and p1 and p2 no longer exist.
+**2. `use_temp_partition_name`**
-Some instructions for the replacement operation:
+The default is false.
-* After the partition is replaced successfully, the replaced partition will be
deleted and cannot be recovered.
+When this parameter is false, and the number of partitions to be replaced is
the same as the number of replacement partitions, the name of the formal
partition after the replacement remains unchanged.
-## Load and query of temporary partitions
+If true, after replacement, the name of the formal partition is the name of
the replacement partition. Here are some examples:
-Users can load data into temporary partitions or specify temporary partitions
for querying.
+**Example 1**
-1. Load temporary partition
+```sql
+ALTER TABLE tbl1 REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);
+```
- The syntax for specifying a temporary partition is slightly different
depending on the load method. Here is a simple illustration through an example:
+`use_temp_partition_name` is **false** by default. After replacement, the
partition name is still p1, but the related data and attributes are replaced
with tp1.
- ```
- INSERT INTO tbl TEMPORARY PARTITION (tp1, tp2, ...) SELECT ....
- ```
+If `use_temp_partition_name` is set to **true**, the name of the partition is tp1 after replacement, and the p1 partition no longer exists.
- ```
- curl --location-trusted -u root: -H "label: 123" -H "temporary_partition:
tp1, tp2, ..." -T testData http: // host: port / api / testDb / testTbl /
_stream_load
- ```
+**Example 2**
- ```
- LOAD LABEL example_db.label1
- (
- DATA INFILE ("hdfs: // hdfs_host: hdfs_port / user / palo / data / input /
file")
- INTO TABLE `my_table`
- TEMPORARY PARTITION (tp1, tp2, ...)
- ...
- )
- WITH BROKER hdfs ("username" = "hdfs_user", "password" = "hdfs_password");
- ```
+```sql
+ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1);
+```
- ```
- CREATE ROUTINE LOAD example_db.test1 ON example_tbl
- COLUMNS (k1, k2, k3, v1, v2, v3 = k1 * 100),
- TEMPORARY PARTITIONS (tp1, tp2, ...),
- WHERE k1> 100
- PROPERTIES
- (...)
- FROM KAFKA
- (...);
- ```
+`use_temp_partition_name` is **false** by default, but this parameter has no effect here because the number of partitions to be replaced differs from the number of replacement partitions. After the replacement, the partition name is tp1, and p1 and p2 no longer exist.
+
+:::tip Tip
+**Some instructions for the replacement operation:**
+
+After the partition is replaced successfully, the replaced partition will be
deleted and cannot be recovered.
+:::
+
+## Importing Temporary Partitions
+
+Depending on the import method, the syntax for importing into temporary partitions differs slightly. Here is a brief explanation with examples:
+
+```sql
+INSERT INTO tbl TEMPORARY PARTITION(tp1, tp2, ...) SELECT ....
+
+curl --location-trusted -u root: -H "label:123" -H "temporary_partitions: tp1, tp2, ..." -T testData http://host:port/api/testDb/testTbl/_stream_load
+
+LOAD LABEL example_db.label1
+(
+DATA INFILE("hdfs://hdfs_host:hdfs_port/user/palo/data/input/file")
+INTO TABLE my_table
+TEMPORARY PARTITION (tp1, tp2, ...)
+...
+)
+WITH BROKER hdfs ("username"="hdfs_user", "password"="hdfs_password");
+
+CREATE ROUTINE LOAD example_db.test1 ON example_tbl
+COLUMNS(k1, k2, k3, v1, v2, v3 = k1 * 100),
+TEMPORARY PARTITIONS(tp1, tp2, ...),
+WHERE k1 > 100
+PROPERTIES
+(...)
+FROM KAFKA
+(...);
+```
-2. Query the temporary partition
+## Query Temporary Partitions
- ```
- SELECT ... FROM
- tbl1 TEMPORARY PARTITION (tp1, tp2, ...)
- JOIN
- tbl2 TEMPORARY PARTITION (tp1, tp2, ...)
- ON ...
- WHERE ...;
- ```
+```sql
+SELECT ... FROM
+tbl1 TEMPORARY PARTITION(tp1, tp2, ...)
+JOIN
+tbl2 TEMPORARY PARTITION(tp1, tp2, ...)
+ON ...
+WHERE ...;
+```
-## Relationship to other operations
+## Relationship to Other Operations
### DROP
-* After using the `DROP` operation to directly drop the database or table, you
can recover the database or table (within a limited time) through the `RECOVER`
command, but the temporary partition will not be recovered.
-* After the formal partition is dropped using the `ALTER` command, the
partition can be recovered by the `RECOVER` command (within a limited time).
Operating a formal partition is not related to a temporary partition.
-* After the temporary partition is dropped using the `ALTER` command, the
temporary partition cannot be recovered through the `RECOVER` command.
+- After using the `DROP` operation to directly drop the database or table, you
can recover the database or table (within a limited time) through the `RECOVER`
command, but the temporary partition will not be recovered.
-### TRUNCATE
-
-* Use the `TRUNCATE` command to empty the table. The temporary partition of
the table will be deleted and cannot be recovered.
-* When using `TRUNCATE` command to empty the formal partition, it will not
affect the temporary partition.
-* You cannot use the `TRUNCATE` command to empty the temporary partition.
+- After the formal partition is dropped using the `ALTER` command, the
partition can be recovered by the `RECOVER` command (within a limited time).
Operating a formal partition is not related to a temporary partition.
-### ALTER
+- After the temporary partition is dropped using the `ALTER` command, the
temporary partition cannot be recovered through the `RECOVER` command.
-* When the table has a temporary partition, you cannot use the `ALTER` command
to perform Schema Change, Rollup, etc. on the table.
-* You cannot add temporary partitions to a table while the table is undergoing
a alter operation.
+### TRUNCATE
+- If you use the `TRUNCATE` command to empty a table, the temporary partitions of the table will also be deleted and cannot be recovered.
-## Best Practices
+- When using the `TRUNCATE` command to empty a formal partition, temporary partitions are not affected.
-1. Atomic overwrite
+- You cannot use the `TRUNCATE` command to empty the temporary partition.
- In some cases, the user wants to be able to rewrite the data of a certain
partition, but if it is dropped first and then loaded, there will be a period
of time when the data cannot be seen. At this moment, the user can first create
a corresponding temporary partition, load new data into the temporary
partition, and then replace the original partition atomically through the
`REPLACE` operation to achieve the purpose. For atomic overwrite operations of
non-partitioned tables, please re [...]
-
-2. Modify the number of buckets
+### ALTER
- In some cases, the user used an inappropriate number of buckets when
creating a partition. The user can first create a temporary partition
corresponding to the partition range and specify a new number of buckets. Then
use the `INSERT INTO` command to load the data of the formal partition into the
temporary partition. Through the replacement operation, the original partition
is replaced atomically to achieve the purpose.
-
-3. Merge or split partitions
+- When the table has a temporary partition, you cannot use the `ALTER` command
to perform Schema Change, Rollup, etc. on the table.
- In some cases, users want to modify the range of partitions, such as
merging two partitions, or splitting a large partition into multiple smaller
partitions. Then the user can first create temporary partitions corresponding
to the merged or divided range, and then load the data of the formal partition
into the temporary partition through the `INSERT INTO` command. Through the
replacement operation, the original partition is replaced atomically to achieve
the purpose.
+- You cannot add temporary partitions to a table while the table is undergoing an alter operation.
\ No newline at end of file
diff --git a/versioned_docs/version-2.0/data-operate/export/export-manual.md
b/versioned_docs/version-2.0/data-operate/export/export-manual.md
index 2b030079bce4..3aa3bcd97b21 100644
--- a/versioned_docs/version-2.0/data-operate/export/export-manual.md
+++ b/versioned_docs/version-2.0/data-operate/export/export-manual.md
@@ -24,9 +24,9 @@ specific language governing permissions and limitations
under the License.
-->
-# Export Overview
- `Export` is a feature provided by Doris that allows for the asynchronous
export of data. This feature allows the user to export the data of specified
tables or partitions in a specified file format through the Broker process or
S3 protocol/ HDFS protocol, to remote storage such as object storage or HDFS.
+
+`Export` is a feature provided by Doris for asynchronously exporting data. It allows users to export the data of specified tables or partitions, in a specified file format, through the Broker process or the S3 / HDFS protocol, to remote storage such as object storage or HDFS. Currently, `EXPORT` supports exporting Doris local tables / views / external tables to file formats including parquet, orc, csv, csv_with_names, and csv_with_names_and_types.
@@ -39,14 +39,16 @@ After a user submits an `Export Job`, Doris will calculate
all the Tablets invol
The overall execution process is as follows:
1. The user submits an Export job to FE.
+
2. FE calculates all the tablets to be exported and groups them based on the
`parallelism` parameter. Each group generates multiple `SELECT INTO OUTFILE`
query plans based on the `maximum_number_of_export_partitions` parameter.
3. Based on the parallelism parameter, an equal number of `ExportTaskExecutor`
are generated, and each `ExportTaskExecutor` is responsible for a thread, which
is scheduled and executed by FE's `Job scheduler` framework.
+
4. FE's `Job scheduler` schedules and executes the `ExportTaskExecutor`, and
each `ExportTaskExecutor` serially executes the multiple `SELECT INTO OUTFILE`
query plans it is responsible for.
## Start Export
-For detailed usage of Export, please refer to
[EXPORT](../../sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md).
+For detailed usage of Export, please refer to
[EXPORT](../../sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT).
Export's detailed commands can be passed through `HELP EXPORT;` in mysql
client. Examples are as follows:
@@ -72,9 +74,13 @@ WITH BROKER "hdfs"
```
* `label`: The identifier of this export job. You can use this identifier to
view the job status later.
+
* `column_separator`: Column separator. The default is `\t`. Supports
invisible characters, such as'\x07'.
+
* `column`: columns to be exported, separated by commas, if this parameter is
not filled in, all columns of the table will be exported by default.
+
* `line_delimiter`: Line separator. The default is `\n`. Supports invisible
characters, such as'\x07'.
+
* `parallelism`: Export with 3 concurrent threads.
### Export to Object Storage (Supports S3 Protocol)
@@ -90,12 +96,14 @@ WITH S3 (
```
- `s3.access_key`/`s3.secret_key`:Is your key to access the object storage API.
+
- `s3.endpoint`:Endpoint indicates the access domain name of object storage
external services.
+
- `s3.region`:Region indicates the region where the object storage data center
is located.
### View Export Status
-After submitting a job, the job status can be viewed by querying the [SHOW
EXPORT](../../sql-manual/sql-reference/Show-Statements/SHOW-EXPORT.md)
command. The results are as follows:
+After submitting a job, the job status can be viewed by querying the [SHOW
EXPORT](../../sql-manual/sql-reference/Show-Statements/SHOW-EXPORT.md)
command. The results are as follows:
```sql
mysql> show EXPORT\G;
@@ -125,33 +133,55 @@ FinishTime: 2019-06-25 17:08:34
* JobId: The unique ID of the job
+
* State: Job status:
+
* PENDING: Jobs to be Scheduled
+
* EXPORTING: Data Export
+
* FINISHED: Operation Successful
+
* CANCELLED: Job Failure
+
* Progress: Work progress. The schedule is based on the query plan. Assuming
there are 10 threads in total and 3 have been completed, the progress will be
30%.
+
* TaskInfo: Job information in Json format:
+
* db: database name
+
* tbl: Table name
+
* partitions: Specify the exported partition. `empty` Represents all
partitions.
- * column separator: The column separator for the exported file.
- * line delimiter: The line separator for the exported file.
- * tablet num: The total number of tablets involved.
- * Broker: The name of the broker used.
+
+ * column separator: The column separator for the exported file.
+
+ * line delimiter: The line separator for the exported file.
+
+ * tablet num: The total number of tablets involved.
+
+ * Broker: The name of the broker used.
+
* max_file_size: The maximum size of an export file.
+
* delete_existing_files: Whether to delete existing files and directories
in the specified export directory.
+
* columns: Specifies the column names to be exported. Empty values
represent exporting all columns.
+
* format: The file format for export.
+
* Path: Export path on remote storage.
+
* CreateTime/StartTime/FinishTime: Creation time, start scheduling time and
end time of jobs.
+
* Timeout: Job timeout. The unit is seconds. This time is calculated from
CreateTime.
+
* Error Msg: If there is an error in the job, the cause of the error is shown
here.
-* OutfileInfo:If the export job is successful, specific `SELECT INTO OUTFILE`
result information will be displayed here.
+
+* OutfileInfo: If the export job is successful, specific `SELECT INTO OUTFILE`
result information will be displayed here.
### Cancel Export Job
-<version since="dev"></version>
After submitting a job, the job can be canceled by using the [CANCEL
EXPORT](../../sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/CANCEL-EXPORT.md)
command. For example:
@@ -159,7 +189,7 @@ After submitting a job, the job can be canceled by using
the [CANCEL EXPORT](..
CANCEL EXPORT
FROM example_db
WHERE LABEL like "%example%";
-````
+```
## Best Practices
@@ -172,6 +202,7 @@ The underlying execution logic of an `Export Job `is
actually the `SELECT INTO O
The specific logic for splitting an `Export Job` into multiple `SELECT INTO
OUTFILE` is, to evenly distribute all the tablets of the table among all
parallel threads. For example:
- If num(tablets) = 40 and parallelism = 3, then the three threads will be
responsible for 14, 13, and 13 tablets, respectively.
+
- If num(tablets) = 2 and parallelism = 3, then Doris automatically sets the
parallelism to 2, and each thread is responsible for one tablet.
When the number of tablets a thread is responsible for exceeds the `maximum_tablets_of_outfile_in_export` value (default is 10, and can be modified by adding the `maximum_tablets_of_outfile_in_export` parameter in fe.conf), the thread will split the tablets it is responsible for into multiple `SELECT INTO OUTFILE` statements. For example:
@@ -187,15 +218,25 @@ However, in certain scenarios, such as a query plan that
requires scanning too m
## Notes
* It is not recommended to export large amounts of data at one time. The
maximum amount of exported data recommended by an Export job is tens of GB.
Excessive export results in more junk files and higher retry costs.
+
* If the amount of table data is too large, it is recommended to export it by
partition.
+
* During the operation of the Export job, if FE restarts or cuts the master,
the Export job will fail, requiring the user to resubmit.
+
* If the Export job fails, the temporary files and directory generated in the
remote storage will not be deleted, requiring the user to delete them manually.
+
* Export jobs scan data and occupy IO resources, which may affect the query
latency of the system.
+
* The Export job can export data from `Doris Base tables`, `View`, and
`External tables`, but not from `Rollup Index`.
+
* When using the EXPORT command, please ensure that the target path exists,
otherwise the export may fail.
+
* When concurrent export is enabled, please configure the thread count and
parallelism appropriately to fully utilize system resources and avoid
performance bottlenecks.
+
* When exporting to a local file, pay attention to file permissions and the
path, ensure that you have sufficient permissions to write, and follow the
appropriate file system path.
+
* It is possible to monitor progress and performance metrics in real-time
during the export process to identify issues promptly and make optimal
adjustments.
+
* It is recommended to verify the integrity and accuracy of the exported data
after the export operation is completed to ensure the quality and integrity of
the data.
## Relevant configuration
@@ -206,7 +247,7 @@ However, in certain scenarios, such as a query plan that
requires scanning too m
## More Help
-For more detailed syntax and best practices used by Export, please refer to
the
[Export](../../sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md)
command manual, You can also enter `HELP EXPORT` at the command line of the
MySql client for more help.
+For more detailed syntax and best practices for Export, please refer to the [Export](../../sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT) command manual. You can also enter `HELP EXPORT` at the command line of the MySQL client for more help.
-The underlying implementation of the `EXPORT` command is the `SELECT INTO
OUTFILE` statement. For more information about SELECT INTO OUTFILE, please
refer to [Export Query Result](./outfile.md) and [SELECT INTO
OUTFILE](../..//sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE.md).
+The underlying implementation of the `EXPORT` command is the `SELECT INTO
OUTFILE` statement. For more information about SELECT INTO OUTFILE, please
refer to [Export Query Result](../export/outfile) and [SELECT INTO
OUTFILE](../../sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE).
diff --git
a/versioned_docs/version-2.0/data-operate/export/export-with-mysql-dump.md
b/versioned_docs/version-2.0/data-operate/export/export-with-mysql-dump.md
index 2ce908a1d864..eff49d30cd42 100644
--- a/versioned_docs/version-2.0/data-operate/export/export-with-mysql-dump.md
+++ b/versioned_docs/version-2.0/data-operate/export/export-with-mysql-dump.md
@@ -24,18 +24,26 @@ specific language governing permissions and limitations
under the License.
-->
-# Export Data or Table Structure with MYSQLDUMP
+
Doris has supported exporting data or table structures through the `mysqldump`
tool after version 0.15
## Examples
+
### Export
- 1. Export the table1 table in the test database: `mysqldump -h127.0.0.1
-P9030 -uroot --no-tablespaces --databases test --tables table1`
- 2. Export the table1 table structure in the test database: `mysqldump
-h127.0.0.1 -P9030 -uroot --no-tablespaces --databases test --tables table1
--no-data`
- 3. Export all tables in the test1, test2 database: `mysqldump -h127.0.0.1
-P9030 -uroot --no-tablespaces --databases test1 test2`
- 4. Export all databases and tables `mysqldump -h127.0.0.1 -P9030 -uroot
--no-tablespaces --all-databases`
+1. Export the table1 table in the test database: `mysqldump -h127.0.0.1 -P9030
-uroot --no-tablespaces --databases test --tables table1`
+
+2. Export the table1 table structure in the test database: `mysqldump
-h127.0.0.1 -P9030 -uroot --no-tablespaces --databases test --tables table1
--no-data`
+
+3. Export all tables in the test1, test2 database: `mysqldump -h127.0.0.1
-P9030 -uroot --no-tablespaces --databases test1 test2`
+
+4. Export all databases and tables `mysqldump -h127.0.0.1 -P9030 -uroot
--no-tablespaces --all-databases`
For more usage parameters, please refer to the manual of `mysqldump`
+
### Import
The results exported by `mysqldump` can be redirected to a file, which can
then be imported into Doris through the source command `source filename.sql`
+
## Notice
+
1. Since there is no concept of tablespace in mysql in Doris, add the
`--no-tablespaces` parameter when using `mysqldump`
+
2. Using mysqldump to export data and table structure is only used for
development and testing or when the amount of data is small. Do not use it in a
production environment with a large amount of data.
diff --git a/versioned_docs/version-2.0/data-operate/export/outfile.md
b/versioned_docs/version-2.0/data-operate/export/outfile.md
index f3a4d9b8a813..d70f6d7abea6 100644
--- a/versioned_docs/version-2.0/data-operate/export/outfile.md
+++ b/versioned_docs/version-2.0/data-operate/export/outfile.md
@@ -1,6 +1,6 @@
---
{
- "title": "Export Query Result",
+ "title": "Export Query Result with SELECT INTO OUTFILE",
"language": "en"
}
---
@@ -24,9 +24,9 @@ specific language governing permissions and limitations
under the License.
-->
-# Export Query Result
-This document describes how to use the [SELECT INTO
OUTFILE](../../sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE.md)
command to export query results.
+
+This document describes how to use the [SELECT INTO
OUTFILE](../../sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE)
command to export query results.
`SELECT INTO OUTFILE` is a synchronous command, which means that the operation
is completed when the command returns. It also returns a row of results to show
the execution result of the export.
@@ -57,14 +57,16 @@ select * from tbl1 limit 10
INTO OUTFILE "file:///home/work/path/result_";
```
-For more usage, see [OUTFILE
documentation](../../sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE.md).
+For more usage, see [OUTFILE
documentation](../../sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE).
## Concurrent Export
By default, the export of the query result set is non-concurrent, that is, a
single point of export. If the user wants the query result set to be exported
concurrently, the following conditions need to be met:
1. session variable 'enable_parallel_outfile' to enable concurrent export:
```set enable_parallel_outfile = true;```
+
2. The export method is S3, HDFS instead of using a broker
+
3. The query can meet the needs of concurrent export, for example, the top
level does not contain single point nodes such as sort. (I will give an example
later, which is a query that does not export the result set concurrently)
If the above three conditions are met, concurrent export of the query result set is triggered. Concurrency = ```be_instance_num * parallel_fragment_exec_instance_num```
@@ -108,7 +110,7 @@ Planning example for concurrent export:
## Usage Examples
-For details, please refer to [OUTFILE
Document](../../sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE.md).
+For details, please refer to [OUTFILE
Document](../../sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE).
## Return Results
@@ -128,8 +130,11 @@ mysql> select * from tbl1 limit 10 into outfile
"file:///home/work/path/result_"
```
* FileNumber: The number of files finally generated.
+
* TotalRows: The number of rows in the result set.
+
* FileSize: The total size of the exported file. Unit byte.
+
* URL: If it is exported to a local disk, the Compute Node to which it is
exported is displayed here.
If a concurrent export is performed, multiple rows of data will be returned.
@@ -146,7 +151,7 @@ If a concurrent export is performed, multiple rows of data
will be returned.
If the execution is incorrect, an error message will be returned, such as:
-```
+```sql
mysql> SELECT * FROM tbl INTO OUTFILE ...
ERROR 1064 (HY000): errCode = 2, detailMessage = Open broker writer failed ...
```
@@ -154,15 +159,23 @@ ERROR 1064 (HY000): errCode = 2, detailMessage = Open
broker writer failed ...
## Notice
* The CSV format does not support exporting binary types, such as BITMAP and
HLL types. These types will be output as `\N`, which is null.
+
* If you do not enable concurrent export, the query result is exported by a
single BE node in a single thread. Therefore, the export time and the export
result set size are positively correlated. Turning on concurrent export can
reduce the export time.
+
* The export command does not check whether the file and file path exist.
Whether the path will be automatically created or whether the existing file
will be overwritten is entirely determined by the semantics of the remote
storage system.
+
* If an error occurs during the export process, the exported file may remain
on the remote storage system. Doris will not clean these files. The user needs
to manually clean up.
+
* The timeout of the export command is the same as the timeout of the query.
It can be set by `SET query_timeout = xxx`.
+
* For empty result query, there will be an empty file.
+
* File spliting will ensure that a row of data is stored in a single file.
Therefore, the size of the file is not strictly equal to `max_file_size`.
+
* For functions whose output is invisible characters, such as BITMAP and HLL
types, the output is `\N`, which is NULL.
+
* At present, the output type of some geo functions, such as `ST_Point` is
VARCHAR, but the actual output value is an encoded binary character. Currently
these functions will output garbled characters. For geo functions, use
`ST_AsText` for output.
## More Help
-For more detailed syntax and best practices for using OUTFILE, please refer to
the
[OUTFILE](../../sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE.md)
command manual, you can also More help information can be obtained by typing
`HELP OUTFILE` at the command line of the MySql client.
+For more detailed syntax and best practices for using OUTFILE, please refer to the [OUTFILE](../../sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE) command manual. You can also get more help by typing `HELP OUTFILE` at the command line of the MySQL client.
diff --git
a/versioned_docs/version-2.0/install/source-install/compilation-with-docker.md
b/versioned_docs/version-2.0/install/source-install/compilation-with-docker.md
index 520aee4fa9f9..89d3c70d1712 100644
---
a/versioned_docs/version-2.0/install/source-install/compilation-with-docker.md
+++
b/versioned_docs/version-2.0/install/source-install/compilation-with-docker.md
@@ -1,6 +1,6 @@
---
{
- "title": "Compilation with Docker",
+ "title": "Compilation with Docker(Recommended)",
"language": "en"
}
---
@@ -25,7 +25,6 @@ under the License.
-->
-# Compile with Docker (Recommended)
This guide is about how to compile Doris using the official compilation image
provided. As this image is maintained by the Apache Doris team and is regularly
updated with the necessary dependencies, it is the recommended way of
compilation for users.
diff --git
a/versioned_docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md
b/versioned_docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md
index 7526fe78514c..64ef2611562a 100644
---
a/versioned_docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md
+++
b/versioned_docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md
@@ -1,6 +1,6 @@
---
{
- "title": "Compiling with LDB Toolchain",
+ "title": "Compiling with LDB Toolchain (Recommended)",
"language": "en"
}
---
@@ -24,7 +24,7 @@ specific language governing permissions and limitations
under the License.
-->
-# Compile with LDB-Toolchain (Recommended)
+
This guide is about how to compile Doris using the LDB Toolchain. This method
serves as a supplement to the Docker compilation approach to help the
developers and users without a Docker environment. The recommended LDB
Toolchain version is 0.17, which includes clang-16 and gcc-11.
diff --git a/versioned_docs/version-2.0/lakehouse/file.md
b/versioned_docs/version-2.0/lakehouse/file.md
index cc78338ee423..d5b4b7fdc224 100644
--- a/versioned_docs/version-2.0/lakehouse/file.md
+++ b/versioned_docs/version-2.0/lakehouse/file.md
@@ -1,6 +1,6 @@
---
{
- "title": "File Analysis",
+ "title": "Table Value Function (TVF)",
"language": "en"
}
---
@@ -25,7 +25,6 @@ under the License.
-->
-# File Analysis
With the Table Value Function feature, Doris is able to query files in object
storage or HDFS as simply as querying Tables. In addition, it supports
automatic column type inference.
@@ -34,13 +33,14 @@ With the Table Value Function feature, Doris is able to
query files in object st
For more usage details, please see the documentation:
*
[S3](https://doris.apache.org/docs/dev/sql-manual/sql-functions/table-functions/s3/):
supports file analysis on object storage compatible with S3
+
*
[HDFS](https://doris.apache.org/docs/dev/sql-manual/sql-functions/table-functions/hdfs/):
supports file analysis on HDFS
The followings illustrate how file analysis is conducted with the example of
S3 Table Value Function.
### Automatic Column Type Inference
-```
+```sql
> DESC FUNCTION s3 (
"URI" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
"s3.access_key"= "ak",
@@ -65,7 +65,7 @@ The followings illustrate how file analysis is conducted with
the example of S3
An S3 Table Value Function is defined as follows:
-```
+```sql
s3(
"URI" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
"s3.access_key"= "ak",
@@ -111,7 +111,7 @@ Currently supported column types are:
Example:
-```
+```sql
s3 (
"uri" = "https://bucket1/inventory.dat",
"s3.access_key"= "ak",
@@ -127,7 +127,7 @@ s3 (
You can conduct queries and analysis on this Parquet file using any SQL
statements:
-```
+```sql
SELECT * FROM s3(
"uri" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
"s3.access_key"= "ak",
@@ -148,10 +148,9 @@ LIMIT 5;
You can put the Table Value Function anywhere that you used to put Table in
the SQL, such as in the WITH or FROM clause in CTE. In this way, you can treat
the file as a normal table and conduct analysis conveniently.
-你也可以用过 `CREATE VIEW` 语句为 Table Value Function 创建一个逻辑视图。这样,你可以想其他视图一样,对这个 Table
Value Function 进行访问、权限管理等操作,也可以让其他用户访问这个 Table Value Function。
You can also create a logic view by using `CREATE VIEW` statement for a Table
Value Function. So that you can query this view, grant priv on this view or
allow other user to access this Table Value Function.
-```
+```sql
CREATE VIEW v1 AS
SELECT * FROM s3(
"uri" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
@@ -171,7 +170,7 @@ GRANT SELECT_PRIV ON db1.v1 TO user1;
Users can ingest files into Doris tables via `INSERT INTO SELECT` for faster
file analysis:
-```
+```sql
// 1. Create Doris internal table
CREATE TABLE IF NOT EXISTS test_table
(
diff --git a/versioned_docs/version-2.0/query/query-data/encryption-function.md
b/versioned_docs/version-2.0/query/query-data/encryption-function.md
index d35c4da69469..6612cdd7da38 100644
--- a/versioned_docs/version-2.0/query/query-data/encryption-function.md
+++ b/versioned_docs/version-2.0/query/query-data/encryption-function.md
@@ -1,6 +1,6 @@
---
{
- "title": "Encrption Function",
+ "title": "Encrption and Masking Function ",
"language": "en"
}
---
diff --git
a/versioned_docs/version-2.0/query/udf/remote-user-defined-function.md
b/versioned_docs/version-2.0/query/udf/remote-user-defined-function.md
index 5ccd362728e8..d9ccd311c94b 100644
--- a/versioned_docs/version-2.0/query/udf/remote-user-defined-function.md
+++ b/versioned_docs/version-2.0/query/udf/remote-user-defined-function.md
@@ -24,54 +24,79 @@ specific language governing permissions and limitations
under the License.
-->
-# User Defined Function Rpc
-Remote UDF Service The Remote UDF Service can be accessed through RPC to
implement the execution of user-defined functions. Compared with Native UDF
implementations, Remote UDF Service has the following advantages and
limitations:
-1. The advantage
- * Cross-language: UDF services can be written in all languages supported by
Protobuf.
- * Security: UDF execution failure or crash only affects the UDF Service and
does not cause the Doris process to crash.
- * Flexibility: Any other Service or library class can be invoked within a
UDF Service to meet a wider variety of business requirements.
+## Remote UDF
-2. Restrictions on use
- * Performance: Compared to Native UDFs, UDF services incur extra network
overhead and thus have much lower performance than Native UDFs. At the same
time, the implementation of the UDF Service also affects the execution
efficiency of the function. Users need to deal with problems such as high
concurrency and thread safety by themselves.
- * Single line mode and batch mode: Doris's original query execution
framework based on row memory would execute one UDF RPC call for each row of
data, so the execution efficiency was very poor. However, under the new
vectorization execution framework, one UDF RPC call would be executed for each
batch of data (2048 rows by default), so the performance was significantly
improved. In actual tests, the performance of Remote UDF based on vectorization
and batch processing is similar to tha [...]
+Remote UDF Service supports accessing user-provided UDF Services via RPC to
execute user-defined functions. Compared to native UDF implementation, Remote
UDF Service has the following advantages and limitations:
-## Write UDF functions
+**1. Advantages**
-This section describes how to develop a Remote RPC Service. Samples for the
Java version are provided under `samples/doris-demo/udf-demo/` for your
reference.
+* Cross-language: UDF Services can be written in various languages supported
by Protobuf.
-### Copy the proto file
+* Security: UDF failures or crashes only affect the UDF Service itself and do
not cause Doris process crashes.
-Copy gensrc/proto/function_service.proto and gensrc/proto/types.proto to Rpc
service
+* Flexibility: UDF Services can invoke any other services or library classes
to meet diverse business requirements.
-- function_service.proto
- - PFunctionCallRequest
- - function_name:The function name, corresponding to the symbol specified
when the function was created
- - args:The parameters passed by the method
- - context:Querying context Information
- - PFunctionCallResponse
- - result:Return result
- - status:Return Status, 0 indicates normal
- - PCheckFunctionRequest
- - function:Function related information
- - match_type:Matching type
- - PCheckFunctionResponse
- - status:Return status, 0 indicates normal
+**2. Usage Limitations**
-### Generated interface
+* Performance: Compared to native UDFs, UDF Service introduces additional
network overhead, resulting in lower performance. Additionally, the UDF Service
implementation itself can impact function execution efficiency, and users need
to handle issues like high concurrency and thread safety.
-Use protoc generate code, and specific parameters are viewed using protoc -h
+* Single-row mode and batch processing mode: In Doris' original row-based
query execution framework, UDF RPC calls are made for each row of data,
resulting in poor performance. However, in the new vectorized execution
framework, UDF RPC calls are made for each batch of data (default: 2048 rows),
leading to significant performance improvements. In actual testing, the
performance of Remote UDF based on vectorization and batch processing is
comparable to that of native UDF based on row storage.
-### Implementing an interface
+## Writing UDF Functions
-The following three methods need to be implemented
-- fnCall:Used to write computational logic
-- checkFn:Used to verify function names, parameters, and return values when
creating UDFs
-- handShake:Used for interface probe
+This section provides instructions on how to develop a Remote RPC service. A
Java version example is provided in `samples/doris-demo/udf-demo/` for
reference.
-## Create UDF
+### Copying the Proto Files
-Currently UDTF are not supported
+Copy `gensrc/proto/function_service.proto` and `gensrc/proto/types.proto` to
the RPC service.
+
+**function_service.proto**
+
+- PFunctionCallRequest
+
+ - function_name: Function name, corresponding to the symbol specified
during function creation.
+
+ - args: Arguments passed to the method.
+
+ - context: Query context information.
+
+- PFunctionCallResponse
+
+ - result: Result.
+
+ - status: Status, where 0 represents normal.
+
+- PCheckFunctionRequest
+
+ - function: Function-related information.
+
+ - match_type: Matching type.
+
+- PCheckFunctionResponse
+
+ - status: Status, where 0 represents normal.
+
+### Generating Interfaces
+
+Generate code using protoc. Refer to `protoc -h` for specific parameters.
+
+### Implementing Interfaces
+
+The following three methods need to be implemented:
+
+- fnCall: Used to write the calculation logic.
+
+- checkFn: Used for UDF creation validation, checking if the function name,
parameters, return values, etc., are valid.
+
+- handShake: Used for interface probing.
+
+
+---
+
+## Creating UDF
+
+Currently, UDTF is not supported.
```sql
CREATE FUNCTION
@@ -79,14 +104,19 @@ name ([,...])
[RETURNS] rettype
PROPERTIES (["key"="value"][,...])
```
-Instructions:
-1. PROPERTIES中`symbol`Represents the name of the method passed by the RPC
call, which must be set。
-2. PROPERTIES中`object_file`Represents the RPC service address. Currently, a
single address and a cluster address in BRPC-compatible format are supported.
Refer to the cluster connection mode[Format
specification](https://github.com/apache/incubator-brpc/blob/master/docs/cn/client.md#%E8%BF%9E%E6%8E%A5%E6%9C%8D%E5%8A%A1%E9%9B%86%E7%BE%A4)。
-3. PROPERTIES中`type`Indicates the UDF call type, which is Native by default.
Rpc is transmitted when Rpc UDF is used。
-4. name: A function belongs to a DB and name is of the
form`dbName`.`funcName`. When `dbName` is not explicitly specified, the db of
the current session is used`dbName`。
+Note:
+
+1. The `symbol` in the PROPERTIES represents the method name passed in the RPC
call, and this parameter must be set.
+
+2. The `object_file` in the PROPERTIES represents the RPC service address.
Currently, it supports a single address and cluster addresses in the
brpc-compatible format. For cluster connection methods, refer to the [Format
Specification](https://github.com/apache/incubator-brpc/blob/master/docs/cn/client.md#%E8%BF%9E%E6%8E%A5%E6%9C%8D%E5%8A%A1%E9%9B%86%E7%BE%A4)
(Chinese).
+
+3. The `type` in the PROPERTIES represents the UDF invocation type, which is
Native by default. Set it to RPC when creating an RPC UDF.
+
+4. `name`: A function belongs to a specific database. The name is in the form
of `dbName`.`funcName`. When `dbName` is not explicitly specified, the current
session's database is used as `dbName`.
+
+Example:
-Sample:
```sql
CREATE FUNCTION rpc_add_two(INT,INT) RETURNS INT PROPERTIES (
"SYMBOL"="add_int_two",
@@ -105,20 +135,22 @@ CREATE FUNCTION rpc_add_string(varchar(30)) RETURNS
varchar(30) PROPERTIES (
);
```
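Beyond a single address, `OBJECT_FILE` can point to a cluster of RPC backends in a brpc-compatible format, as mentioned in note 2 above. A hypothetical sketch, assuming brpc's `list://` naming scheme is accepted and using placeholder addresses:

```sql
-- Hypothetical: multiple UDF Service backends behind brpc's list:// naming scheme
CREATE FUNCTION rpc_add_two_ha(INT, INT) RETURNS INT PROPERTIES (
    "SYMBOL" = "add_int_two",
    "OBJECT_FILE" = "list://127.0.0.1:9114,127.0.0.2:9114",
    "TYPE" = "RPC"
);
```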
-## Use UDF
+## Using UDF
-Users must have the `SELECT` permission of the corresponding database to use
UDF/UDAF.
+Users must have the `SELECT` privilege on the corresponding database to use
UDF.
-The use of UDF is consistent with ordinary function methods. The only
difference is that the scope of built-in functions is global, and the scope of
UDF is internal to DB. When the link session is inside the data, directly using
the UDF name will find the corresponding UDF inside the current DB. Otherwise,
the user needs to display the specified UDF database name, such as
`dbName`.`funcName`.
+The usage of UDF is similar to that of regular functions. The only difference
is that the scope of built-in functions is global, while the scope of a UDF is
limited to its database. When the session is connected to a database, the UDF
name alone resolves to the corresponding UDF within the current database.
Otherwise, the user needs to explicitly specify the database name of the UDF,
such as `dbName`.`funcName`.
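A short illustration of this scoping, assuming the `rpc_add_two` function from the earlier example was created in a database named `db1` (both database names here are placeholders):

```sql
-- Inside db1 the bare UDF name resolves directly
USE db1;
SELECT rpc_add_two(1, 2);

-- From another database, qualify the UDF with its database name
USE db2;
SELECT db1.rpc_add_two(1, 2);
```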
-## Delete UDF
+## Deleting UDF
-When you no longer need UDF functions, you can delete a UDF function by the
following command, you can refer to `DROP FUNCTION`.
+When you no longer need a UDF function, you can delete it using the `DROP
FUNCTION` command.
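For example, to remove the `rpc_add_two` function created above (a sketch; run it in the database where the function was created, and the argument types must match the original signature):

```sql
DROP FUNCTION rpc_add_two(INT, INT);
```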
## Example
-Examples of rpc server implementations and cpp/java/python languages are
provided in the `samples/doris-demo/` directory. See the `README.md` in each
directory for details on how to use it.
-For example, rpc_add_string
-```
+
+The `samples/doris-demo/` directory provides examples of RPC server
implementations in C++, Java, and Python. Please refer to the `README.md` file
in each directory for specific usage instructions.
+For example, `rpc_add_string`:
+
+```sql
mysql> select rpc_add_string('doris');
+-------------------------+
| rpc_add_string('doris') |
@@ -126,9 +158,9 @@ mysql >select rpc_add_string('doris');
| doris_rpc_test |
+-------------------------+
```
-The logs will be displayed.
+The log will display:
-```
+```text
INFO: fnCall request=function_name: "add_string"
args {
type {
@@ -147,4 +179,4 @@ INFO: fnCall res=result {
status {
status_code: 0
}
-```
\ No newline at end of file
+```
diff --git a/versioned_sidebars/version-2.0-sidebars.json
b/versioned_sidebars/version-2.0-sidebars.json
index b44d51d1de9c..f00c17dbe793 100644
--- a/versioned_sidebars/version-2.0-sidebars.json
+++ b/versioned_sidebars/version-2.0-sidebars.json
@@ -61,6 +61,7 @@
"type": "category",
"label": "Table Design",
"items": [
+ "table-design/data-type",
{
"type": "category",
"label": "Data Model",
@@ -72,7 +73,6 @@
"table-design/data-model/tips"
]
},
- "table-design/data-type",
"table-design/data-partition",
"table-design/schema-change",
"table-design/cold-hot-separation",
@@ -202,7 +202,7 @@
},
{
"type": "category",
- "label": "Create View and Materialize View",
+ "label": "View and Materialize View",
"items": [
"query/view-materialized-view/create-view",
"query/view-materialized-view/materialized-view",
@@ -235,8 +235,7 @@
"items": [
"query/query-analysis/query-profile",
"query/query-analysis/query-analysis",
- "query/query-analysis/import-analysis",
- "query/query-analysis/get-profile"
+ "query/query-analysis/import-analysis"
]
},
{
@@ -319,7 +318,8 @@
"label": "Query Admin",
"items": [
"admin-manual/query-admin/sql-interception",
- "admin-manual/query-admin/kill-query"
+ "admin-manual/query-admin/kill-query",
+ "admin-manual/query-admin/job"
]
},
{
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]