This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new e9b689b0f98 [mod](catalog) update doc for 2.1.7 and 3.0.3 (#1262)
e9b689b0f98 is described below
commit e9b689b0f987f59e64bc671961d0a1f8def153a1
Author: Mingyu Chen (Rayner) <[email protected]>
AuthorDate: Fri Nov 1 18:23:46 2024 +0800
[mod](catalog) update doc for 2.1.7 and 3.0.3 (#1262)
# Versions
- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0
# Languages
- [x] Chinese
- [x] English
---
docs/lakehouse/database/gbase-8a.md | 2 +-
docs/lakehouse/datalake-analytics/hive.md | 16 ++++++++
docs/lakehouse/datalake-analytics/paimon.md | 24 +++++++++++-
docs/lakehouse/datalake-building/hive-build.md | 44 +++++++++++++++++++---
.../current/lakehouse/database/gbase-8a.md | 4 +-
.../current/lakehouse/datalake-analytics/hive.md | 16 ++++++++
.../current/lakehouse/datalake-analytics/paimon.md | 20 ++++++++++
.../lakehouse/datalake-building/hive-build.md | 44 +++++++++++++++++++---
.../lakehouse/datalake-analytics/hive.md | 16 ++++++++
.../lakehouse/datalake-analytics/paimon.md | 20 ++++++++++
.../lakehouse/datalake-building/hive-build.md | 44 +++++++++++++++++++---
.../lakehouse/datalake-analytics/hive.md | 16 ++++++++
.../lakehouse/datalake-analytics/paimon.md | 20 ++++++++++
.../lakehouse/datalake-building/hive-build.md | 44 +++++++++++++++++++---
.../lakehouse/datalake-analytics/hive.md | 16 ++++++++
.../lakehouse/datalake-analytics/paimon.md | 24 +++++++++++-
.../lakehouse/datalake-building/hive-build.md | 44 +++++++++++++++++++---
.../lakehouse/datalake-analytics/hive.md | 16 ++++++++
.../lakehouse/datalake-analytics/paimon.md | 25 +++++++++++-
.../lakehouse/datalake-building/hive-build.md | 44 +++++++++++++++++++---
20 files changed, 455 insertions(+), 44 deletions(-)
diff --git a/docs/lakehouse/database/gbase-8a.md
b/docs/lakehouse/database/gbase-8a.md
index d018bf292fe..66641b5f69c 100644
--- a/docs/lakehouse/database/gbase-8a.md
+++ b/docs/lakehouse/database/gbase-8a.md
@@ -40,7 +40,7 @@ To connect to a GBase 8a database you need
- Network connectivity between each Doris FE and BE node and the GBase 8a server; the default port is 5258.
-::: warning Notice
+:::warning Notice
This Catalog was tested against GBase 8a using the following versions:
- GBase 8a: GBase8a_MPP_Cluster-NoLicense-FREE-9.5.3.28.12-redhat7-x86_64
- JDBC driver: gbase-connector-java-9.5.0.7-build1-bin.jar
diff --git a/docs/lakehouse/datalake-analytics/hive.md
b/docs/lakehouse/datalake-analytics/hive.md
index c00767150fe..2b037a57c40 100644
--- a/docs/lakehouse/datalake-analytics/hive.md
+++ b/docs/lakehouse/datalake-analytics/hive.md
@@ -326,6 +326,22 @@ For Hive/Iceberg/Hudi
> This variable defaults to `false`.
+## Query Hive partitions
+
+You can query Hive partition information in the following two ways.
+
+- `SHOW PARTITIONS FROM hive_table`
+
+  This statement lists all partitions of the specified Hive table along with their partition values.
+
+- Use `table$partitions` metadata table
+
+ Since versions 2.1.7 and 3.0.3, users can query Hive partition information
through the `table$partitions` metadata table. `table$partitions` is
essentially a relational table, so it can be used in any SELECT statement.
+
+ ```
+ SELECT * FROM hive_table$partitions;
+ ```
+
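+  Since it behaves like a regular table, the metadata table also supports filtering and aggregation. A minimal sketch (the partition column `dt` below is hypothetical, for illustration only):
+
+  ```
+  -- Count a table's partitions, or look up a specific one.
+  SELECT count(*) FROM hive_table$partitions;
+  SELECT * FROM hive_table$partitions WHERE dt = '2024-11-01';
+  ```
+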
## Access HMS with broker
Add the following setting when creating an HMS catalog; file splitting and scanning for Hive external tables will then be handled by the broker named `test_broker`.
diff --git a/docs/lakehouse/datalake-analytics/paimon.md
b/docs/lakehouse/datalake-analytics/paimon.md
index 7c4869d0e61..c40fed83946 100644
--- a/docs/lakehouse/datalake-analytics/paimon.md
+++ b/docs/lakehouse/datalake-analytics/paimon.md
@@ -38,7 +38,7 @@ Paimon Catalog currently supports two types of Metastore for creating catalogs:
* filesystem (default): stores both metadata and data in the file system.
* hive metastore: also stores metadata in the Hive Metastore, so users can access these tables directly from Hive.
-### Creating a Catalog Based on FileSystem
+### Creating a Catalog based on FileSystem
#### HDFS
@@ -116,7 +116,7 @@ CREATE CATALOG `paimon_oss` PROPERTIES (
);
```
-### Creating a Catalog Based on Hive Metastore
+### Creating a Catalog based on Hive Metastore
```sql
CREATE CATALOG `paimon_hms` PROPERTIES (
@@ -150,6 +150,26 @@ CREATE CATALOG `paimon_kerberos` PROPERTIES (
);
```
+### Creating a Catalog based on Aliyun DLF
+
+This feature is supported since versions 2.1.7 and 3.0.3.
+
+```
+CREATE CATALOG `paimon_dlf` PROPERTIES (
+ "type" = "paimon",
+ "paimon.catalog.type" = "dlf",
+ "warehouse" = "oss://xx/yy/",
+ "dlf.proxy.mode" = "DLF_ONLY",
+ "dlf.uid" = "xxxxx",
+ "dlf.region" = "cn-beijing",
+ "dlf.access_key" = "ak",
+ "dlf.secret_key" = "sk"
+
+ -- "dlf.endpoint" = "dlf.cn-beijing.aliyuncs.com", -- optional
+ -- "dlf.catalog.id" = "xxxx", -- optional
+);
+```
+
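+After the catalog is created, you can switch to it and list its databases to verify connectivity (a minimal usage sketch, reusing the catalog name from the example above):
+
+```
+SWITCH paimon_dlf;
+SHOW DATABASES;
+```
+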
## Column Type Mapping
| Paimon Data Type | Doris Data Type | Comment |
diff --git a/docs/lakehouse/datalake-building/hive-build.md
b/docs/lakehouse/datalake-building/hive-build.md
index b3c83a39d91..a63e550e826 100644
--- a/docs/lakehouse/datalake-building/hive-build.md
+++ b/docs/lakehouse/datalake-building/hive-build.md
@@ -146,6 +146,21 @@ For clusters upgraded from old versions, these variables
may change.
'file_format'='orc',
'compression'='zlib'
);
+
+    -- Create a text format table (since 2.1.7 & 3.0.3)
+ CREATE TABLE text_table (
+ `id` INT,
+ `name` STRING
+ ) PROPERTIES (
+ 'file_format'='text',
+ 'compression'='gzip',
+ 'field.delim'='\t',
+ 'line.delim'='\n',
+ 'collection.delim'=';',
+ 'mapkey.delim'=':',
+ 'serialization.null.format'='\\N',
+ 'escape.delim'='\\'
+ );
```
After creation, you can view the Hive table creation statement using the
`SHOW CREATE TABLE` command.
@@ -209,13 +224,24 @@ For clusters upgraded from old versions, these variables
may change.
- File Formats
- - Parquet
- ORC (default)
-
+ - Parquet
+    - Text (supported since versions 2.1.7 & 3.0.3)
+
+ The Text format also supports the following table properties:
+
+ - `field.delim`: column delimiter. Default `\1`.
+ - `line.delim`: row delimiter. Default `\n`.
+ - `collection.delim`: delimiter between elements in complex types.
Default `\2`.
+    - `mapkey.delim`: key-value delimiter of the Map type. Default `\3`.
+    - `serialization.null.format`: storage format of NULL values. Default `\N`.
+ - `escape.delim`: escape character. Default `\`.
+
- Compression Formats
- - Parquet: snappy(default), zstd, plain. (plain means no compression is
used.)
- - ORC: snappy, zlib(default), zstd, plain. (plain means no compression is
used.)
+ - Parquet: snappy(default), zstd, plain. (plain means no compression)
+ - ORC: snappy, zlib(default), zstd, plain. (plain means no compression)
+    - Text: gzip, deflate, bzip2, zstd, lz4, lzo, snappy, plain (default). (plain means no compression)
- Storage Medium
@@ -234,7 +260,7 @@ Currently, writing to specific partitions is not supported.
### INSERT
-The INSERT operation appends data to the target table.
+The INSERT operation appends data to the target table. Currently, writing to a
specific partition is not supported.
```
INSERT INTO hive_tbl values (val1, val2, val3, val4);
@@ -246,13 +272,19 @@ INSERT INTO hive_tbl(col1, col2, partition_col1,
partition_col2) values (1, 2, "
### INSERT OVERWRITE
-The INSERT OVERWRITE operation completely overwrites the existing data in the
table with new data.
+The INSERT OVERWRITE operation completely overwrites the existing data in the
table with new data. Currently, writing to a specific partition is not
supported.
```
INSERT OVERWRITE TABLE hive_tbl VALUES (val1, val2, val3, val4);
INSERT OVERWRITE TABLE hive.hive_db.hive_tbl(col1, col2) SELECT col1, col2
FROM internal.db1.tbl1;
```
+The semantics of INSERT OVERWRITE are consistent with Hive's, with the following behaviors:
+
+- When the target table is a partitioned table and the source table is empty, the operation has no effect; the target table data does not change.
+- When the target table is a non-partitioned table and the source table is empty, the target table is cleared.
+- Currently, writing to a specified partition is not supported, so INSERT OVERWRITE automatically determines the affected target partitions from the values in the source data. If the target table is partitioned, only the partitions involved are overwritten; data in the other partitions does not change (see the sketch below).
+
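+A minimal sketch of the partition-overwrite behavior (the partition column `dt` below is hypothetical, for illustration only):
+
+```
+-- Only the partitions present in the source result are rewritten;
+-- partitions not produced by the SELECT keep their existing data.
+INSERT OVERWRITE TABLE hive.hive_db.hive_tbl
+SELECT col1, col2, dt FROM internal.db1.tbl1 WHERE dt = '2024-11-01';
+```
+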
### CTAS (CREATE TABLE AS SELECT)
A Hive table can be created and populated with data using the `CTAS (CREATE
TABLE AS SELECT)` statement:
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/database/gbase-8a.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/database/gbase-8a.md
index 40c486d6fb5..14c4a343dcb 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/database/gbase-8a.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/database/gbase-8a.md
@@ -40,10 +40,12 @@ Doris JDBC Catalog supports connecting to GBase 8a databases via the standard JDBC interface.
- Network connectivity between each Doris FE and BE node and the GBase 8a server; the default port is 5258.
-::: warning Notice
+:::warning Notice
This Catalog was tested against GBase 8a using the following versions:
+
- GBase 8a: GBase8a_MPP_Cluster-NoLicense-FREE-9.5.3.28.12-redhat7-x86_64
- JDBC driver: gbase-connector-java-9.5.0.7-build1-bin.jar
+
Other versions have not been tested and may have compatibility issues.
:::
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/datalake-analytics/hive.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/datalake-analytics/hive.md
index 25ff1592978..647a1b17265 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/datalake-analytics/hive.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/datalake-analytics/hive.md
@@ -306,6 +306,22 @@ CREATE CATALOG hive PROPERTIES (
> This variable defaults to `false`.
+## Query Hive partitions
+
+You can query Hive partition information in the following two ways.
+
+- `SHOW PARTITIONS FROM hive_table`
+
+  This statement lists all partitions of the specified Hive table along with their partition values.
+
+- Use `table$partitions` metadata table
+
+  Since versions 2.1.7 and 3.0.3, users can query Hive partition information through the `table$partitions` metadata table. `table$partitions` is essentially a relational table, so it can be used in any SELECT statement.
+
+ ```
+ SELECT * FROM hive_table$partitions;
+ ```
+
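+  Since it behaves like a regular table, the metadata table also supports filtering and aggregation. A minimal sketch (the partition column `dt` below is hypothetical, for illustration only):
+
+  ```
+  -- Count a table's partitions, or look up a specific one.
+  SELECT count(*) FROM hive_table$partitions;
+  SELECT * FROM hive_table$partitions WHERE dt = '2024-11-01';
+  ```
+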
## Access HMS with broker
Add the following setting when creating an HMS catalog; file splitting and scanning for Hive external tables will then be handled by the broker named `test_broker`.
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/datalake-analytics/paimon.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/datalake-analytics/paimon.md
index e66cc50b317..ecc2484ad6e 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/datalake-analytics/paimon.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/datalake-analytics/paimon.md
@@ -150,6 +150,26 @@ CREATE CATALOG `paimon_kerberos` PROPERTIES (
);
```
+### Creating a Catalog based on Aliyun DLF
+
+This feature is supported since versions 2.1.7 and 3.0.3.
+
+```
+CREATE CATALOG `paimon_dlf` PROPERTIES (
+ "type" = "paimon",
+ "paimon.catalog.type" = "dlf",
+ "warehouse" = "oss://xx/yy/",
+ "dlf.proxy.mode" = "DLF_ONLY",
+ "dlf.uid" = "xxxxx",
+ "dlf.region" = "cn-beijing",
+ "dlf.access_key" = "ak",
+ "dlf.secret_key" = "sk"
+
+ -- "dlf.endpoint" = "dlf.cn-beijing.aliyuncs.com", -- optional
+ -- "dlf.catalog.id" = "xxxx", -- optional
+);
+```
+
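+After the catalog is created, you can switch to it and list its databases to verify connectivity (a minimal usage sketch, reusing the catalog name from the example above):
+
+```
+SWITCH paimon_dlf;
+SHOW DATABASES;
+```
+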
## Column Type Mapping
| Paimon Data Type | Doris Data Type | Comment |
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/datalake-building/hive-build.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/datalake-building/hive-build.md
index 2b76690cd82..f16186a0c1c 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/datalake-building/hive-build.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/datalake-building/hive-build.md
@@ -147,8 +147,23 @@ set global enable_fallback_to_original_planner = false;
'file_format'='orc',
'compression'='zlib'
);
+
+    -- Create a text format table (since 2.1.7 & 3.0.3)
+ CREATE TABLE text_table (
+ `id` INT,
+ `name` STRING
+ ) PROPERTIES (
+ 'file_format'='text',
+ 'compression'='gzip',
+ 'field.delim'='\t',
+ 'line.delim'='\n',
+ 'collection.delim'=';',
+ 'mapkey.delim'=':',
+ 'serialization.null.format'='\\N',
+ 'escape.delim'='\\'
+ );
```
-
+
After creation, you can view the Hive table creation statement using the `SHOW CREATE TABLE` command.
Note that this differs from the CREATE TABLE statement in Hive: when creating a Hive partitioned table in Apache Doris, the partition columns must also be included in the table schema; they must come last in the schema and keep the same order.
@@ -208,13 +223,24 @@ set global enable_fallback_to_original_planner = false;
- File Formats
-    - Parquet
    - ORC (default)
+    - Parquet
+    - Text (supported since versions 2.1.7 and 3.0.3)
+
+      The Text format also supports the following table properties:
+
+      - `field.delim`: column delimiter. Default `\1`.
+      - `line.delim`: row delimiter. Default `\n`.
+      - `collection.delim`: delimiter between elements in complex types. Default `\2`.
+      - `mapkey.delim`: key-value delimiter of the Map type. Default `\3`.
+      - `serialization.null.format`: storage format of NULL values. Default `\N`.
+      - `escape.delim`: escape character. Default `\`.
- Compression Formats
-    - Parquet: snappy(default), zstd, plain. (plain means no compression)
-    - ORC: snappy, zlib(default), zstd, plain. (plain means no compression)
+    - Parquet: snappy(default), zstd, plain. (plain means no compression)
+    - ORC: snappy, zlib(default), zstd, plain. (plain means no compression)
+    - Text: gzip, deflate, bzip2, zstd, lz4, lzo, snappy, plain (default). (plain means no compression)
- Storage Medium
@@ -233,7 +259,7 @@ set global enable_fallback_to_original_planner = false;
### INSERT
-The INSERT operation appends data to the target table.
+The INSERT operation appends data to the target table. Currently, writing to a specific partition is not supported.
```
INSERT INTO hive_tbl values (val1, val2, val3, val4);
@@ -245,13 +271,19 @@ INSERT INTO hive_tbl(col1, col2, partition_col1,
partition_col2) values (1, 2, "
### INSERT OVERWRITE
-The INSERT OVERWRITE operation completely overwrites the existing data in the table with new data.
+The INSERT OVERWRITE operation completely overwrites the existing data in the table with new data. Currently, writing to a specific partition is not supported.
```
INSERT OVERWRITE TABLE hive_tbl VALUES (val1, val2, val3, val4);
INSERT OVERWRITE TABLE hive.hive_db.hive_tbl(col1, col2) SELECT col1, col2
FROM internal.db1.tbl1;
```
+The semantics of INSERT OVERWRITE are consistent with Hive's, with the following behaviors:
+
+- When the target table is a partitioned table and the source table is empty, the operation has no effect; the target table data does not change.
+- When the target table is a non-partitioned table and the source table is empty, the target table is cleared.
+- Currently, writing to a specified partition is not supported, so INSERT OVERWRITE automatically determines the affected target partitions from the values in the source data. If the target table is partitioned, only the partitions involved are overwritten; data in the other partitions does not change (see the sketch below).
+
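+A minimal sketch of the partition-overwrite behavior (the partition column `dt` below is hypothetical, for illustration only):
+
+```
+-- Only the partitions present in the source result are rewritten;
+-- partitions not produced by the SELECT keep their existing data.
+INSERT OVERWRITE TABLE hive.hive_db.hive_tbl
+SELECT col1, col2, dt FROM internal.db1.tbl1 WHERE dt = '2024-11-01';
+```
+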
### CTAS (CREATE TABLE AS SELECT)
A Hive table can be created and populated with data using the `CTAS (CREATE TABLE AS SELECT)` statement:
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/datalake-analytics/hive.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/datalake-analytics/hive.md
index 25ff1592978..647a1b17265 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/datalake-analytics/hive.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/datalake-analytics/hive.md
@@ -306,6 +306,22 @@ CREATE CATALOG hive PROPERTIES (
> This variable defaults to `false`.
+## Query Hive partitions
+
+You can query Hive partition information in the following two ways.
+
+- `SHOW PARTITIONS FROM hive_table`
+
+  This statement lists all partitions of the specified Hive table along with their partition values.
+
+- Use `table$partitions` metadata table
+
+  Since versions 2.1.7 and 3.0.3, users can query Hive partition information through the `table$partitions` metadata table. `table$partitions` is essentially a relational table, so it can be used in any SELECT statement.
+
+ ```
+ SELECT * FROM hive_table$partitions;
+ ```
+
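+  Since it behaves like a regular table, the metadata table also supports filtering and aggregation. A minimal sketch (the partition column `dt` below is hypothetical, for illustration only):
+
+  ```
+  -- Count a table's partitions, or look up a specific one.
+  SELECT count(*) FROM hive_table$partitions;
+  SELECT * FROM hive_table$partitions WHERE dt = '2024-11-01';
+  ```
+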
## Access HMS with broker
Add the following setting when creating an HMS catalog; file splitting and scanning for Hive external tables will then be handled by the broker named `test_broker`.
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/datalake-analytics/paimon.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/datalake-analytics/paimon.md
index e66cc50b317..ecc2484ad6e 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/datalake-analytics/paimon.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/datalake-analytics/paimon.md
@@ -150,6 +150,26 @@ CREATE CATALOG `paimon_kerberos` PROPERTIES (
);
```
+### Creating a Catalog based on Aliyun DLF
+
+This feature is supported since versions 2.1.7 and 3.0.3.
+
+```
+CREATE CATALOG `paimon_dlf` PROPERTIES (
+ "type" = "paimon",
+ "paimon.catalog.type" = "dlf",
+ "warehouse" = "oss://xx/yy/",
+ "dlf.proxy.mode" = "DLF_ONLY",
+ "dlf.uid" = "xxxxx",
+ "dlf.region" = "cn-beijing",
+ "dlf.access_key" = "ak",
+ "dlf.secret_key" = "sk"
+
+ -- "dlf.endpoint" = "dlf.cn-beijing.aliyuncs.com", -- optional
+ -- "dlf.catalog.id" = "xxxx", -- optional
+);
+```
+
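+After the catalog is created, you can switch to it and list its databases to verify connectivity (a minimal usage sketch, reusing the catalog name from the example above):
+
+```
+SWITCH paimon_dlf;
+SHOW DATABASES;
+```
+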
## Column Type Mapping
| Paimon Data Type | Doris Data Type | Comment |
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/datalake-building/hive-build.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/datalake-building/hive-build.md
index 2b76690cd82..f16186a0c1c 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/datalake-building/hive-build.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/datalake-building/hive-build.md
@@ -147,8 +147,23 @@ set global enable_fallback_to_original_planner = false;
'file_format'='orc',
'compression'='zlib'
);
+
+    -- Create a text format table (since 2.1.7 & 3.0.3)
+ CREATE TABLE text_table (
+ `id` INT,
+ `name` STRING
+ ) PROPERTIES (
+ 'file_format'='text',
+ 'compression'='gzip',
+ 'field.delim'='\t',
+ 'line.delim'='\n',
+ 'collection.delim'=';',
+ 'mapkey.delim'=':',
+ 'serialization.null.format'='\\N',
+ 'escape.delim'='\\'
+ );
```
-
+
After creation, you can view the Hive table creation statement using the `SHOW CREATE TABLE` command.
Note that this differs from the CREATE TABLE statement in Hive: when creating a Hive partitioned table in Apache Doris, the partition columns must also be included in the table schema; they must come last in the schema and keep the same order.
@@ -208,13 +223,24 @@ set global enable_fallback_to_original_planner = false;
- File Formats
-    - Parquet
    - ORC (default)
+    - Parquet
+    - Text (supported since versions 2.1.7 and 3.0.3)
+
+      The Text format also supports the following table properties:
+
+      - `field.delim`: column delimiter. Default `\1`.
+      - `line.delim`: row delimiter. Default `\n`.
+      - `collection.delim`: delimiter between elements in complex types. Default `\2`.
+      - `mapkey.delim`: key-value delimiter of the Map type. Default `\3`.
+      - `serialization.null.format`: storage format of NULL values. Default `\N`.
+      - `escape.delim`: escape character. Default `\`.
- Compression Formats
-    - Parquet: snappy(default), zstd, plain. (plain means no compression)
-    - ORC: snappy, zlib(default), zstd, plain. (plain means no compression)
+    - Parquet: snappy(default), zstd, plain. (plain means no compression)
+    - ORC: snappy, zlib(default), zstd, plain. (plain means no compression)
+    - Text: gzip, deflate, bzip2, zstd, lz4, lzo, snappy, plain (default). (plain means no compression)
- Storage Medium
@@ -233,7 +259,7 @@ set global enable_fallback_to_original_planner = false;
### INSERT
-The INSERT operation appends data to the target table.
+The INSERT operation appends data to the target table. Currently, writing to a specific partition is not supported.
```
INSERT INTO hive_tbl values (val1, val2, val3, val4);
@@ -245,13 +271,19 @@ INSERT INTO hive_tbl(col1, col2, partition_col1,
partition_col2) values (1, 2, "
### INSERT OVERWRITE
-The INSERT OVERWRITE operation completely overwrites the existing data in the table with new data.
+The INSERT OVERWRITE operation completely overwrites the existing data in the table with new data. Currently, writing to a specific partition is not supported.
```
INSERT OVERWRITE TABLE hive_tbl VALUES (val1, val2, val3, val4);
INSERT OVERWRITE TABLE hive.hive_db.hive_tbl(col1, col2) SELECT col1, col2
FROM internal.db1.tbl1;
```
+The semantics of INSERT OVERWRITE are consistent with Hive's, with the following behaviors:
+
+- When the target table is a partitioned table and the source table is empty, the operation has no effect; the target table data does not change.
+- When the target table is a non-partitioned table and the source table is empty, the target table is cleared.
+- Currently, writing to a specified partition is not supported, so INSERT OVERWRITE automatically determines the affected target partitions from the values in the source data. If the target table is partitioned, only the partitions involved are overwritten; data in the other partitions does not change (see the sketch below).
+
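+A minimal sketch of the partition-overwrite behavior (the partition column `dt` below is hypothetical, for illustration only):
+
+```
+-- Only the partitions present in the source result are rewritten;
+-- partitions not produced by the SELECT keep their existing data.
+INSERT OVERWRITE TABLE hive.hive_db.hive_tbl
+SELECT col1, col2, dt FROM internal.db1.tbl1 WHERE dt = '2024-11-01';
+```
+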
### CTAS (CREATE TABLE AS SELECT)
A Hive table can be created and populated with data using the `CTAS (CREATE TABLE AS SELECT)` statement:
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/datalake-analytics/hive.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/datalake-analytics/hive.md
index 25ff1592978..647a1b17265 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/datalake-analytics/hive.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/datalake-analytics/hive.md
@@ -306,6 +306,22 @@ CREATE CATALOG hive PROPERTIES (
> This variable defaults to `false`.
+## Query Hive partitions
+
+You can query Hive partition information in the following two ways.
+
+- `SHOW PARTITIONS FROM hive_table`
+
+  This statement lists all partitions of the specified Hive table along with their partition values.
+
+- Use `table$partitions` metadata table
+
+  Since versions 2.1.7 and 3.0.3, users can query Hive partition information through the `table$partitions` metadata table. `table$partitions` is essentially a relational table, so it can be used in any SELECT statement.
+
+ ```
+ SELECT * FROM hive_table$partitions;
+ ```
+
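+  Since it behaves like a regular table, the metadata table also supports filtering and aggregation. A minimal sketch (the partition column `dt` below is hypothetical, for illustration only):
+
+  ```
+  -- Count a table's partitions, or look up a specific one.
+  SELECT count(*) FROM hive_table$partitions;
+  SELECT * FROM hive_table$partitions WHERE dt = '2024-11-01';
+  ```
+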
## Access HMS with broker
Add the following setting when creating an HMS catalog; file splitting and scanning for Hive external tables will then be handled by the broker named `test_broker`.
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/datalake-analytics/paimon.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/datalake-analytics/paimon.md
index e66cc50b317..ecc2484ad6e 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/datalake-analytics/paimon.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/datalake-analytics/paimon.md
@@ -150,6 +150,26 @@ CREATE CATALOG `paimon_kerberos` PROPERTIES (
);
```
+### Creating a Catalog based on Aliyun DLF
+
+This feature is supported since versions 2.1.7 and 3.0.3.
+
+```
+CREATE CATALOG `paimon_dlf` PROPERTIES (
+ "type" = "paimon",
+ "paimon.catalog.type" = "dlf",
+ "warehouse" = "oss://xx/yy/",
+ "dlf.proxy.mode" = "DLF_ONLY",
+ "dlf.uid" = "xxxxx",
+ "dlf.region" = "cn-beijing",
+ "dlf.access_key" = "ak",
+ "dlf.secret_key" = "sk"
+
+ -- "dlf.endpoint" = "dlf.cn-beijing.aliyuncs.com", -- optional
+ -- "dlf.catalog.id" = "xxxx", -- optional
+);
+```
+
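+After the catalog is created, you can switch to it and list its databases to verify connectivity (a minimal usage sketch, reusing the catalog name from the example above):
+
+```
+SWITCH paimon_dlf;
+SHOW DATABASES;
+```
+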
## Column Type Mapping
| Paimon Data Type | Doris Data Type | Comment |
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/datalake-building/hive-build.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/datalake-building/hive-build.md
index 2b76690cd82..f16186a0c1c 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/datalake-building/hive-build.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/lakehouse/datalake-building/hive-build.md
@@ -147,8 +147,23 @@ set global enable_fallback_to_original_planner = false;
'file_format'='orc',
'compression'='zlib'
);
+
+    -- Create a text format table (since 2.1.7 & 3.0.3)
+ CREATE TABLE text_table (
+ `id` INT,
+ `name` STRING
+ ) PROPERTIES (
+ 'file_format'='text',
+ 'compression'='gzip',
+ 'field.delim'='\t',
+ 'line.delim'='\n',
+ 'collection.delim'=';',
+ 'mapkey.delim'=':',
+ 'serialization.null.format'='\\N',
+ 'escape.delim'='\\'
+ );
```
-
+
After creation, you can view the Hive table creation statement using the `SHOW CREATE TABLE` command.
Note that this differs from the CREATE TABLE statement in Hive: when creating a Hive partitioned table in Apache Doris, the partition columns must also be included in the table schema; they must come last in the schema and keep the same order.
@@ -208,13 +223,24 @@ set global enable_fallback_to_original_planner = false;
- File Formats
-    - Parquet
    - ORC (default)
+    - Parquet
+    - Text (supported since versions 2.1.7 and 3.0.3)
+
+      The Text format also supports the following table properties:
+
+      - `field.delim`: column delimiter. Default `\1`.
+      - `line.delim`: row delimiter. Default `\n`.
+      - `collection.delim`: delimiter between elements in complex types. Default `\2`.
+      - `mapkey.delim`: key-value delimiter of the Map type. Default `\3`.
+      - `serialization.null.format`: storage format of NULL values. Default `\N`.
+      - `escape.delim`: escape character. Default `\`.
- Compression Formats
-    - Parquet: snappy(default), zstd, plain. (plain means no compression)
-    - ORC: snappy, zlib(default), zstd, plain. (plain means no compression)
+    - Parquet: snappy(default), zstd, plain. (plain means no compression)
+    - ORC: snappy, zlib(default), zstd, plain. (plain means no compression)
+    - Text: gzip, deflate, bzip2, zstd, lz4, lzo, snappy, plain (default). (plain means no compression)
- Storage Medium
@@ -233,7 +259,7 @@ set global enable_fallback_to_original_planner = false;
### INSERT
-The INSERT operation appends data to the target table.
+The INSERT operation appends data to the target table. Currently, writing to a specific partition is not supported.
```
INSERT INTO hive_tbl values (val1, val2, val3, val4);
@@ -245,13 +271,19 @@ INSERT INTO hive_tbl(col1, col2, partition_col1,
partition_col2) values (1, 2, "
### INSERT OVERWRITE
-The INSERT OVERWRITE operation completely overwrites the existing data in the table with new data.
+The INSERT OVERWRITE operation completely overwrites the existing data in the table with new data. Currently, writing to a specific partition is not supported.
```
INSERT OVERWRITE TABLE hive_tbl VALUES (val1, val2, val3, val4);
INSERT OVERWRITE TABLE hive.hive_db.hive_tbl(col1, col2) SELECT col1, col2
FROM internal.db1.tbl1;
```
+The semantics of INSERT OVERWRITE are consistent with Hive's, with the following behaviors:
+
+- When the target table is a partitioned table and the source table is empty, the operation has no effect; the target table data does not change.
+- When the target table is a non-partitioned table and the source table is empty, the target table is cleared.
+- Currently, writing to a specified partition is not supported, so INSERT OVERWRITE automatically determines the affected target partitions from the values in the source data. If the target table is partitioned, only the partitions involved are overwritten; data in the other partitions does not change (see the sketch below).
+
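+A minimal sketch of the partition-overwrite behavior (the partition column `dt` below is hypothetical, for illustration only):
+
+```
+-- Only the partitions present in the source result are rewritten;
+-- partitions not produced by the SELECT keep their existing data.
+INSERT OVERWRITE TABLE hive.hive_db.hive_tbl
+SELECT col1, col2, dt FROM internal.db1.tbl1 WHERE dt = '2024-11-01';
+```
+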
### CTAS (CREATE TABLE AS SELECT)
A Hive table can be created and populated with data using the `CTAS (CREATE TABLE AS SELECT)` statement:
diff --git a/versioned_docs/version-2.1/lakehouse/datalake-analytics/hive.md
b/versioned_docs/version-2.1/lakehouse/datalake-analytics/hive.md
index c00767150fe..2b037a57c40 100644
--- a/versioned_docs/version-2.1/lakehouse/datalake-analytics/hive.md
+++ b/versioned_docs/version-2.1/lakehouse/datalake-analytics/hive.md
@@ -326,6 +326,22 @@ For Hive/Iceberg/Hudi
> This variable defaults to `false`.
+## Query Hive partitions
+
+You can query Hive partition information in the following two ways.
+
+- `SHOW PARTITIONS FROM hive_table`
+
+  This statement lists all partitions of the specified Hive table along with their partition values.
+
+- Use `table$partitions` metadata table
+
+ Since versions 2.1.7 and 3.0.3, users can query Hive partition information
through the `table$partitions` metadata table. `table$partitions` is
essentially a relational table, so it can be used in any SELECT statement.
+
+ ```
+ SELECT * FROM hive_table$partitions;
+ ```
+
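+  Since it behaves like a regular table, the metadata table also supports filtering and aggregation. A minimal sketch (the partition column `dt` below is hypothetical, for illustration only):
+
+  ```
+  -- Count a table's partitions, or look up a specific one.
+  SELECT count(*) FROM hive_table$partitions;
+  SELECT * FROM hive_table$partitions WHERE dt = '2024-11-01';
+  ```
+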
## Access HMS with broker
Add the following setting when creating an HMS catalog; file splitting and scanning for Hive external tables will then be handled by the broker named `test_broker`.
diff --git a/versioned_docs/version-2.1/lakehouse/datalake-analytics/paimon.md
b/versioned_docs/version-2.1/lakehouse/datalake-analytics/paimon.md
index 7c4869d0e61..c40fed83946 100644
--- a/versioned_docs/version-2.1/lakehouse/datalake-analytics/paimon.md
+++ b/versioned_docs/version-2.1/lakehouse/datalake-analytics/paimon.md
@@ -38,7 +38,7 @@ Paimon Catalog currently supports two types of Metastore for creating catalogs:
* filesystem (default): stores both metadata and data in the file system.
* hive metastore: also stores metadata in the Hive Metastore, so users can access these tables directly from Hive.
-### Creating a Catalog Based on FileSystem
+### Creating a Catalog based on FileSystem
#### HDFS
@@ -116,7 +116,7 @@ CREATE CATALOG `paimon_oss` PROPERTIES (
);
```
-### Creating a Catalog Based on Hive Metastore
+### Creating a Catalog based on Hive Metastore
```sql
CREATE CATALOG `paimon_hms` PROPERTIES (
@@ -150,6 +150,26 @@ CREATE CATALOG `paimon_kerberos` PROPERTIES (
);
```
+### Creating a Catalog based on Aliyun DLF
+
+This feature is supported since versions 2.1.7 and 3.0.3.
+
+```
+CREATE CATALOG `paimon_dlf` PROPERTIES (
+ "type" = "paimon",
+ "paimon.catalog.type" = "dlf",
+ "warehouse" = "oss://xx/yy/",
+ "dlf.proxy.mode" = "DLF_ONLY",
+ "dlf.uid" = "xxxxx",
+ "dlf.region" = "cn-beijing",
+ "dlf.access_key" = "ak",
+ "dlf.secret_key" = "sk"
+
+ -- "dlf.endpoint" = "dlf.cn-beijing.aliyuncs.com", -- optional
+ -- "dlf.catalog.id" = "xxxx", -- optional
+);
+```
+
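+After the catalog is created, you can switch to it and list its databases to verify connectivity (a minimal usage sketch, reusing the catalog name from the example above):
+
+```
+SWITCH paimon_dlf;
+SHOW DATABASES;
+```
+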
## Column Type Mapping
| Paimon Data Type | Doris Data Type | Comment |
diff --git
a/versioned_docs/version-2.1/lakehouse/datalake-building/hive-build.md
b/versioned_docs/version-2.1/lakehouse/datalake-building/hive-build.md
index b3c83a39d91..a63e550e826 100644
--- a/versioned_docs/version-2.1/lakehouse/datalake-building/hive-build.md
+++ b/versioned_docs/version-2.1/lakehouse/datalake-building/hive-build.md
@@ -146,6 +146,21 @@ For clusters upgraded from old versions, these variables
may change.
'file_format'='orc',
'compression'='zlib'
);
+
+    -- Create a text format table (since 2.1.7 & 3.0.3)
+ CREATE TABLE text_table (
+ `id` INT,
+ `name` STRING
+ ) PROPERTIES (
+ 'file_format'='text',
+ 'compression'='gzip',
+ 'field.delim'='\t',
+ 'line.delim'='\n',
+ 'collection.delim'=';',
+ 'mapkey.delim'=':',
+ 'serialization.null.format'='\\N',
+ 'escape.delim'='\\'
+ );
```
After creation, you can view the Hive table creation statement using the
`SHOW CREATE TABLE` command.
@@ -209,13 +224,24 @@ For clusters upgraded from old versions, these variables
may change.
- File Formats
- - Parquet
- ORC (default)
-
+ - Parquet
+    - Text (supported since versions 2.1.7 & 3.0.3)
+
+ The Text format also supports the following table properties:
+
+ - `field.delim`: column delimiter. Default `\1`.
+ - `line.delim`: row delimiter. Default `\n`.
+ - `collection.delim`: delimiter between elements in complex types.
Default `\2`.
+    - `mapkey.delim`: key-value delimiter of the Map type. Default `\3`.
+    - `serialization.null.format`: storage format of NULL values. Default `\N`.
+ - `escape.delim`: escape character. Default `\`.
+
- Compression Formats
- - Parquet: snappy(default), zstd, plain. (plain means no compression is
used.)
- - ORC: snappy, zlib(default), zstd, plain. (plain means no compression is
used.)
+ - Parquet: snappy(default), zstd, plain. (plain means no compression)
+ - ORC: snappy, zlib(default), zstd, plain. (plain means no compression)
+    - Text: gzip, deflate, bzip2, zstd, lz4, lzo, snappy, plain (default). (plain means no compression)
- Storage Medium
@@ -234,7 +260,7 @@ Currently, writing to specific partitions is not supported.
### INSERT
-The INSERT operation appends data to the target table.
+The INSERT operation appends data to the target table. Currently, writing to a
specific partition is not supported.
```
INSERT INTO hive_tbl values (val1, val2, val3, val4);
@@ -246,13 +272,19 @@ INSERT INTO hive_tbl(col1, col2, partition_col1,
partition_col2) values (1, 2, "
### INSERT OVERWRITE
-The INSERT OVERWRITE operation completely overwrites the existing data in the
table with new data.
+The INSERT OVERWRITE operation completely overwrites the existing data in the
table with new data. Currently, writing to a specific partition is not
supported.
```
INSERT OVERWRITE TABLE hive_tbl VALUES (val1, val2, val3, val4);
INSERT OVERWRITE TABLE hive.hive_db.hive_tbl(col1, col2) SELECT col1, col2
FROM internal.db1.tbl1;
```
+The semantics of INSERT OVERWRITE are consistent with Hive's, with the following behaviors:
+
+- When the target table is a partitioned table and the source table is empty, the operation has no effect; the target table data does not change.
+- When the target table is a non-partitioned table and the source table is empty, the target table is cleared.
+- Currently, writing to a specified partition is not supported, so INSERT OVERWRITE automatically determines the affected target partitions from the values in the source data. If the target table is partitioned, only the partitions involved are overwritten; data in the other partitions does not change (see the sketch below).
+
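+A minimal sketch of the partition-overwrite behavior (the partition column `dt` below is hypothetical, for illustration only):
+
+```
+-- Only the partitions present in the source result are rewritten;
+-- partitions not produced by the SELECT keep their existing data.
+INSERT OVERWRITE TABLE hive.hive_db.hive_tbl
+SELECT col1, col2, dt FROM internal.db1.tbl1 WHERE dt = '2024-11-01';
+```
+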
### CTAS (CREATE TABLE AS SELECT)
A Hive table can be created and populated with data using the `CTAS (CREATE
TABLE AS SELECT)` statement:
diff --git a/versioned_docs/version-3.0/lakehouse/datalake-analytics/hive.md
b/versioned_docs/version-3.0/lakehouse/datalake-analytics/hive.md
index c00767150fe..2b037a57c40 100644
--- a/versioned_docs/version-3.0/lakehouse/datalake-analytics/hive.md
+++ b/versioned_docs/version-3.0/lakehouse/datalake-analytics/hive.md
@@ -326,6 +326,22 @@ For Hive/Iceberg/Hudi
> This variable defaults to `false`.
+## Query Hive partitions
+
+You can query Hive partition information in the following two ways.
+
+- `SHOW PARTITIONS FROM hive_table`
+
+  This statement lists all partitions of the specified Hive table along with their partition values.
+
+- Use `table$partitions` metadata table
+
+ Since versions 2.1.7 and 3.0.3, users can query Hive partition information
through the `table$partitions` metadata table. `table$partitions` is
essentially a relational table, so it can be used in any SELECT statement.
+
+ ```
+ SELECT * FROM hive_table$partitions;
+ ```
+
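+  Since it behaves like a regular table, the metadata table also supports filtering and aggregation. A minimal sketch (the partition column `dt` below is hypothetical, for illustration only):
+
+  ```
+  -- Count a table's partitions, or look up a specific one.
+  SELECT count(*) FROM hive_table$partitions;
+  SELECT * FROM hive_table$partitions WHERE dt = '2024-11-01';
+  ```
+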
## Access HMS with broker
Add the following setting when creating an HMS catalog; file splitting and scanning for Hive external tables will then be handled by the broker named `test_broker`.
diff --git a/versioned_docs/version-3.0/lakehouse/datalake-analytics/paimon.md
b/versioned_docs/version-3.0/lakehouse/datalake-analytics/paimon.md
index 0b809470a73..c40fed83946 100644
--- a/versioned_docs/version-3.0/lakehouse/datalake-analytics/paimon.md
+++ b/versioned_docs/version-3.0/lakehouse/datalake-analytics/paimon.md
@@ -24,6 +24,7 @@ specific language governing permissions and limitations
under the License.
-->
+[Apache Doris & Paimon Quick
Start](https://doris.apache.org/docs/gettingStarted/tutorials/doris-paimon)
## Instructions for use
@@ -37,7 +38,7 @@ Paimon Catalog currently supports two types of Metastore for creating catalogs:
* filesystem (default): stores both metadata and data in the file system.
* hive metastore: also stores metadata in the Hive Metastore, so users can access these tables directly from Hive.
-### Creating a Catalog Based on FileSystem
+### Creating a Catalog based on FileSystem
#### HDFS
@@ -115,7 +116,7 @@ CREATE CATALOG `paimon_oss` PROPERTIES (
);
```
-### Creating a Catalog Based on Hive Metastore
+### Creating a Catalog based on Hive Metastore
```sql
CREATE CATALOG `paimon_hms` PROPERTIES (
@@ -149,6 +150,26 @@ CREATE CATALOG `paimon_kerberos` PROPERTIES (
);
```
+### Creating a Catalog based on Aliyun DLF
+
+This feature is supported since versions 2.1.7 and 3.0.3.
+
+```
+CREATE CATALOG `paimon_dlf` PROPERTIES (
+ "type" = "paimon",
+ "paimon.catalog.type" = "dlf",
+ "warehouse" = "oss://xx/yy/",
+ "dlf.proxy.mode" = "DLF_ONLY",
+ "dlf.uid" = "xxxxx",
+ "dlf.region" = "cn-beijing",
+ "dlf.access_key" = "ak",
+ "dlf.secret_key" = "sk"
+
+ -- "dlf.endpoint" = "dlf.cn-beijing.aliyuncs.com", -- optional
+ -- "dlf.catalog.id" = "xxxx", -- optional
+);
+```
+
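+After the catalog is created, you can switch to it and list its databases to verify connectivity (a minimal usage sketch, reusing the catalog name from the example above):
+
+```
+SWITCH paimon_dlf;
+SHOW DATABASES;
+```
+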
## Column Type Mapping
| Paimon Data Type | Doris Data Type | Comment |
diff --git
a/versioned_docs/version-3.0/lakehouse/datalake-building/hive-build.md
b/versioned_docs/version-3.0/lakehouse/datalake-building/hive-build.md
index b3c83a39d91..a63e550e826 100644
--- a/versioned_docs/version-3.0/lakehouse/datalake-building/hive-build.md
+++ b/versioned_docs/version-3.0/lakehouse/datalake-building/hive-build.md
@@ -146,6 +146,21 @@ For clusters upgraded from old versions, these variables
may change.
'file_format'='orc',
'compression'='zlib'
);
+
+    -- Create a text format table (since 2.1.7 & 3.0.3)
+ CREATE TABLE text_table (
+ `id` INT,
+ `name` STRING
+ ) PROPERTIES (
+ 'file_format'='text',
+ 'compression'='gzip',
+ 'field.delim'='\t',
+ 'line.delim'='\n',
+ 'collection.delim'=';',
+ 'mapkey.delim'=':',
+ 'serialization.null.format'='\\N',
+ 'escape.delim'='\\'
+ );
```
After creation, you can view the Hive table creation statement using the
`SHOW CREATE TABLE` command.
@@ -209,13 +224,24 @@ For clusters upgraded from old versions, these variables
may change.
- File Formats
- - Parquet
- ORC (default)
-
+ - Parquet
+    - Text (supported since versions 2.1.7 & 3.0.3)
+
+ The Text format also supports the following table properties:
+
+ - `field.delim`: column delimiter. Default `\1`.
+ - `line.delim`: row delimiter. Default `\n`.
+ - `collection.delim`: delimiter between elements in complex types.
Default `\2`.
+    - `mapkey.delim`: key-value delimiter of the Map type. Default `\3`.
+    - `serialization.null.format`: storage format of NULL values. Default `\N`.
+ - `escape.delim`: escape character. Default `\`.
+
- Compression Formats
- - Parquet: snappy(default), zstd, plain. (plain means no compression is
used.)
- - ORC: snappy, zlib(default), zstd, plain. (plain means no compression is
used.)
+ - Parquet: snappy(default), zstd, plain. (plain means no compression)
+ - ORC: snappy, zlib(default), zstd, plain. (plain means no compression)
+    - Text: gzip, deflate, bzip2, zstd, lz4, lzo, snappy, plain (default). (plain means no compression)
- Storage Medium
@@ -234,7 +260,7 @@ Currently, writing to specific partitions is not supported.
### INSERT
-The INSERT operation appends data to the target table.
+The INSERT operation appends data to the target table. Currently, writing to a
specific partition is not supported.
```
INSERT INTO hive_tbl values (val1, val2, val3, val4);
@@ -246,13 +272,19 @@ INSERT INTO hive_tbl(col1, col2, partition_col1,
partition_col2) values (1, 2, "
### INSERT OVERWRITE
-The INSERT OVERWRITE operation completely overwrites the existing data in the
table with new data.
+The INSERT OVERWRITE operation completely overwrites the existing data in the
table with new data. Currently, writing to a specific partition is not
supported.
```
INSERT OVERWRITE TABLE hive_tbl VALUES (val1, val2, val3, val4);
INSERT OVERWRITE TABLE hive.hive_db.hive_tbl(col1, col2) SELECT col1, col2
FROM internal.db1.tbl1;
```
+The semantics of INSERT OVERWRITE are consistent with Hive's, with the following behaviors:
+
+- When the target table is a partitioned table and the source table is empty, the operation has no effect; the target table data does not change.
+- When the target table is a non-partitioned table and the source table is empty, the target table is cleared.
+- Currently, writing to a specified partition is not supported, so INSERT OVERWRITE automatically determines the affected target partitions from the values in the source data. If the target table is partitioned, only the partitions involved are overwritten; data in the other partitions does not change (see the sketch below).
+
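+A minimal sketch of the partition-overwrite behavior (the partition column `dt` below is hypothetical, for illustration only):
+
+```
+-- Only the partitions present in the source result are rewritten;
+-- partitions not produced by the SELECT keep their existing data.
+INSERT OVERWRITE TABLE hive.hive_db.hive_tbl
+SELECT col1, col2, dt FROM internal.db1.tbl1 WHERE dt = '2024-11-01';
+```
+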
### CTAS (CREATE TABLE AS SELECT)
A Hive table can be created and populated with data using the `CTAS (CREATE
TABLE AS SELECT)` statement:
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]