This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 10efbdf2f3b [opt] opt catalog overview (#3426)
10efbdf2f3b is described below
commit 10efbdf2f3b92934b130ba184d4b86500fc3d0e8
Author: Mingyu Chen (Rayner) <[email protected]>
AuthorDate: Mon Mar 2 20:20:59 2026 +0800
[opt] opt catalog overview (#3426)
## Versions
- [x] dev
- [x] 4.x
- [x] 3.x
- [ ] 2.1
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
docs/lakehouse/catalog-overview.md | 214 +++++++++++++++------
.../current/lakehouse/catalog-overview.md | 118 ++++++++++--
.../version-3.x/lakehouse/catalog-overview.md | 118 ++++++++++--
.../version-4.x/lakehouse/catalog-overview.md | 120 ++++++++++--
.../version-3.x/lakehouse/catalog-overview.md | 214 +++++++++++++++------
.../version-4.x/lakehouse/catalog-overview.md | 214 +++++++++++++++------
6 files changed, 763 insertions(+), 235 deletions(-)
diff --git a/docs/lakehouse/catalog-overview.md
b/docs/lakehouse/catalog-overview.md
index f70c850fa74..2051135a0f2 100644
--- a/docs/lakehouse/catalog-overview.md
+++ b/docs/lakehouse/catalog-overview.md
@@ -2,32 +2,32 @@
{
"title": "Data Catalog Overview",
"language": "en",
- "description": "A Data Catalog is used to describe the attributes of a
data source."
+ "description": "Learn about Apache Doris Data Catalog concepts and usage.
Create external data catalogs to connect to Hive, Iceberg, Paimon, PostgreSQL,
and other data sources for federated cross-source queries, data ingestion, and
write-back."
}
---
-A Data Catalog is used to describe the attributes of a data source.
+A Data Catalog describes the properties of a data source.
-In Doris, multiple catalogs can be created to point to different data sources
(such as Hive, Iceberg, MySQL). Doris will automatically obtain the databases,
tables, columns, partitions, data locations, etc. of the corresponding data
sources through the catalogs. Users can access these catalogs for data analysis
through standard SQL statements and can conduct join queries on the data from
multiple catalogs.
+In Doris, you can create multiple data catalogs pointing to different data
sources (such as Hive, Iceberg, Paimon, PostgreSQL). Doris automatically
retrieves databases, tables, schemas, partitions, and data locations from the
corresponding data source through data catalogs. Users can access these data
catalogs for data analysis using standard SQL statements, and perform join
queries across data from multiple data catalogs.
-There are two types of catalogs in Doris:
+There are two types of data catalogs in Doris:
-| Type | Description |
+| Type | Description |
| ---------------- | -------------------------------------------------------- |
-| Internal Catalog | The built-in catalog, named `internal`, used to store
Doris internal table data. It cannot be created, modified, or deleted. |
-| External Catalog | External catalogs refer to all catalogs other than the
Internal Catalog. Users can create, modify, and delete external catalogs. |
+| Internal Catalog | The built-in data catalog with a fixed name of
`internal`, used to store Doris internal table data. It cannot be created,
modified, or dropped. |
+| External Catalog | External data catalogs, referring to all data catalogs
other than the Internal Catalog. Users can create, modify, and drop external
data catalogs. |
-Catalogs are mainly applicable to the following three scenarios, but different
catalogs are suitable for different scenarios. For details, see the
documentation for the corresponding catalog.
+Data catalogs are primarily applicable to the following three types of
scenarios, but different data catalogs have different applicable scenarios.
Refer to the corresponding data catalog documentation for details.
-| Scenario | Description |
+| Scenario | Description |
| ---- | ------------------------------------------- |
-| Query Acceleration | Direct query acceleration for data lakes such as Hive,
Iceberg, Paimon, etc. |
-| Data Integration | ZeroETL solution, directly accessing different data
sources to generate result data, or facilitating data flow between different
data sources. |
-| Data Write-back | After data processing via Doris, write back to external
data sources. |
+| Query Acceleration | Directly accelerate queries on lakehouse data such as
Hive, Iceberg, Paimon, etc. |
+| Data Integration | ZeroETL approach to directly access different data
sources to generate result data, or enable convenient data flow between
different data sources. |
+| Data Write-Back | Process and transform data through Doris, then write it
back to external data sources. |
-This document uses [Iceberg Catalog](./catalogs/iceberg-catalog.mdx) as an
example to focus on the basic operations of catalogs. For detailed descriptions
of different catalogs, please refer to the documentation of the corresponding
catalog.
+This article uses [Iceberg Catalog](./catalogs/iceberg-catalog) as an example
to introduce the basic operations of data catalogs. For detailed introductions
to different data catalogs, please refer to the corresponding data catalog
documentation.
-## Creating Catalog
+## Creating a Data Catalog
Create an Iceberg Catalog using the `CREATE CATALOG` statement.
@@ -42,23 +42,109 @@ CREATE CATALOG iceberg_catalog PROPERTIES (
);
```
-Essentially, a catalog created in Doris acts as a "proxy" to access the
metadata services (such as Hive Metastore) and storage services (such as
HDFS/S3) of the corresponding data source. Doris only stores connection
properties and other information of the catalog, not the actual metadata and
data of the corresponding data source.
+Essentially, a data catalog created in Doris acts as a "proxy" to access the
metadata service (such as Hive Metastore) and storage service (such as HDFS/S3)
of the corresponding data source. Doris only stores connection properties and
other information about the data catalog, not the actual metadata or data of
the corresponding data source.
### Common Properties
-In addition to the set of properties specific to each catalog, here are the
common properties for all catalogs `{CommonProperties}`.
+In addition to the property set specific to each data catalog, this section
introduces the common properties `{CommonProperties}` shared by all data
catalogs.
-| Property Name | Description
| Example |
-| ------------------------ |
------------------------------------------------------------------------------------------------------------------------------------
| -------------------------------------- |
-| `include_database_list` | Supports synchronizing only specified Databases,
separated by `,`. By default, all Databases are synchronized. Database names
are case-sensitive. Use this parameter when there are many Databases in the
external data source but only a few need to be accessed, to avoid synchronizing
a large amount of metadata. | `'include_database_list' = 'db1,db2'` |
-| `exclude_database_list` | Supports specifying multiple Databases that do
not need to be synchronized, separated by `,`. By default, no filtering is
applied, and all Databases are synchronized. Database names are case-sensitive.
This is used in the same scenario as above, to exclude databases that do not
need to be accessed. If there is a conflict, `exclude` takes precedence over
`include`. | `'exclude_database_list' = 'db1,db2'` |
+| Property | Description | Example |
+| ----------------------- |
------------------------------------------------------------------------- |
------------------------------------- |
+| `include_database_list` | Specifies multiple databases to synchronize,
separated by `,`. By default, all databases are synchronized. Database names
are case-sensitive. When the external data source has a large number of
databases but only a few need to be accessed, this parameter can be used to
avoid synchronizing a large amount of metadata. | `'include_database_list' =
'db1,db2'` |
+| `exclude_database_list` | Specifies multiple databases to exclude from
synchronization, separated by `,`. By default, no filtering is applied and all
databases are synchronized. Database names are case-sensitive. Applicable to
the same scenario as above, but for inversely excluding databases that do not
need to be accessed. In case of conflict, `exclude` takes precedence over
`include`. | `'exclude_database_list' = 'db1,db2'` |
+| `include_table_list` | Specifies multiple tables to synchronize, in `db.tbl`
format, separated by `,`. When set, listing tables under a database will only
return the specified tables, rather than fetching the full table list from the
remote metadata service. Applicable when the external data source has a large
number of tables and fetching the full table list may time out. |
`'include_table_list' = 'db1.tbl1,db1.tbl2,db2.tbl3'` |
+| `lower_case_table_names` | Catalog-level table name case control. See the
[Table Name Case
Sensitivity](#table-name-case-sensitivity-lower_case_table_names) section below
for values and their meanings. The default value inherits from the global
variable `lower_case_table_names`. | `'lower_case_table_names' = '1'` |
+| `lower_case_database_names` | Catalog-level database name case control. See
the [Database Name Case
Sensitivity](#database-name-case-sensitivity-lower_case_database_names) section
below for values and their meanings. The default value is `0` (case-sensitive).
| `'lower_case_database_names' = '2'` |
+### Specifying Table List
+
+This feature is supported since version 4.1.0.
+
+When the external data source (such as Hive Metastore) contains a large number
of tables, fetching the full table list from the remote metadata service can be
very time-consuming or even time out. By setting the `include_table_list`
property, you can specify the tables to synchronize and avoid fetching the full
table list from the remote.
+
+`include_table_list` uses the `db.tbl` format, with multiple tables separated
by commas `,`.
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'include_table_list' = 'db1.table1,db1.table2,db2.table3'
+);
+```
+
+Behavior after setting:
+
+- When listing tables under `db1`, only `table1` and `table2` are returned,
without calling the remote metadata service's full table list API.
+- When listing tables under `db2`, only `table3` is returned.
+- For databases not included in `include_table_list` (such as `db3`), the full
table list will still be fetched from the remote metadata service.
+- Incorrectly formatted entries in `include_table_list` (not in `db.tbl`
format) will be ignored.
+
+:::tip
+This property can be used in combination with `include_database_list`. For
example, first filter the required databases using `include_database_list`,
then further specify the required tables using `include_table_list`.
+:::
+
+### Table Name Case Sensitivity
+
+This feature is supported since version 4.1.0.
+
+The `lower_case_table_names` property allows you to control table name case
handling at the Catalog level. This property supports three modes:
+
+| Value | Mode | Description |
+| -- | ---- | ---- |
+| `0` | Case-sensitive (default) | Table names are stored and compared with
their original case. When referencing a table name, it must exactly match the
case in the remote metadata. |
+| `1` | Stored as lowercase | Table names are stored in lowercase in Doris.
Suitable for scenarios where you want to uniformly use lowercase table names to
access external data sources. |
+| `2` | Case-insensitive comparison | Table names are compared in a
case-insensitive manner, but the original case from the remote metadata is
preserved when displayed. Suitable for scenarios where table name cases are
inconsistent in the external data source and you want to access tables in a
case-insensitive way. |
+
+If this property is not set, it inherits the value of the global variable
`lower_case_table_names` by default.
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'lower_case_table_names' = '2'
+);
+```
+
+:::caution
+When `lower_case_table_names` is set to `1` or `2`, if tables with names that
differ only in case exist in the remote metadata (such as `MyTable` and
`mytable`), conflicts may occur. Doris will detect such conflicts and report an
error.
+:::
+
+### Database Name Case Sensitivity
+
+This feature is supported since version 4.1.0.
+
+The `lower_case_database_names` property allows you to control database name
case handling at the Catalog level. This property supports three modes:
+
+| Value | Mode | Description |
+| -- | ---- | ---- |
+| `0` | Case-sensitive (default) | Database names are stored and compared with
their original case. When referencing a database name, it must exactly match
the case in the remote metadata. |
+| `1` | Stored as lowercase | Database names are stored in lowercase in Doris.
Suitable for scenarios where you want to uniformly use lowercase database names
to access external data sources. |
+| `2` | Case-insensitive comparison | Database names are compared in a
case-insensitive manner, but the original case from the remote metadata is
preserved when displayed. Suitable for scenarios where database name cases are
inconsistent in the external data source and you want to access databases in a
case-insensitive way. |
+
+The default value is `0` (case-sensitive).
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'lower_case_database_names' = '2',
+ 'lower_case_table_names' = '2'
+);
+```
+
+:::caution
+When `lower_case_database_names` is set to `1` or `2`, if databases with names
that differ only in case exist in the remote metadata (such as `MyDB` and
`mydb`), conflicts may occur. Doris will detect such conflicts and report an
error.
+:::
+
+:::info
+`lower_case_database_names` and `lower_case_table_names` can be set
independently without affecting each other. For example, you can set database
names to be case-sensitive (`0`) while setting table names to be
case-insensitive (`2`).
+:::
### Column Type Mapping
-After a user creates a catalog, Doris automatically synchronizes the
databases, tables, and columns of the catalog. For column type mapping rules of
different catalogs, please refer to the documentation of the corresponding
catalog.
+After a user creates a data catalog, Doris automatically synchronizes the
databases, tables, and schemas of the data catalog. For column type mapping
rules of different data catalogs, please refer to the corresponding data
catalog documentation.
-For external data types that cannot currently be mapped to Doris column types,
such as `UNION`, `INTERVAL`, etc., Doris will map the column type to
`UNSUPPORTED`. For queries involving `UNSUPPORTED` types, see the example below:
+For external data types that cannot currently be mapped to Doris column types,
such as `UNION`, `INTERVAL`, etc., Doris maps the column type to `UNSUPPORTED`.
For queries involving `UNSUPPORTED` types, see the following examples:
Assume the synchronized table schema is:
@@ -78,18 +164,18 @@ SELECT k1, k3 FROM table; -- Error: Unsupported
type 'UNSUPPORTED_TYPE
SELECT k1, k4 FROM table; -- Query OK.
```
-### Nullable Attribute
+### Nullable Property
-Doris currently has special restrictions on the Nullable attribute support for
external table columns, with specific behaviors as follows:
+Doris currently has special limitations on the Nullable property support for
external table columns. The specific behavior is as follows:
| Source Type | Doris Read Behavior | Doris Write Behavior |
-| --- | ------------ | ------------ |
-| Nullable | Nullable | Allow writing Null values |
-| Not Null | Nullable, i.e., still treated as columns that allow NULL during
reading | Allow writing Null values, i.e., no strict checking for Null values.
Users need to ensure data integrity and consistency themselves.|
+| --- | --- | --- |
+| Nullable | Nullable | Allows writing Null values |
+| Not Null | Nullable, i.e., still read as a column that allows NULL | Allows
writing Null values, i.e., no strict check on Null values. Users need to ensure
data integrity and consistency on their own. |
-## Using Catalog
+## Using Data Catalogs
-### Viewing Catalog
+### Viewing Data Catalogs
After creation, you can view the catalog using the `SHOW CATALOGS` command:
@@ -103,13 +189,13 @@ mysql> SHOW CATALOGS;
+-----------+-----------------+----------+-----------+-------------------------+---------------------+------------------------+
```
-You can view the statement to create a catalog using [SHOW CREATE
CATALOG](../sql-manual/sql-statements/catalog/SHOW-CREATE-CATALOG).
+You can view the CREATE CATALOG statement using [SHOW CREATE
CATALOG](../sql-manual/sql-statements/catalog/SHOW-CREATE-CATALOG).
-### Switching Catalog
+### Switching Data Catalogs
-Doris provides the `SWITCH` statement to switch the connection session context
to the corresponding catalog. This is similar to using the `USE` statement to
switch databases.
+Doris provides the `SWITCH` statement to switch the connection session context
to the corresponding data catalog, similar to using the `USE` statement to
switch databases.
-After switching to a catalog, you can use the `USE` statement to continue
switching to a specified database, or use `SHOW DATABASES` to view the
databases under the current catalog.
+After switching to a data catalog, you can use the `USE` statement to further
switch to a specific database, or use `SHOW DATABASES` to view the databases
under the current data catalog.
```sql
SWITCH iceberg_catalog;
@@ -127,13 +213,13 @@ SHOW DATABASES;
USE iceberg_db;
```
-You can also use the `USE` statement with the fully qualified name
`catalog_name.database_name` to switch directly to a specified database within
a specified catalog:
+You can also use the `USE` statement with the fully qualified name
`catalog_name.database_name` to directly switch to a specific database under a
specific data catalog:
```sql
USE iceberg_catalog.iceberg_db;
```
-Fully qualified names can also be used in MySQL command line or JDBC
connection strings to be compatible with the MySQL connection protocol.
+The fully qualified name can also be used in MySQL command line or JDBC
connection strings to be compatible with the MySQL connection protocol.
```sql
# Command line tool
@@ -143,33 +229,36 @@ mysql -h host -P9030 -uroot -Diceberg_catalog.iceberg_db
jdbc:mysql://host:9030/iceberg_catalog.iceberg_db
```
-The fixed name for the built-in catalog is `internal`. The switching method is
the same as for external catalogs.
+The built-in data catalog has a fixed name of `internal`. The switching method
is the same as for external data catalogs.
+
+### Default Data Catalog
-### Default Catalog
-The user attribute `default_init_catalog` is used to set the default catalog
for a specific user. Once set, when the specified user connects to Doris, they
will automatically switch to the set catalog.
+Use the user property `default_init_catalog` to set the default data catalog
for a specific user. Once set, when the specified user connects to Doris, the
session will automatically switch to the configured data catalog.
```sql
SET PROPERTY default_init_catalog=hive_catalog;
```
-Note 1: If the catalog has been explicitly specified in the MySQL command line
or JDBC connection strings, then the specified catalog will be used, and the
`default_init_catalog` user attribute will not take effect.
-Note 2: If the catalog set by the user attribute `default_init_catalog` no
longer exists, it will automatically switch to the default `internal` catalog.
-Note 3: This feature takes effect starting from version v3.1.x.
+Note 1: If a data catalog is explicitly specified in the MySQL command line or
JDBC connection string, the specified one takes precedence and the
`default_init_catalog` user property does not take effect.
+
+Note 2: If the data catalog set by the user property `default_init_catalog` no
longer exists, the session will automatically switch to the default `internal`
data catalog.
-### Simple Query
+Note 3: This feature is available starting from version 3.1.x.
-You can query tables in external catalogs using any SQL statement supported by
Doris.
+### Simple Queries
+
+You can query tables in external data catalogs using any SQL statement
supported by Doris.
```sql
SELECT id, SUM(cost) FROM iceberg_db.table1
GROUP BY id ORDER BY id;
```
-### Cross-Catalog Query
+### Cross-Catalog Queries
-Doris supports join queries across different catalogs.
+Doris supports join queries across data catalogs.
-Here, let's create another MySQL Catalog:
+Here we create another [MySQL Catalog](./catalogs/jdbc-mysql-catalog.md):
```sql
CREATE CATALOG mysql_catalog properties(
@@ -182,7 +271,7 @@ CREATE CATALOG mysql_catalog properties(
);
```
-Then, perform a join query between Iceberg tables and MySQL tables using SQL:
+Then perform a join query between the Iceberg table and the MySQL table using
SQL:
```sql
SELECT * FROM
@@ -190,16 +279,16 @@ iceberg_catalog.iceberg_db.table1 tbl1 JOIN
mysql_catalog.mysql_db.dim_table tbl
ON tbl1.id = tbl2.id;
```
-### Data Import
+### Data Ingestion
-You can import data from a data source into Doris using the `INSERT` command.
+You can import data from data sources into Doris using the `INSERT` command.
```sql
INSERT INTO internal.doris_db.tbl1
SELECT * FROM iceberg_catalog.iceberg_db.table1;
```
-You can also use the `CTAS (Create Table As Select)` statement to create an
internal Doris table from an external data source and import the data:
+You can also use the `CTAS (Create Table As Select)` statement to create a
Doris internal table from an external data source and import the data:
```sql
CREATE TABLE internal.doris_db.tbl1
@@ -210,15 +299,17 @@ SELECT * FROM iceberg_catalog.iceberg_db.table1;
### Data Write-Back
-Doris supports writing data back to external data sources using the `INSERT`
statement. For more details, refer to:
+Doris supports writing data back to external data sources directly using the
`INSERT` statement. For details, refer to:
* [Hive Catalog](./catalogs/hive-catalog.mdx)
+
* [Iceberg Catalog](./catalogs/iceberg-catalog.mdx)
+
* [JDBC Catalog](./catalogs/jdbc-catalog-overview.md)
-## Refreshing Catalog
+## Refreshing Data Catalogs
-Catalogs created in Doris act as "proxy" to access the metadata services of
corresponding data sources. Doris caches some metadata to improve access
performance and reduce frequent cross-network requests. However, the cache has
a validity period, and without refreshing, you cannot access the latest
metadata. Therefore, Doris provides several ways to refresh catalogs.
+Data catalogs created in Doris act as "proxies" to access the metadata service
of the corresponding data source. Doris caches some metadata. Caching can
improve metadata access performance and avoid frequent cross-network requests.
However, caching also has timeliness issues — if the cache is not refreshed,
the latest metadata cannot be accessed. Therefore, Doris provides multiple ways
to refresh data catalogs.
```sql
-- Refresh catalog
@@ -231,13 +322,13 @@ REFRESH DATABASE catalog_name.db_name;
REFRESH TABLE catalog_name.db_name.table_name;
```
-Doris also supports disabling metadata caching to access the latest metadata
in real-time.
+Doris also supports disabling metadata caching to enable real-time access to
the latest metadata.
For detailed information and configuration of metadata caching, please refer
to: [Metadata Cache](./meta-cache.md)
-## Modifying Catalog
+## Modifying Data Catalogs
-You can modify the properties or name of a catalog using the `ALTER CATALOG`
statement:
+You can modify the properties or name of a data catalog using `ALTER CATALOG`:
```sql
-- Rename a catalog
@@ -250,19 +341,16 @@ ALTER CATALOG iceberg_catalog SET PROPERTIES ('key1' =
'value1' [, 'key' = 'valu
ALTER CATALOG iceberg_catalog MODIFY COMMENT 'my iceberg catalog';
```
-## Deleting Catalog
+## Dropping Data Catalogs
-You can delete a specified external catalog using the `DROP CATALOG` statement.
+You can drop a specified external data catalog using `DROP CATALOG`.
```sql
DROP CATALOG [IF EXISTS] iceberg_catalog;
```
-Deleting an external catalog from Doris does not remove the actual data; it
only deletes the mapping relationship stored in Doris.
+Dropping an external data catalog from Doris does not delete the actual data.
It only removes the data catalog mapping stored in Doris.
## Permission Management
-The permission management for databases and tables in an external catalog is
the same as for internal tables. For details, refer to the [Authentication and
Authorization](../admin-manual/auth/authentication-and-authorization.md)
documentation.
-
-
-
+Permission management for databases and tables in external data catalogs is
the same as for internal tables. For details, refer to the [Authentication and
Authorization](../admin-manual/auth/authentication-and-authorization.md)
documentation.
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalog-overview.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalog-overview.md
index 6d5f9567175..5b096d18f00 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalog-overview.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalog-overview.md
@@ -2,13 +2,13 @@
{
"title": "数据目录概述",
"language": "zh-CN",
- "description": "数据目录(Data Catalog)用于描述一个数据源的属性。"
+ "description": "了解 Apache Doris 数据目录(Data Catalog)的概念与使用方法。通过创建外部数据目录,接入
Hive、Iceberg、Paimon、PostgreSQL 等多种数据源,实现跨源联邦查询、数据导入与写回。"
}
---
数据目录(Data Catalog)用于描述一个数据源的属性。
-在 Doris 中,可以创建多个数据目录指向不同的数据源(如 Hive、Iceberg、MySQL)。Doris
会通过数据目录,自动获取对应数据源的库、表、Schema、分区、数据位置等。用户可以通过标准的 SQL
语句访问这些数据目录进行数据分析,并且可以对多个数据目录中的数据进行关联查询。
+在 Doris 中,可以创建多个数据目录指向不同的数据源(如 Hive、Iceberg、Paimon、PostgreSQL)。Doris
会通过数据目录,自动获取对应数据源的库、表、Schema、分区、数据位置等。用户可以通过标准的 SQL
语句访问这些数据目录进行数据分析,并且可以对多个数据目录中的数据进行关联查询。
Doris 中的数据目录分为两种:
@@ -46,18 +46,105 @@ CREATE CATALOG iceberg_catalog PROPERTIES (
### 通用属性
-除每个数据目录特有的属性集合外,这里介绍所有数据目录通用的属性 `{CommonProperites}`。
+除每个数据目录特有的属性集合外,这里介绍所有数据目录通用的属性 `{CommonProperties}`。
| 属性名 | 描述
| 示例 |
| ----------------------- |
------------------------------------------------------------------------- |
------------------------------------- |
| `include_database_list` | 支持只同步指定的多个 Database,以 `,` 分隔。默认同步所有
Database。Database 名称是大小写敏感的。当外部数据源有大量 Database,但仅需访问个别 Database
时,可以使用此参数,避免大量的元数据同步。 | `'include_database_list' = 'db1,db2'` |
-| `exclude_database_list` | 支持指定不需要同步的多个 Database,以 `,` 分割。默认不做任何过滤,同步所有
Database。Database 名称是大小写敏感的。适用场景同上,反向排除不需要访问的数据库。如果冲突,`exclude` 优先级高于 `include`
| `'exclude_database_list' = 'db1,db2'` |
+| `exclude_database_list` | 支持指定不需要同步的多个 Database,以 `,` 分隔。默认不做任何过滤,同步所有
Database。Database 名称是大小写敏感的。适用场景同上,反向排除不需要访问的数据库。如果冲突,`exclude` 优先级高于
`include`。 | `'exclude_database_list' = 'db1,db2'` |
+| `include_table_list` | 支持只同步指定的多个表,以 `db.tbl` 格式指定,多个表之间以 `,` 分隔。设置后,列举某个
Database 下的表时将仅返回指定的表,而不会从远端元数据服务获取完整的表列表。适用于外部数据源表数量庞大、获取全量表列表可能超时的场景。 |
`'include_table_list' = 'db1.tbl1,db1.tbl2,db2.tbl3'` |
+| `lower_case_table_names` | Catalog 级别的表名大小写控制。取值及含义见下方
[表名大小写](#表名大小写lower_case_table_names) 小节。默认值继承全局变量 `lower_case_table_names`
的设置。 | `'lower_case_table_names' = '1'` |
+| `lower_case_database_names` | Catalog 级别的数据库名大小写控制。取值及含义见下方
[数据库名大小写](#数据库名大小写lower_case_database_names) 小节。默认值为 `0`(大小写敏感)。 |
`'lower_case_database_names' = '2'` |
+
+### 指定表列表
+
+该功能自 4.1.0 版本起支持。
+
+当外部数据源(如 Hive Metastore)包含大量表时,从远端元数据服务获取完整的表列表可能非常耗时甚至超时。通过设置
`include_table_list` 属性,可以指定需要同步的表,避免从远端获取全量表列表。
+
+`include_table_list` 使用 `db.tbl` 的格式,多个表之间以英文逗号 `,` 分隔。
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'include_table_list' = 'db1.table1,db1.table2,db2.table3'
+);
+```
+
+设置后的行为:
+
+- 当列举 `db1` 下的表时,仅返回 `table1` 和 `table2`,不会调用远端元数据服务的全量表列表接口。
+- 当列举 `db2` 下的表时,仅返回 `table3`。
+- 对于未在 `include_table_list` 中出现的 Database(如 `db3`),仍然会从远端元数据服务获取完整的表列表。
+- `include_table_list` 中的格式不正确的条目(非 `db.tbl` 格式)将被忽略。
+
+:::tip
+此属性可以与 `include_database_list` 配合使用。例如先通过 `include_database_list` 过滤出需要的
Database,再通过 `include_table_list` 进一步精确指定需要的表。
+:::
+
+### 表名大小写
+
+该功能自 4.1.0 版本起支持。
+
+通过 `lower_case_table_names` 属性,可以在 Catalog 级别控制表名的大小写处理方式。该属性支持三种模式:
+
+| 值 | 模式 | 描述 |
+| -- | ---- | ---- |
+| `0` | 大小写敏感(默认) | 表名按原始大小写存储和比较。引用表名时必须与远端元数据中的表名大小写完全一致。 |
+| `1` | 转为小写存储 | 表名在 Doris 中存储为小写形式。适用于希望统一使用小写表名访问外部数据源的场景。 |
+| `2` | 大小写不敏感比较 |
表名在比较时忽略大小写,但在显示时保留远端元数据中的原始大小写。适用于外部数据源中表名大小写不统一,希望以不区分大小写的方式访问表的场景。 |
+
+如果未设置此属性,默认继承全局变量 `lower_case_table_names` 的值。
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'lower_case_table_names' = '2'
+);
+```
+
+:::caution
+当 `lower_case_table_names` 设置为 `1` 或 `2` 时,如果远端元数据中存在仅大小写不同的同名表(如 `MyTable` 和
`mytable`),可能会导致冲突。Doris 会检测此类冲突并报错。
+:::
+
+### 数据库名大小写
+
+该功能自 4.1.0 版本起支持。
+
+通过 `lower_case_database_names` 属性,可以在 Catalog 级别控制数据库名的大小写处理方式。该属性支持三种模式:
+
+| 值 | 模式 | 描述 |
+| -- | ---- | ---- |
+| `0` | 大小写敏感(默认) | 数据库名按原始大小写存储和比较。引用数据库名时必须与远端元数据中的数据库名大小写完全一致。 |
+| `1` | 转为小写存储 | 数据库名在 Doris 中存储为小写形式。适用于希望统一使用小写数据库名访问外部数据源的场景。 |
+| `2` | 大小写不敏感比较 |
数据库名在比较时忽略大小写,但在显示时保留远端元数据中的原始大小写。适用于外部数据源中数据库名大小写不统一,希望以不区分大小写的方式访问数据库的场景。 |
+
+默认值为 `0`(大小写敏感)。
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'lower_case_database_names' = '2',
+ 'lower_case_table_names' = '2'
+);
+```
+
+:::caution
+当 `lower_case_database_names` 设置为 `1` 或 `2` 时,如果远端元数据中存在仅大小写不同的同名数据库(如 `MyDB`
和 `mydb`),可能会导致冲突。Doris 会检测此类冲突并报错。
+:::
+
+:::info
+`lower_case_database_names` 和 `lower_case_table_names`
可以独立设置,互不影响。例如,可以设置数据库名大小写敏感(`0`),而表名大小写不敏感(`2`)。
+:::
### 列类型映射
用户创建数据目录后,Doris 会自动同步数据目录的数据库、表和 Schema。不同数据目录的列类型映射规则请参阅对应的数据目录文档。
-对于当前无法映射到 Doris 列类型的外部数据类型,如 `UNION`, `INTERVAL` 等,Doris 会将列类型映射为
`UNSUPPORTED` 类型。对于 `UNSUPPORTED` 类型的查询,示例如下:
+对于当前无法映射到 Doris 列类型的外部数据类型,如 `UNION`、`INTERVAL` 等,Doris 会将列类型映射为 `UNSUPPORTED`
类型。对于 `UNSUPPORTED` 类型的查询,示例如下:
假设同步后的表 Schema 为:
@@ -82,9 +169,9 @@ SELECT k1, k4 FROM table; -- Query OK.
Doris 目前对外表列的 Nullable 属性支持有特殊限制,具体行为如下:
| 源类型 | Doris 读取行为 | Doris 写入行为 |
-| --- | ------------ | ------------ |
-| Nullable | Nullable | 允许写入 Null 值 |
-| Not Null | Nullable,即依然当做可允许为 NULL 的列进行读取 | 允许写入 Null 值,即不对 Null
值进行严格检查。用户需要自行保证数据的完整性和一致性。|
+| --- | --- | --- |
+| Nullable | Nullable | 允许写入 Null 值 |
+| Not Null | Nullable,即依然当做可允许为 NULL 的列进行读取 | 允许写入 Null 值,即不对 Null
值进行严格检查。用户需要自行保证数据的完整性和一致性。 |
## 使用数据目录
@@ -152,9 +239,11 @@ jdbc:mysql://host:9030/iceberg_catalog.iceberg_db
SET PROPERTY default_init_catalog=hive_catalog;
```
-注意 1:如果 MySQL 命令行或 JDBC 连接串中已经明确指定了数据目录,则以指定的为准,`default_init_catalog` 用户属性不生效;
-注意 2:如果用户属性 `default_init_catalog` 设置的数据目录已经不存在,则自动切换到默认的 `internal` 数据目录;
-注意 3:该功能从 v3.1.x 版本开始生效;
+注意 1:如果 MySQL 命令行或 JDBC 连接串中已经明确指定了数据目录,则以指定的为准,`default_init_catalog` 用户属性不生效。
+
+注意 2:如果用户属性 `default_init_catalog` 设置的数据目录已经不存在,则自动切换到默认的 `internal` 数据目录。
+
+注意 3:该功能从 3.1.x 版本开始生效。
### 简单查询
@@ -212,11 +301,11 @@ SELECT * FROM iceberg_catalog.iceberg_db.table1;
Doris 支持通过 `INSERT` 语句直接将数据写回到外部数据源。具体参阅:
-* [ Hive Catalog ](./catalogs/hive-catalog.mdx)
+* [Hive Catalog](./catalogs/hive-catalog.mdx)
-* [ Iceberg Catalog](./catalogs/iceberg-catalog.mdx)
+* [Iceberg Catalog](./catalogs/iceberg-catalog.mdx)
-* [ JDBC Catalog](./catalogs/jdbc-catalog-overview.md)
+* [JDBC Catalog](./catalogs/jdbc-catalog-overview.md)
## 刷新数据目录
@@ -265,4 +354,3 @@ DROP CATALOG [IF EXISTS] iceberg_catalog;
## 权限管理
外部数据目录中库表的权限管理和内表一致。具体可参阅
[认证和鉴权](../admin-manual/auth/authentication-and-authorization.md) 文档。
-
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/catalog-overview.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/catalog-overview.md
index 6d5f9567175..5b096d18f00 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/catalog-overview.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/catalog-overview.md
@@ -2,13 +2,13 @@
{
"title": "数据目录概述",
"language": "zh-CN",
- "description": "数据目录(Data Catalog)用于描述一个数据源的属性。"
+ "description": "了解 Apache Doris 数据目录(Data Catalog)的概念与使用方法。通过创建外部数据目录,接入
Hive、Iceberg、Paimon、PostgreSQL 等多种数据源,实现跨源联邦查询、数据导入与写回。"
}
---
数据目录(Data Catalog)用于描述一个数据源的属性。
-在 Doris 中,可以创建多个数据目录指向不同的数据源(如 Hive、Iceberg、MySQL)。Doris
会通过数据目录,自动获取对应数据源的库、表、Schema、分区、数据位置等。用户可以通过标准的 SQL
语句访问这些数据目录进行数据分析,并且可以对多个数据目录中的数据进行关联查询。
+在 Doris 中,可以创建多个数据目录指向不同的数据源(如 Hive、Iceberg、Paimon、PostgreSQL)。Doris
会通过数据目录,自动获取对应数据源的库、表、Schema、分区、数据位置等。用户可以通过标准的 SQL
语句访问这些数据目录进行数据分析,并且可以对多个数据目录中的数据进行关联查询。
Doris 中的数据目录分为两种:
@@ -46,18 +46,105 @@ CREATE CATALOG iceberg_catalog PROPERTIES (
### 通用属性
-除每个数据目录特有的属性集合外,这里介绍所有数据目录通用的属性 `{CommonProperites}`。
+除每个数据目录特有的属性集合外,这里介绍所有数据目录通用的属性 `{CommonProperties}`。
| 属性名 | 描述
| 示例 |
| ----------------------- |
------------------------------------------------------------------------- |
------------------------------------- |
| `include_database_list` | 支持只同步指定的多个 Database,以 `,` 分隔。默认同步所有
Database。Database 名称是大小写敏感的。当外部数据源有大量 Database,但仅需访问个别 Database
时,可以使用此参数,避免大量的元数据同步。 | `'include_database_list' = 'db1,db2'` |
-| `exclude_database_list` | 支持指定不需要同步的多个 Database,以 `,` 分割。默认不做任何过滤,同步所有
Database。Database 名称是大小写敏感的。适用场景同上,反向排除不需要访问的数据库。如果冲突,`exclude` 优先级高于 `include`
| `'exclude_database_list' = 'db1,db2'` |
+| `exclude_database_list` | 支持指定不需要同步的多个 Database,以 `,` 分隔。默认不做任何过滤,同步所有
Database。Database 名称是大小写敏感的。适用场景同上,反向排除不需要访问的数据库。如果冲突,`exclude` 优先级高于
`include`。 | `'exclude_database_list' = 'db1,db2'` |
+| `include_table_list` | 支持只同步指定的多个表,以 `db.tbl` 格式指定,多个表之间以 `,` 分隔。设置后,列举某个
Database 下的表时将仅返回指定的表,而不会从远端元数据服务获取完整的表列表。适用于外部数据源表数量庞大、获取全量表列表可能超时的场景。 |
`'include_table_list' = 'db1.tbl1,db1.tbl2,db2.tbl3'` |
+| `lower_case_table_names` | Catalog 级别的表名大小写控制。取值及含义见下方
[表名大小写](#表名大小写lower_case_table_names) 小节。默认值继承全局变量 `lower_case_table_names`
的设置。 | `'lower_case_table_names' = '1'` |
+| `lower_case_database_names` | Catalog 级别的数据库名大小写控制。取值及含义见下方
[数据库名大小写](#数据库名大小写lower_case_database_names) 小节。默认值为 `0`(大小写敏感)。 |
`'lower_case_database_names' = '2'` |
+
+### 指定表列表
+
+该功能自 4.1.0 版本起支持。
+
+当外部数据源(如 Hive Metastore)包含大量表时,从远端元数据服务获取完整的表列表可能非常耗时甚至超时。通过设置
`include_table_list` 属性,可以指定需要同步的表,避免从远端获取全量表列表。
+
+`include_table_list` 使用 `db.tbl` 的格式,多个表之间以英文逗号 `,` 分隔。
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'include_table_list' = 'db1.table1,db1.table2,db2.table3'
+);
+```
+
+设置后的行为:
+
+- 当列举 `db1` 下的表时,仅返回 `table1` 和 `table2`,不会调用远端元数据服务的全量表列表接口。
+- 当列举 `db2` 下的表时,仅返回 `table3`。
+- 对于未在 `include_table_list` 中出现的 Database(如 `db3`),仍然会从远端元数据服务获取完整的表列表。
+- `include_table_list` 中的格式不正确的条目(非 `db.tbl` 格式)将被忽略。
+
+:::tip
+此属性可以与 `include_database_list` 配合使用。例如先通过 `include_database_list` 过滤出需要的
Database,再通过 `include_table_list` 进一步精确指定需要的表。
+:::
+
+### 表名大小写
+
+该功能自 4.1.0 版本起支持。
+
+通过 `lower_case_table_names` 属性,可以在 Catalog 级别控制表名的大小写处理方式。该属性支持三种模式:
+
+| 值 | 模式 | 描述 |
+| -- | ---- | ---- |
+| `0` | 大小写敏感(默认) | 表名按原始大小写存储和比较。引用表名时必须与远端元数据中的表名大小写完全一致。 |
+| `1` | 转为小写存储 | 表名在 Doris 中存储为小写形式。适用于希望统一使用小写表名访问外部数据源的场景。 |
+| `2` | 大小写不敏感比较 |
表名在比较时忽略大小写,但在显示时保留远端元数据中的原始大小写。适用于外部数据源中表名大小写不统一,希望以不区分大小写的方式访问表的场景。 |
+
+如果未设置此属性,默认继承全局变量 `lower_case_table_names` 的值。
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'lower_case_table_names' = '2'
+);
+```
+
+:::caution
+当 `lower_case_table_names` 设置为 `1` 或 `2` 时,如果远端元数据中存在仅大小写不同的同名表(如 `MyTable` 和
`mytable`),可能会导致冲突。Doris 会检测此类冲突并报错。
+:::
+
+### 数据库名大小写
+
+该功能自 4.1.0 版本起支持。
+
+通过 `lower_case_database_names` 属性,可以在 Catalog 级别控制数据库名的大小写处理方式。该属性支持三种模式:
+
+| 值 | 模式 | 描述 |
+| -- | ---- | ---- |
+| `0` | 大小写敏感(默认) | 数据库名按原始大小写存储和比较。引用数据库名时必须与远端元数据中的数据库名大小写完全一致。 |
+| `1` | 转为小写存储 | 数据库名在 Doris 中存储为小写形式。适用于希望统一使用小写数据库名访问外部数据源的场景。 |
+| `2` | 大小写不敏感比较 |
数据库名在比较时忽略大小写,但在显示时保留远端元数据中的原始大小写。适用于外部数据源中数据库名大小写不统一,希望以不区分大小写的方式访问数据库的场景。 |
+
+默认值为 `0`(大小写敏感)。
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'lower_case_database_names' = '2',
+ 'lower_case_table_names' = '2'
+);
+```
+
+:::caution
+当 `lower_case_database_names` 设置为 `1` 或 `2` 时,如果远端元数据中存在仅大小写不同的同名数据库(如 `MyDB`
和 `mydb`),可能会导致冲突。Doris 会检测此类冲突并报错。
+:::
+
+:::info
+`lower_case_database_names` 和 `lower_case_table_names`
可以独立设置,互不影响。例如,可以设置数据库名大小写敏感(`0`),而表名大小写不敏感(`2`)。
+:::
### 列类型映射
用户创建数据目录后,Doris 会自动同步数据目录的数据库、表和 Schema。不同数据目录的列类型映射规则请参阅对应的数据目录文档。
-对于当前无法映射到 Doris 列类型的外部数据类型,如 `UNION`, `INTERVAL` 等,Doris 会将列类型映射为
`UNSUPPORTED` 类型。对于 `UNSUPPORTED` 类型的查询,示例如下:
+对于当前无法映射到 Doris 列类型的外部数据类型,如 `UNION`、`INTERVAL` 等,Doris 会将列类型映射为 `UNSUPPORTED`
类型。对于 `UNSUPPORTED` 类型的查询,示例如下:
假设同步后的表 Schema 为:
@@ -82,9 +169,9 @@ SELECT k1, k4 FROM table; -- Query OK.
Doris 目前对外表列的 Nullable 属性支持有特殊限制,具体行为如下:
| 源类型 | Doris 读取行为 | Doris 写入行为 |
-| --- | ------------ | ------------ |
-| Nullable | Nullable | 允许写入 Null 值 |
-| Not Null | Nullable,即依然当做可允许为 NULL 的列进行读取 | 允许写入 Null 值,即不对 Null
值进行严格检查。用户需要自行保证数据的完整性和一致性。|
+| --- | --- | --- |
+| Nullable | Nullable | 允许写入 Null 值 |
+| Not Null | Nullable,即依然当做可允许为 NULL 的列进行读取 | 允许写入 Null 值,即不对 Null
值进行严格检查。用户需要自行保证数据的完整性和一致性。 |
## 使用数据目录
@@ -152,9 +239,11 @@ jdbc:mysql://host:9030/iceberg_catalog.iceberg_db
SET PROPERTY default_init_catalog=hive_catalog;
```
-注意 1:如果 MySQL 命令行或 JDBC 连接串中已经明确指定了数据目录,则以指定的为准,`default_init_catalog` 用户属性不生效;
-注意 2:如果用户属性 `default_init_catalog` 设置的数据目录已经不存在,则自动切换到默认的 `internal` 数据目录;
-注意 3:该功能从 v3.1.x 版本开始生效;
+注意 1:如果 MySQL 命令行或 JDBC 连接串中已经明确指定了数据目录,则以指定的为准,`default_init_catalog` 用户属性不生效。
+
+注意 2:如果用户属性 `default_init_catalog` 设置的数据目录已经不存在,则自动切换到默认的 `internal` 数据目录。
+
+注意 3:该功能从 3.1.x 版本开始生效。
### 简单查询
@@ -212,11 +301,11 @@ SELECT * FROM iceberg_catalog.iceberg_db.table1;
Doris 支持通过 `INSERT` 语句直接将数据写回到外部数据源。具体参阅:
-* [ Hive Catalog ](./catalogs/hive-catalog.mdx)
+* [Hive Catalog](./catalogs/hive-catalog.mdx)
-* [ Iceberg Catalog](./catalogs/iceberg-catalog.mdx)
+* [Iceberg Catalog](./catalogs/iceberg-catalog.mdx)
-* [ JDBC Catalog](./catalogs/jdbc-catalog-overview.md)
+* [JDBC Catalog](./catalogs/jdbc-catalog-overview.md)
## 刷新数据目录
@@ -265,4 +354,3 @@ DROP CATALOG [IF EXISTS] iceberg_catalog;
## 权限管理
外部数据目录中库表的权限管理和内表一致。具体可参阅
[认证和鉴权](../admin-manual/auth/authentication-and-authorization.md) 文档。
-
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalog-overview.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalog-overview.md
index 035bf3e9b10..5b096d18f00 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalog-overview.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalog-overview.md
@@ -2,13 +2,13 @@
{
"title": "数据目录概述",
"language": "zh-CN",
- "description": "数据目录(Data Catalog)用于描述一个数据源的属性。"
+ "description": "了解 Apache Doris 数据目录(Data Catalog)的概念与使用方法。通过创建外部数据目录,接入
Hive、Iceberg、Paimon、PostgreSQL 等多种数据源,实现跨源联邦查询、数据导入与写回。"
}
---
数据目录(Data Catalog)用于描述一个数据源的属性。
-在 Doris 中,可以创建多个数据目录指向不同的数据源(如 Hive、Iceberg、MySQL)。Doris
会通过数据目录,自动获取对应数据源的库、表、Schema、分区、数据位置等。用户可以通过标准的 SQL
语句访问这些数据目录进行数据分析,并且可以对多个数据目录中的数据进行关联查询。
+在 Doris 中,可以创建多个数据目录指向不同的数据源(如 Hive、Iceberg、Paimon、PostgreSQL)。Doris
会通过数据目录,自动获取对应数据源的库、表、Schema、分区、数据位置等。用户可以通过标准的 SQL
语句访问这些数据目录进行数据分析,并且可以对多个数据目录中的数据进行关联查询。
Doris 中的数据目录分为两种:
@@ -25,7 +25,7 @@ Doris 中的数据目录分为两种:
| 数据集成 | ZeroETL 方案,直接访问不同数据源生成结果数据,或让数据在不同数据源中便捷流转。 |
| 数据写回 | 通过 Doris 进行数据加工处理后,写回到外部数据源。 |
-本文以 [Iceberg Catalog](./catalogs/iceberg-catalog.mdx)
为例,重点介绍数据目录的基础操作。不同数据目录的详细介绍,请参阅对应的数据目录文档。
+本文以 [Iceberg Catalog](./catalogs/iceberg-catalog)
为例,重点介绍数据目录的基础操作。不同数据目录的详细介绍,请参阅对应的数据目录文档。
## 创建数据目录
@@ -46,18 +46,105 @@ CREATE CATALOG iceberg_catalog PROPERTIES (
### 通用属性
-除每个数据目录特有的属性集合外,这里介绍所有数据目录通用的属性 `{CommonProperites}`。
+除每个数据目录特有的属性集合外,这里介绍所有数据目录通用的属性 `{CommonProperties}`。
| 属性名 | 描述
| 示例 |
| ----------------------- |
------------------------------------------------------------------------- |
------------------------------------- |
| `include_database_list` | 支持只同步指定的多个 Database,以 `,` 分隔。默认同步所有
Database。Database 名称是大小写敏感的。当外部数据源有大量 Database,但仅需访问个别 Database
时,可以使用此参数,避免大量的元数据同步。 | `'include_database_list' = 'db1,db2'` |
-| `exclude_database_list` | 支持指定不需要同步的多个 Database,以 `,` 分割。默认不做任何过滤,同步所有
Database。Database 名称是大小写敏感的。适用场景同上,反向排除不需要访问的数据库。如果冲突,`exclude` 优先级高于 `include`
| `'exclude_database_list' = 'db1,db2'` |
+| `exclude_database_list` | 支持指定不需要同步的多个 Database,以 `,` 分隔。默认不做任何过滤,同步所有
Database。Database 名称是大小写敏感的。适用场景同上,反向排除不需要访问的数据库。如果冲突,`exclude` 优先级高于
`include`。 | `'exclude_database_list' = 'db1,db2'` |
+| `include_table_list` | 支持只同步指定的多个表,以 `db.tbl` 格式指定,多个表之间以 `,` 分隔。设置后,列举某个
Database 下的表时将仅返回指定的表,而不会从远端元数据服务获取完整的表列表。适用于外部数据源表数量庞大、获取全量表列表可能超时的场景。 |
`'include_table_list' = 'db1.tbl1,db1.tbl2,db2.tbl3'` |
+| `lower_case_table_names` | Catalog 级别的表名大小写控制。取值及含义见下方
[表名大小写](#表名大小写lower_case_table_names) 小节。默认值继承全局变量 `lower_case_table_names`
的设置。 | `'lower_case_table_names' = '1'` |
+| `lower_case_database_names` | Catalog 级别的数据库名大小写控制。取值及含义见下方
[数据库名大小写](#数据库名大小写lower_case_database_names) 小节。默认值为 `0`(大小写敏感)。 |
`'lower_case_database_names' = '2'` |
+
+### 指定表列表
+
+该功能自 4.1.0 版本起支持。
+
+当外部数据源(如 Hive Metastore)包含大量表时,从远端元数据服务获取完整的表列表可能非常耗时甚至超时。通过设置
`include_table_list` 属性,可以指定需要同步的表,避免从远端获取全量表列表。
+
+`include_table_list` 使用 `db.tbl` 的格式,多个表之间以英文逗号 `,` 分隔。
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'include_table_list' = 'db1.table1,db1.table2,db2.table3'
+);
+```
+
+设置后的行为:
+
+- 当列举 `db1` 下的表时,仅返回 `table1` 和 `table2`,不会调用远端元数据服务的全量表列表接口。
+- 当列举 `db2` 下的表时,仅返回 `table3`。
+- 对于未在 `include_table_list` 中出现的 Database(如 `db3`),仍然会从远端元数据服务获取完整的表列表。
+- `include_table_list` 中的格式不正确的条目(非 `db.tbl` 格式)将被忽略。
+
+:::tip
+此属性可以与 `include_database_list` 配合使用。例如先通过 `include_database_list` 过滤出需要的
Database,再通过 `include_table_list` 进一步精确指定需要的表。
+:::
+
+### 表名大小写
+
+该功能自 4.1.0 版本起支持。
+
+通过 `lower_case_table_names` 属性,可以在 Catalog 级别控制表名的大小写处理方式。该属性支持三种模式:
+
+| 值 | 模式 | 描述 |
+| -- | ---- | ---- |
+| `0` | 大小写敏感(默认) | 表名按原始大小写存储和比较。引用表名时必须与远端元数据中的表名大小写完全一致。 |
+| `1` | 转为小写存储 | 表名在 Doris 中存储为小写形式。适用于希望统一使用小写表名访问外部数据源的场景。 |
+| `2` | 大小写不敏感比较 |
表名在比较时忽略大小写,但在显示时保留远端元数据中的原始大小写。适用于外部数据源中表名大小写不统一,希望以不区分大小写的方式访问表的场景。 |
+
+如果未设置此属性,默认继承全局变量 `lower_case_table_names` 的值。
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'lower_case_table_names' = '2'
+);
+```
+
+:::caution
+当 `lower_case_table_names` 设置为 `1` 或 `2` 时,如果远端元数据中存在仅大小写不同的同名表(如 `MyTable` 和
`mytable`),可能会导致冲突。Doris 会检测此类冲突并报错。
+:::
+
+### 数据库名大小写
+
+该功能自 4.1.0 版本起支持。
+
+通过 `lower_case_database_names` 属性,可以在 Catalog 级别控制数据库名的大小写处理方式。该属性支持三种模式:
+
+| 值 | 模式 | 描述 |
+| -- | ---- | ---- |
+| `0` | 大小写敏感(默认) | 数据库名按原始大小写存储和比较。引用数据库名时必须与远端元数据中的数据库名大小写完全一致。 |
+| `1` | 转为小写存储 | 数据库名在 Doris 中存储为小写形式。适用于希望统一使用小写数据库名访问外部数据源的场景。 |
+| `2` | 大小写不敏感比较 |
数据库名在比较时忽略大小写,但在显示时保留远端元数据中的原始大小写。适用于外部数据源中数据库名大小写不统一,希望以不区分大小写的方式访问数据库的场景。 |
+
+默认值为 `0`(大小写敏感)。
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'lower_case_database_names' = '2',
+ 'lower_case_table_names' = '2'
+);
+```
+
+:::caution
+当 `lower_case_database_names` 设置为 `1` 或 `2` 时,如果远端元数据中存在仅大小写不同的同名数据库(如 `MyDB`
和 `mydb`),可能会导致冲突。Doris 会检测此类冲突并报错。
+:::
+
+:::info
+`lower_case_database_names` 和 `lower_case_table_names`
可以独立设置,互不影响。例如,可以设置数据库名大小写敏感(`0`),而表名大小写不敏感(`2`)。
+:::
### 列类型映射
用户创建数据目录后,Doris 会自动同步数据目录的数据库、表和 Schema。不同数据目录的列类型映射规则请参阅对应的数据目录文档。
-对于当前无法映射到 Doris 列类型的外部数据类型,如 `UNION`, `INTERVAL` 等,Doris 会将列类型映射为
`UNSUPPORTED` 类型。对于 `UNSUPPORTED` 类型的查询,示例如下:
+对于当前无法映射到 Doris 列类型的外部数据类型,如 `UNION`、`INTERVAL` 等,Doris 会将列类型映射为 `UNSUPPORTED`
类型。对于 `UNSUPPORTED` 类型的查询,示例如下:
假设同步后的表 Schema 为:
@@ -82,9 +169,9 @@ SELECT k1, k4 FROM table; -- Query OK.
Doris 目前对外表列的 Nullable 属性支持有特殊限制,具体行为如下:
| 源类型 | Doris 读取行为 | Doris 写入行为 |
-| --- | ------------ | ------------ |
-| Nullable | Nullable | 允许写入 Null 值 |
-| Not Null | Nullable,即依然当做可允许为 NULL 的列进行读取 | 允许写入 Null 值,即不对 Null
值进行严格检查。用户需要自行保证数据的完整性和一致性。|
+| --- | --- | --- |
+| Nullable | Nullable | 允许写入 Null 值 |
+| Not Null | Nullable,即依然当做可允许为 NULL 的列进行读取 | 允许写入 Null 值,即不对 Null
值进行严格检查。用户需要自行保证数据的完整性和一致性。 |
## 使用数据目录
@@ -152,9 +239,11 @@ jdbc:mysql://host:9030/iceberg_catalog.iceberg_db
SET PROPERTY default_init_catalog=hive_catalog;
```
-注意 1:如果 MySQL 命令行或 JDBC 连接串中已经明确指定了数据目录,则以指定的为准,`default_init_catalog` 用户属性不生效;
-注意 2:如果用户属性 `default_init_catalog` 设置的数据目录已经不存在,则自动切换到默认的 `internal` 数据目录;
-注意 3:该功能从 v3.1.x 版本开始生效;
+注意 1:如果 MySQL 命令行或 JDBC 连接串中已经明确指定了数据目录,则以指定的为准,`default_init_catalog` 用户属性不生效。
+
+注意 2:如果用户属性 `default_init_catalog` 设置的数据目录已经不存在,则自动切换到默认的 `internal` 数据目录。
+
+注意 3:该功能从 3.1.x 版本开始生效。
### 简单查询
@@ -212,11 +301,11 @@ SELECT * FROM iceberg_catalog.iceberg_db.table1;
Doris 支持通过 `INSERT` 语句直接将数据写回到外部数据源。具体参阅:
-* [ Hive Catalog ](./catalogs/hive-catalog.mdx)
+* [Hive Catalog](./catalogs/hive-catalog.mdx)
-* [ Iceberg Catalog](./catalogs/iceberg-catalog.mdx)
+* [Iceberg Catalog](./catalogs/iceberg-catalog.mdx)
-* [ JDBC Catalog](./catalogs/jdbc-catalog-overview.md)
+* [JDBC Catalog](./catalogs/jdbc-catalog-overview.md)
## 刷新数据目录
@@ -265,4 +354,3 @@ DROP CATALOG [IF EXISTS] iceberg_catalog;
## 权限管理
外部数据目录中库表的权限管理和内表一致。具体可参阅
[认证和鉴权](../admin-manual/auth/authentication-and-authorization.md) 文档。
-
diff --git a/versioned_docs/version-3.x/lakehouse/catalog-overview.md
b/versioned_docs/version-3.x/lakehouse/catalog-overview.md
index f70c850fa74..2051135a0f2 100644
--- a/versioned_docs/version-3.x/lakehouse/catalog-overview.md
+++ b/versioned_docs/version-3.x/lakehouse/catalog-overview.md
@@ -2,32 +2,32 @@
{
"title": "Data Catalog Overview",
"language": "en",
- "description": "A Data Catalog is used to describe the attributes of a
data source."
+ "description": "Learn about Apache Doris Data Catalog concepts and usage.
Create external data catalogs to connect to Hive, Iceberg, Paimon, PostgreSQL,
and other data sources for federated cross-source queries, data ingestion, and
write-back."
}
---
-A Data Catalog is used to describe the attributes of a data source.
+A Data Catalog describes the properties of a data source.
-In Doris, multiple catalogs can be created to point to different data sources
(such as Hive, Iceberg, MySQL). Doris will automatically obtain the databases,
tables, columns, partitions, data locations, etc. of the corresponding data
sources through the catalogs. Users can access these catalogs for data analysis
through standard SQL statements and can conduct join queries on the data from
multiple catalogs.
+In Doris, you can create multiple data catalogs pointing to different data
sources (such as Hive, Iceberg, Paimon, PostgreSQL). Doris automatically
retrieves databases, tables, schemas, partitions, and data locations from the
corresponding data source through data catalogs. Users can access these data
catalogs for data analysis using standard SQL statements, and perform join
queries across data from multiple data catalogs.
-There are two types of catalogs in Doris:
+There are two types of data catalogs in Doris:
-| Type | Description |
+| Type | Description |
| ---------------- | -------------------------------------------------------- |
-| Internal Catalog | The built-in catalog, named `internal`, used to store
Doris internal table data. It cannot be created, modified, or deleted. |
-| External Catalog | External catalogs refer to all catalogs other than the
Internal Catalog. Users can create, modify, and delete external catalogs. |
+| Internal Catalog | The built-in data catalog with a fixed name of
`internal`, used to store Doris internal table data. It cannot be created,
modified, or dropped. |
+| External Catalog | External data catalogs, referring to all data catalogs
other than the Internal Catalog. Users can create, modify, and drop external
data catalogs. |
-Catalogs are mainly applicable to the following three scenarios, but different
catalogs are suitable for different scenarios. For details, see the
documentation for the corresponding catalog.
+Data catalogs are primarily applicable to the following three types of
scenarios, but different data catalogs have different applicable scenarios.
Refer to the corresponding data catalog documentation for details.
-| Scenario | Description |
+| Scenario | Description |
| ---- | ------------------------------------------- |
-| Query Acceleration | Direct query acceleration for data lakes such as Hive,
Iceberg, Paimon, etc. |
-| Data Integration | ZeroETL solution, directly accessing different data
sources to generate result data, or facilitating data flow between different
data sources. |
-| Data Write-back | After data processing via Doris, write back to external
data sources. |
+| Query Acceleration | Directly accelerate queries on lakehouse data such as
Hive, Iceberg, Paimon, etc. |
+| Data Integration | ZeroETL approach to directly access different data
sources to generate result data, or enable convenient data flow between
different data sources. |
+| Data Write-Back | Process and transform data through Doris, then write it
back to external data sources. |
-This document uses [Iceberg Catalog](./catalogs/iceberg-catalog.mdx) as an
example to focus on the basic operations of catalogs. For detailed descriptions
of different catalogs, please refer to the documentation of the corresponding
catalog.
+This article uses [Iceberg Catalog](./catalogs/iceberg-catalog) as an example
to introduce the basic operations of data catalogs. For detailed introductions
to different data catalogs, please refer to the corresponding data catalog
documentation.
-## Creating Catalog
+## Creating a Data Catalog
Create an Iceberg Catalog using the `CREATE CATALOG` statement.
@@ -42,23 +42,109 @@ CREATE CATALOG iceberg_catalog PROPERTIES (
);
```
-Essentially, a catalog created in Doris acts as a "proxy" to access the
metadata services (such as Hive Metastore) and storage services (such as
HDFS/S3) of the corresponding data source. Doris only stores connection
properties and other information of the catalog, not the actual metadata and
data of the corresponding data source.
+Essentially, a data catalog created in Doris acts as a "proxy" to access the
metadata service (such as Hive Metastore) and storage service (such as HDFS/S3)
of the corresponding data source. Doris only stores connection properties and
other information about the data catalog, not the actual metadata or data of
the corresponding data source.
### Common Properties
-In addition to the set of properties specific to each catalog, here are the
common properties for all catalogs `{CommonProperties}`.
+In addition to the property set specific to each data catalog, this section
introduces the common properties `{CommonProperties}` shared by all data
catalogs.
-| Property Name | Description
| Example |
-| ------------------------ |
------------------------------------------------------------------------------------------------------------------------------------
| -------------------------------------- |
-| `include_database_list` | Supports synchronizing only specified Databases,
separated by `,`. By default, all Databases are synchronized. Database names
are case-sensitive. Use this parameter when there are many Databases in the
external data source but only a few need to be accessed, to avoid synchronizing
a large amount of metadata. | `'include_database_list' = 'db1,db2'` |
-| `exclude_database_list` | Supports specifying multiple Databases that do
not need to be synchronized, separated by `,`. By default, no filtering is
applied, and all Databases are synchronized. Database names are case-sensitive.
This is used in the same scenario as above, to exclude databases that do not
need to be accessed. If there is a conflict, `exclude` takes precedence over
`include`. | `'exclude_database_list' = 'db1,db2'` |
+| Property | Description | Example |
+| ----------------------- |
------------------------------------------------------------------------- |
------------------------------------- |
+| `include_database_list` | Specifies multiple databases to synchronize,
separated by `,`. By default, all databases are synchronized. Database names
are case-sensitive. When the external data source has a large number of
databases but only a few need to be accessed, this parameter can be used to
avoid synchronizing a large amount of metadata. | `'include_database_list' =
'db1,db2'` |
+| `exclude_database_list` | Specifies multiple databases to exclude from
synchronization, separated by `,`. By default, no filtering is applied and all
databases are synchronized. Database names are case-sensitive. Applicable to
the same scenario as above, but for inversely excluding databases that do not
need to be accessed. In case of conflict, `exclude` takes precedence over
`include`. | `'exclude_database_list' = 'db1,db2'` |
+| `include_table_list` | Specifies multiple tables to synchronize, in `db.tbl`
format, separated by `,`. When set, listing tables under a database will only
return the specified tables, rather than fetching the full table list from the
remote metadata service. Applicable when the external data source has a large
number of tables and fetching the full table list may time out. |
`'include_table_list' = 'db1.tbl1,db1.tbl2,db2.tbl3'` |
+| `lower_case_table_names` | Catalog-level table name case control. See the
[Table Name Case
Sensitivity](#table-name-case-sensitivity-lower_case_table_names) section below
for values and their meanings. The default value inherits from the global
variable `lower_case_table_names`. | `'lower_case_table_names' = '1'` |
+| `lower_case_database_names` | Catalog-level database name case control. See
the [Database Name Case
Sensitivity](#database-name-case-sensitivity-lower_case_database_names) section
below for values and their meanings. The default value is `0` (case-sensitive).
| `'lower_case_database_names' = '2'` |
+### Specifying Table List
+
+This feature is supported since version 4.1.0.
+
+When the external data source (such as Hive Metastore) contains a large number
of tables, fetching the full table list from the remote metadata service can be
very time-consuming or even time out. By setting the `include_table_list`
property, you can specify the tables to synchronize and avoid fetching the full
table list from the remote.
+
+`include_table_list` uses the `db.tbl` format, with multiple tables separated
by commas `,`.
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'include_table_list' = 'db1.table1,db1.table2,db2.table3'
+);
+```
+
+Behavior after setting:
+
+- When listing tables under `db1`, only `table1` and `table2` are returned,
without calling the remote metadata service's full table list API.
+- When listing tables under `db2`, only `table3` is returned.
+- For databases not included in `include_table_list` (such as `db3`), the full
table list will still be fetched from the remote metadata service.
+- Incorrectly formatted entries in `include_table_list` (not in `db.tbl`
format) will be ignored.
+
+:::tip
+This property can be used in combination with `include_database_list`. For
example, first filter the required databases using `include_database_list`,
then further specify the required tables using `include_table_list`.
+:::
+
+### Table Name Case Sensitivity
+
+This feature is supported since version 4.1.0.
+
+The `lower_case_table_names` property allows you to control table name case
handling at the Catalog level. This property supports three modes:
+
+| Value | Mode | Description |
+| -- | ---- | ---- |
+| `0` | Case-sensitive (default) | Table names are stored and compared with
their original case. When referencing a table name, it must exactly match the
case in the remote metadata. |
+| `1` | Stored as lowercase | Table names are stored in lowercase in Doris.
Suitable for scenarios where you want to uniformly use lowercase table names to
access external data sources. |
+| `2` | Case-insensitive comparison | Table names are compared in a
case-insensitive manner, but the original case from the remote metadata is
preserved when displayed. Suitable for scenarios where table name cases are
inconsistent in the external data source and you want to access tables in a
case-insensitive way. |
+
+If this property is not set, it inherits the value of the global variable
`lower_case_table_names` by default.
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'lower_case_table_names' = '2'
+);
+```
+
+:::caution
+When `lower_case_table_names` is set to `1` or `2`, if tables with names that
differ only in case exist in the remote metadata (such as `MyTable` and
`mytable`), conflicts may occur. Doris will detect such conflicts and report an
error.
+:::
+
+### Database Name Case Sensitivity
+
+This feature is supported since version 4.1.0.
+
+The `lower_case_database_names` property allows you to control database name
case handling at the Catalog level. This property supports three modes:
+
+| Value | Mode | Description |
+| -- | ---- | ---- |
+| `0` | Case-sensitive (default) | Database names are stored and compared with
their original case. When referencing a database name, it must exactly match
the case in the remote metadata. |
+| `1` | Stored as lowercase | Database names are stored in lowercase in Doris.
Suitable for scenarios where you want to uniformly use lowercase database names
to access external data sources. |
+| `2` | Case-insensitive comparison | Database names are compared in a
case-insensitive manner, but the original case from the remote metadata is
preserved when displayed. Suitable for scenarios where database name cases are
inconsistent in the external data source and you want to access databases in a
case-insensitive way. |
+
+The default value is `0` (case-sensitive).
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'lower_case_database_names' = '2',
+ 'lower_case_table_names' = '2'
+);
+```
+
+:::caution
+When `lower_case_database_names` is set to `1` or `2`, if databases with names
that differ only in case exist in the remote metadata (such as `MyDB` and
`mydb`), conflicts may occur. Doris will detect such conflicts and report an
error.
+:::
+
+:::info
+`lower_case_database_names` and `lower_case_table_names` can be set
independently without affecting each other. For example, you can set database
names to be case-sensitive (`0`) while setting table names to be
case-insensitive (`2`).
+:::
### Column Type Mapping
-After a user creates a catalog, Doris automatically synchronizes the
databases, tables, and columns of the catalog. For column type mapping rules of
different catalogs, please refer to the documentation of the corresponding
catalog.
+After a user creates a data catalog, Doris automatically synchronizes the
databases, tables, and schemas of the data catalog. For column type mapping
rules of different data catalogs, please refer to the corresponding data
catalog documentation.
-For external data types that cannot currently be mapped to Doris column types,
such as `UNION`, `INTERVAL`, etc., Doris will map the column type to
`UNSUPPORTED`. For queries involving `UNSUPPORTED` types, see the example below:
+For external data types that cannot currently be mapped to Doris column types,
such as `UNION`, `INTERVAL`, etc., Doris maps the column type to `UNSUPPORTED`.
For queries involving `UNSUPPORTED` types, see the following examples:
Assume the synchronized table schema is:
@@ -78,18 +164,18 @@ SELECT k1, k3 FROM table; -- Error: Unsupported
type 'UNSUPPORTED_TYPE
SELECT k1, k4 FROM table; -- Query OK.
```
-### Nullable Attribute
+### Nullable Property
-Doris currently has special restrictions on the Nullable attribute support for
external table columns, with specific behaviors as follows:
+Doris currently has special limitations on the Nullable property support for
external table columns. The specific behavior is as follows:
| Source Type | Doris Read Behavior | Doris Write Behavior |
-| --- | ------------ | ------------ |
-| Nullable | Nullable | Allow writing Null values |
-| Not Null | Nullable, i.e., still treated as columns that allow NULL during
reading | Allow writing Null values, i.e., no strict checking for Null values.
Users need to ensure data integrity and consistency themselves.|
+| --- | --- | --- |
+| Nullable | Nullable | Allows writing Null values |
+| Not Null | Nullable, i.e., still read as a column that allows NULL | Allows
writing Null values, i.e., no strict check on Null values. Users need to ensure
data integrity and consistency on their own. |
-## Using Catalog
+## Using Data Catalogs
-### Viewing Catalog
+### Viewing Data Catalogs
After creation, you can view the catalog using the `SHOW CATALOGS` command:
@@ -103,13 +189,13 @@ mysql> SHOW CATALOGS;
+-----------+-----------------+----------+-----------+-------------------------+---------------------+------------------------+
```
-You can view the statement to create a catalog using [SHOW CREATE
CATALOG](../sql-manual/sql-statements/catalog/SHOW-CREATE-CATALOG).
+You can view the CREATE CATALOG statement using [SHOW CREATE
CATALOG](../sql-manual/sql-statements/catalog/SHOW-CREATE-CATALOG).
-### Switching Catalog
+### Switching Data Catalogs
-Doris provides the `SWITCH` statement to switch the connection session context
to the corresponding catalog. This is similar to using the `USE` statement to
switch databases.
+Doris provides the `SWITCH` statement to switch the connection session context
to the corresponding data catalog, similar to using the `USE` statement to
switch databases.
-After switching to a catalog, you can use the `USE` statement to continue
switching to a specified database, or use `SHOW DATABASES` to view the
databases under the current catalog.
+After switching to a data catalog, you can use the `USE` statement to further
switch to a specific database, or use `SHOW DATABASES` to view the databases
under the current data catalog.
```sql
SWITCH iceberg_catalog;
@@ -127,13 +213,13 @@ SHOW DATABASES;
USE iceberg_db;
```
-You can also use the `USE` statement with the fully qualified name
`catalog_name.database_name` to switch directly to a specified database within
a specified catalog:
+You can also use the `USE` statement with the fully qualified name
`catalog_name.database_name` to directly switch to a specific database under a
specific data catalog:
```sql
USE iceberg_catalog.iceberg_db;
```
-Fully qualified names can also be used in MySQL command line or JDBC
connection strings to be compatible with the MySQL connection protocol.
+The fully qualified name can also be used in MySQL command line or JDBC
connection strings to be compatible with the MySQL connection protocol.
```sql
# Command line tool
@@ -143,33 +229,36 @@ mysql -h host -P9030 -uroot -Diceberg_catalog.iceberg_db
jdbc:mysql://host:9030/iceberg_catalog.iceberg_db
```
-The fixed name for the built-in catalog is `internal`. The switching method is
the same as for external catalogs.
+The built-in data catalog has a fixed name of `internal`. The switching method
is the same as for external data catalogs.
+
+### Default Data Catalog
-### Default Catalog
-The user attribute `default_init_catalog` is used to set the default catalog
for a specific user. Once set, when the specified user connects to Doris, they
will automatically switch to the set catalog.
+Use the user property `default_init_catalog` to set the default data catalog
for a specific user. Once set, when the specified user connects to Doris, the
session will automatically switch to the configured data catalog.
```sql
SET PROPERTY default_init_catalog=hive_catalog;
```
-Note 1: If the catalog has been explicitly specified in the MySQL command line
or JDBC connection strings, then the specified catalog will be used, and the
`default_init_catalog` user attribute will not take effect.
-Note 2: If the catalog set by the user attribute `default_init_catalog` no
longer exists, it will automatically switch to the default `internal` catalog.
-Note 3: This feature takes effect starting from version v3.1.x.
+Note 1: If a data catalog is explicitly specified in the MySQL command line or
JDBC connection string, the specified one takes precedence and the
`default_init_catalog` user property does not take effect.
+
+Note 2: If the data catalog set by the user property `default_init_catalog` no
longer exists, the session will automatically switch to the default `internal`
data catalog.
-### Simple Query
+Note 3: This feature is available starting from version 3.1.x.
-You can query tables in external catalogs using any SQL statement supported by
Doris.
+### Simple Queries
+
+You can query tables in external data catalogs using any SQL statement
supported by Doris.
```sql
SELECT id, SUM(cost) FROM iceberg_db.table1
GROUP BY id ORDER BY id;
```
-### Cross-Catalog Query
+### Cross-Catalog Queries
-Doris supports join queries across different catalogs.
+Doris supports join queries across data catalogs.
-Here, let's create another MySQL Catalog:
+Here we create another [MySQL Catalog](./catalogs/jdbc-mysql-catalog.md):
```sql
CREATE CATALOG mysql_catalog properties(
@@ -182,7 +271,7 @@ CREATE CATALOG mysql_catalog properties(
);
```
-Then, perform a join query between Iceberg tables and MySQL tables using SQL:
+Then perform a join query between the Iceberg table and the MySQL table using
SQL:
```sql
SELECT * FROM
@@ -190,16 +279,16 @@ iceberg_catalog.iceberg_db.table1 tbl1 JOIN
mysql_catalog.mysql_db.dim_table tbl
ON tbl1.id = tbl2.id;
```
-### Data Import
+### Data Ingestion
-You can import data from a data source into Doris using the `INSERT` command.
+You can import data from data sources into Doris using the `INSERT` command.
```sql
INSERT INTO internal.doris_db.tbl1
SELECT * FROM iceberg_catalog.iceberg_db.table1;
```
-You can also use the `CTAS (Create Table As Select)` statement to create an
internal Doris table from an external data source and import the data:
+You can also use the `CTAS (Create Table As Select)` statement to create a
Doris internal table from an external data source and import the data:
```sql
CREATE TABLE internal.doris_db.tbl1
@@ -210,15 +299,17 @@ SELECT * FROM iceberg_catalog.iceberg_db.table1;
### Data Write-Back
-Doris supports writing data back to external data sources using the `INSERT`
statement. For more details, refer to:
+Doris supports writing data back to external data sources directly using the
`INSERT` statement. For details, refer to:
* [Hive Catalog](./catalogs/hive-catalog.mdx)
+
* [Iceberg Catalog](./catalogs/iceberg-catalog.mdx)
+
* [JDBC Catalog](./catalogs/jdbc-catalog-overview.md)
-## Refreshing Catalog
+## Refreshing Data Catalogs
-Catalogs created in Doris act as "proxy" to access the metadata services of
corresponding data sources. Doris caches some metadata to improve access
performance and reduce frequent cross-network requests. However, the cache has
a validity period, and without refreshing, you cannot access the latest
metadata. Therefore, Doris provides several ways to refresh catalogs.
+Data catalogs created in Doris act as "proxies" to access the metadata service
of the corresponding data source. Doris caches some metadata. Caching can
improve metadata access performance and avoid frequent cross-network requests.
However, caching also has timeliness issues — if the cache is not refreshed,
the latest metadata cannot be accessed. Therefore, Doris provides multiple ways
to refresh data catalogs.
```sql
-- Refresh catalog
@@ -231,13 +322,13 @@ REFRESH DATABASE catalog_name.db_name;
REFRESH TABLE catalog_name.db_name.table_name;
```
-Doris also supports disabling metadata caching to access the latest metadata
in real-time.
+Doris also supports disabling metadata caching to enable real-time access to
the latest metadata.
For detailed information and configuration of metadata caching, please refer
to: [Metadata Cache](./meta-cache.md)
-## Modifying Catalog
+## Modifying Data Catalogs
-You can modify the properties or name of a catalog using the `ALTER CATALOG`
statement:
+You can modify the properties or name of a data catalog using `ALTER CATALOG`:
```sql
-- Rename a catalog
@@ -250,19 +341,16 @@ ALTER CATALOG iceberg_catalog SET PROPERTIES ('key1' =
'value1' [, 'key' = 'valu
ALTER CATALOG iceberg_catalog MODIFY COMMENT 'my iceberg catalog';
```
-## Deleting Catalog
+## Dropping Data Catalogs
-You can delete a specified external catalog using the `DROP CATALOG` statement.
+You can drop a specified external data catalog using `DROP CATALOG`.
```sql
DROP CATALOG [IF EXISTS] iceberg_catalog;
```
-Deleting an external catalog from Doris does not remove the actual data; it
only deletes the mapping relationship stored in Doris.
+Dropping an external data catalog from Doris does not delete the actual data.
It only removes the data catalog mapping stored in Doris.
## Permission Management
-The permission management for databases and tables in an external catalog is
the same as for internal tables. For details, refer to the [Authentication and
Authorization](../admin-manual/auth/authentication-and-authorization.md)
documentation.
-
-
-
+Permission management for databases and tables in external data catalogs is
the same as for internal tables. For details, refer to the [Authentication and
Authorization](../admin-manual/auth/authentication-and-authorization.md)
documentation.
diff --git a/versioned_docs/version-4.x/lakehouse/catalog-overview.md
b/versioned_docs/version-4.x/lakehouse/catalog-overview.md
index f70c850fa74..2051135a0f2 100644
--- a/versioned_docs/version-4.x/lakehouse/catalog-overview.md
+++ b/versioned_docs/version-4.x/lakehouse/catalog-overview.md
@@ -2,32 +2,32 @@
{
"title": "Data Catalog Overview",
"language": "en",
- "description": "A Data Catalog is used to describe the attributes of a
data source."
+ "description": "Learn about Apache Doris Data Catalog concepts and usage.
Create external data catalogs to connect to Hive, Iceberg, Paimon, PostgreSQL,
and other data sources for federated cross-source queries, data ingestion, and
write-back."
}
---
-A Data Catalog is used to describe the attributes of a data source.
+A Data Catalog describes the properties of a data source.
-In Doris, multiple catalogs can be created to point to different data sources
(such as Hive, Iceberg, MySQL). Doris will automatically obtain the databases,
tables, columns, partitions, data locations, etc. of the corresponding data
sources through the catalogs. Users can access these catalogs for data analysis
through standard SQL statements and can conduct join queries on the data from
multiple catalogs.
+In Doris, you can create multiple data catalogs pointing to different data
sources (such as Hive, Iceberg, Paimon, PostgreSQL). Doris automatically
retrieves databases, tables, schemas, partitions, and data locations from the
corresponding data source through data catalogs. Users can access these data
catalogs for data analysis using standard SQL statements, and perform join
queries across data from multiple data catalogs.
-There are two types of catalogs in Doris:
+There are two types of data catalogs in Doris:
-| Type | Description |
+| Type | Description |
| ---------------- | -------------------------------------------------------- |
-| Internal Catalog | The built-in catalog, named `internal`, used to store
Doris internal table data. It cannot be created, modified, or deleted. |
-| External Catalog | External catalogs refer to all catalogs other than the
Internal Catalog. Users can create, modify, and delete external catalogs. |
+| Internal Catalog | The built-in data catalog with a fixed name of
`internal`, used to store Doris internal table data. It cannot be created,
modified, or dropped. |
+| External Catalog | External data catalogs, referring to all data catalogs
other than the Internal Catalog. Users can create, modify, and drop external
data catalogs. |
-Catalogs are mainly applicable to the following three scenarios, but different
catalogs are suitable for different scenarios. For details, see the
documentation for the corresponding catalog.
+Data catalogs are primarily applicable to the following three types of
scenarios, but different data catalogs have different applicable scenarios.
Refer to the corresponding data catalog documentation for details.
-| Scenario | Description |
+| Scenario | Description |
| ---- | ------------------------------------------- |
-| Query Acceleration | Direct query acceleration for data lakes such as Hive,
Iceberg, Paimon, etc. |
-| Data Integration | ZeroETL solution, directly accessing different data
sources to generate result data, or facilitating data flow between different
data sources. |
-| Data Write-back | After data processing via Doris, write back to external
data sources. |
+| Query Acceleration | Directly accelerate queries on lakehouse data such as
Hive, Iceberg, Paimon, etc. |
+| Data Integration | ZeroETL approach to directly access different data
sources to generate result data, or enable convenient data flow between
different data sources. |
+| Data Write-Back | Process and transform data through Doris, then write it
back to external data sources. |
-This document uses [Iceberg Catalog](./catalogs/iceberg-catalog.mdx) as an
example to focus on the basic operations of catalogs. For detailed descriptions
of different catalogs, please refer to the documentation of the corresponding
catalog.
+This article uses [Iceberg Catalog](./catalogs/iceberg-catalog) as an example
to introduce the basic operations of data catalogs. For detailed introductions
to different data catalogs, please refer to the corresponding data catalog
documentation.
-## Creating Catalog
+## Creating a Data Catalog
Create an Iceberg Catalog using the `CREATE CATALOG` statement.
@@ -42,23 +42,109 @@ CREATE CATALOG iceberg_catalog PROPERTIES (
);
```
-Essentially, a catalog created in Doris acts as a "proxy" to access the
metadata services (such as Hive Metastore) and storage services (such as
HDFS/S3) of the corresponding data source. Doris only stores connection
properties and other information of the catalog, not the actual metadata and
data of the corresponding data source.
+Essentially, a data catalog created in Doris acts as a "proxy" to access the
metadata service (such as Hive Metastore) and storage service (such as HDFS/S3)
of the corresponding data source. Doris only stores connection properties and
other information about the data catalog, not the actual metadata or data of
the corresponding data source.
### Common Properties
-In addition to the set of properties specific to each catalog, here are the
common properties for all catalogs `{CommonProperties}`.
+In addition to the property set specific to each data catalog, this section
introduces the common properties `{CommonProperties}` shared by all data
catalogs.
-| Property Name | Description
| Example |
-| ------------------------ |
------------------------------------------------------------------------------------------------------------------------------------
| -------------------------------------- |
-| `include_database_list` | Supports synchronizing only specified Databases,
separated by `,`. By default, all Databases are synchronized. Database names
are case-sensitive. Use this parameter when there are many Databases in the
external data source but only a few need to be accessed, to avoid synchronizing
a large amount of metadata. | `'include_database_list' = 'db1,db2'` |
-| `exclude_database_list` | Supports specifying multiple Databases that do
not need to be synchronized, separated by `,`. By default, no filtering is
applied, and all Databases are synchronized. Database names are case-sensitive.
This is used in the same scenario as above, to exclude databases that do not
need to be accessed. If there is a conflict, `exclude` takes precedence over
`include`. | `'exclude_database_list' = 'db1,db2'` |
+| Property | Description | Example |
+| ----------------------- |
------------------------------------------------------------------------- |
------------------------------------- |
+| `include_database_list` | Specifies multiple databases to synchronize,
separated by `,`. By default, all databases are synchronized. Database names
are case-sensitive. When the external data source has a large number of
databases but only a few need to be accessed, this parameter can be used to
avoid synchronizing a large amount of metadata. | `'include_database_list' =
'db1,db2'` |
+| `exclude_database_list` | Specifies multiple databases to exclude from
synchronization, separated by `,`. By default, no filtering is applied and all
databases are synchronized. Database names are case-sensitive. Applicable to
the same scenario as above, but for inversely excluding databases that do not
need to be accessed. In case of conflict, `exclude` takes precedence over
`include`. | `'exclude_database_list' = 'db1,db2'` |
+| `include_table_list` | Specifies multiple tables to synchronize, in `db.tbl`
format, separated by `,`. When set, listing tables under a database will only
return the specified tables, rather than fetching the full table list from the
remote metadata service. Applicable when the external data source has a large
number of tables and fetching the full table list may time out. |
`'include_table_list' = 'db1.tbl1,db1.tbl2,db2.tbl3'` |
+| `lower_case_table_names` | Catalog-level table name case control. See the
[Table Name Case
Sensitivity](#table-name-case-sensitivity-lower_case_table_names) section below
for values and their meanings. The default value inherits from the global
variable `lower_case_table_names`. | `'lower_case_table_names' = '1'` |
+| `lower_case_database_names` | Catalog-level database name case control. See
the [Database Name Case
Sensitivity](#database-name-case-sensitivity-lower_case_database_names) section
below for values and their meanings. The default value is `0` (case-sensitive).
| `'lower_case_database_names' = '2'` |
+### Specifying Table List
+
+This feature is supported since version 4.1.0.
+
+When the external data source (such as Hive Metastore) contains a large number
of tables, fetching the full table list from the remote metadata service can be
very time-consuming or even time out. By setting the `include_table_list`
property, you can specify the tables to synchronize and avoid fetching the full
table list from the remote.
+
+`include_table_list` uses the `db.tbl` format, with multiple tables separated
by commas `,`.
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'include_table_list' = 'db1.table1,db1.table2,db2.table3'
+);
+```
+
+Behavior after setting:
+
+- When listing tables under `db1`, only `table1` and `table2` are returned,
without calling the remote metadata service's full table list API.
+- When listing tables under `db2`, only `table3` is returned.
+- For databases not included in `include_table_list` (such as `db3`), the full
table list will still be fetched from the remote metadata service.
+- Incorrectly formatted entries in `include_table_list` (not in `db.tbl`
format) will be ignored.
+
+:::tip
+This property can be used in combination with `include_database_list`. For
example, first filter the required databases using `include_database_list`,
then further specify the required tables using `include_table_list`.
+:::
+
+### Table Name Case Sensitivity
+
+This feature is supported since version 4.1.0.
+
+The `lower_case_table_names` property allows you to control table name case
handling at the Catalog level. This property supports three modes:
+
+| Value | Mode | Description |
+| -- | ---- | ---- |
+| `0` | Case-sensitive (default) | Table names are stored and compared with
their original case. When referencing a table name, it must exactly match the
case in the remote metadata. |
+| `1` | Stored as lowercase | Table names are stored in lowercase in Doris.
Suitable for scenarios where you want to uniformly use lowercase table names to
access external data sources. |
+| `2` | Case-insensitive comparison | Table names are compared in a
case-insensitive manner, but the original case from the remote metadata is
preserved when displayed. Suitable for scenarios where table name cases are
inconsistent in the external data source and you want to access tables in a
case-insensitive way. |
+
+If this property is not set, it inherits the value of the global variable
`lower_case_table_names` by default.
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'lower_case_table_names' = '2'
+);
+```
+
+:::caution
+When `lower_case_table_names` is set to `1` or `2`, if tables with names that
differ only in case exist in the remote metadata (such as `MyTable` and
`mytable`), conflicts may occur. Doris will detect such conflicts and report an
error.
+:::
+
+### Database Name Case Sensitivity
+
+This feature is supported since version 4.1.0.
+
+The `lower_case_database_names` property allows you to control database name
case handling at the Catalog level. This property supports three modes:
+
+| Value | Mode | Description |
+| -- | ---- | ---- |
+| `0` | Case-sensitive (default) | Database names are stored and compared with
their original case. When referencing a database name, it must exactly match
the case in the remote metadata. |
+| `1` | Stored as lowercase | Database names are stored in lowercase in Doris.
Suitable for scenarios where you want to uniformly use lowercase database names
to access external data sources. |
+| `2` | Case-insensitive comparison | Database names are compared in a
case-insensitive manner, but the original case from the remote metadata is
preserved when displayed. Suitable for scenarios where database name cases are
inconsistent in the external data source and you want to access databases in a
case-insensitive way. |
+
+The default value is `0` (case-sensitive).
+
+```sql
+CREATE CATALOG hive_catalog PROPERTIES (
+ 'type' = 'hms',
+ 'hive.metastore.uris' = 'thrift://hms-host:9083',
+ 'lower_case_database_names' = '2',
+ 'lower_case_table_names' = '2'
+);
+```
+
+:::caution
+When `lower_case_database_names` is set to `1` or `2`, if databases with names
that differ only in case exist in the remote metadata (such as `MyDB` and
`mydb`), conflicts may occur. Doris will detect such conflicts and report an
error.
+:::
+
+:::info
+`lower_case_database_names` and `lower_case_table_names` can be set
independently without affecting each other. For example, you can set database
names to be case-sensitive (`0`) while setting table names to be
case-insensitive (`2`).
+:::
### Column Type Mapping
-After a user creates a catalog, Doris automatically synchronizes the
databases, tables, and columns of the catalog. For column type mapping rules of
different catalogs, please refer to the documentation of the corresponding
catalog.
+After a user creates a data catalog, Doris automatically synchronizes the
databases, tables, and schemas of the data catalog. For column type mapping
rules of different data catalogs, please refer to the corresponding data
catalog documentation.
-For external data types that cannot currently be mapped to Doris column types,
such as `UNION`, `INTERVAL`, etc., Doris will map the column type to
`UNSUPPORTED`. For queries involving `UNSUPPORTED` types, see the example below:
+For external data types that cannot currently be mapped to Doris column types,
such as `UNION`, `INTERVAL`, etc., Doris maps the column type to `UNSUPPORTED`.
For queries involving `UNSUPPORTED` types, see the following examples:
Assume the synchronized table schema is:
@@ -78,18 +164,18 @@ SELECT k1, k3 FROM table; -- Error: Unsupported
type 'UNSUPPORTED_TYPE
SELECT k1, k4 FROM table; -- Query OK.
```
-### Nullable Attribute
+### Nullable Property
-Doris currently has special restrictions on the Nullable attribute support for
external table columns, with specific behaviors as follows:
+Doris currently has special limitations on the Nullable property support for
external table columns. The specific behavior is as follows:
| Source Type | Doris Read Behavior | Doris Write Behavior |
-| --- | ------------ | ------------ |
-| Nullable | Nullable | Allow writing Null values |
-| Not Null | Nullable, i.e., still treated as columns that allow NULL during
reading | Allow writing Null values, i.e., no strict checking for Null values.
Users need to ensure data integrity and consistency themselves.|
+| --- | --- | --- |
+| Nullable | Nullable | Allows writing Null values |
+| Not Null | Nullable, i.e., still read as a column that allows NULL | Allows
writing Null values, i.e., no strict check on Null values. Users need to ensure
data integrity and consistency on their own. |
-## Using Catalog
+## Using Data Catalogs
-### Viewing Catalog
+### Viewing Data Catalogs
After creation, you can view the catalog using the `SHOW CATALOGS` command:
@@ -103,13 +189,13 @@ mysql> SHOW CATALOGS;
+-----------+-----------------+----------+-----------+-------------------------+---------------------+------------------------+
```
-You can view the statement to create a catalog using [SHOW CREATE
CATALOG](../sql-manual/sql-statements/catalog/SHOW-CREATE-CATALOG).
+You can view the CREATE CATALOG statement using [SHOW CREATE
CATALOG](../sql-manual/sql-statements/catalog/SHOW-CREATE-CATALOG).
-### Switching Catalog
+### Switching Data Catalogs
-Doris provides the `SWITCH` statement to switch the connection session context
to the corresponding catalog. This is similar to using the `USE` statement to
switch databases.
+Doris provides the `SWITCH` statement to switch the connection session context
to the corresponding data catalog, similar to using the `USE` statement to
switch databases.
-After switching to a catalog, you can use the `USE` statement to continue
switching to a specified database, or use `SHOW DATABASES` to view the
databases under the current catalog.
+After switching to a data catalog, you can use the `USE` statement to further
switch to a specific database, or use `SHOW DATABASES` to view the databases
under the current data catalog.
```sql
SWITCH iceberg_catalog;
@@ -127,13 +213,13 @@ SHOW DATABASES;
USE iceberg_db;
```
-You can also use the `USE` statement with the fully qualified name
`catalog_name.database_name` to switch directly to a specified database within
a specified catalog:
+You can also use the `USE` statement with the fully qualified name
`catalog_name.database_name` to directly switch to a specific database under a
specific data catalog:
```sql
USE iceberg_catalog.iceberg_db;
```
-Fully qualified names can also be used in MySQL command line or JDBC
connection strings to be compatible with the MySQL connection protocol.
+The fully qualified name can also be used in MySQL command line or JDBC
connection strings to be compatible with the MySQL connection protocol.
```sql
# Command line tool
@@ -143,33 +229,36 @@ mysql -h host -P9030 -uroot -Diceberg_catalog.iceberg_db
jdbc:mysql://host:9030/iceberg_catalog.iceberg_db
```
-The fixed name for the built-in catalog is `internal`. The switching method is
the same as for external catalogs.
+The built-in data catalog has a fixed name of `internal`. The switching method
is the same as for external data catalogs.
+
+### Default Data Catalog
-### Default Catalog
-The user attribute `default_init_catalog` is used to set the default catalog
for a specific user. Once set, when the specified user connects to Doris, they
will automatically switch to the set catalog.
+Use the user property `default_init_catalog` to set the default data catalog
for a specific user. Once set, when the specified user connects to Doris, the
session will automatically switch to the configured data catalog.
```sql
SET PROPERTY default_init_catalog=hive_catalog;
```
-Note 1: If the catalog has been explicitly specified in the MySQL command line
or JDBC connection strings, then the specified catalog will be used, and the
`default_init_catalog` user attribute will not take effect.
-Note 2: If the catalog set by the user attribute `default_init_catalog` no
longer exists, it will automatically switch to the default `internal` catalog.
-Note 3: This feature takes effect starting from version v3.1.x.
+Note 1: If a data catalog is explicitly specified in the MySQL command line or
JDBC connection string, the specified one takes precedence and the
`default_init_catalog` user property does not take effect.
+
+Note 2: If the data catalog set by the user property `default_init_catalog` no
longer exists, the session will automatically switch to the default `internal`
data catalog.
-### Simple Query
+Note 3: This feature is available starting from version 3.1.x.
-You can query tables in external catalogs using any SQL statement supported by
Doris.
+### Simple Queries
+
+You can query tables in external data catalogs using any SQL statement
supported by Doris.
```sql
SELECT id, SUM(cost) FROM iceberg_db.table1
GROUP BY id ORDER BY id;
```
-### Cross-Catalog Query
+### Cross-Catalog Queries
-Doris supports join queries across different catalogs.
+Doris supports join queries across data catalogs.
-Here, let's create another MySQL Catalog:
+Here we create another [MySQL Catalog](./catalogs/jdbc-mysql-catalog.md):
```sql
CREATE CATALOG mysql_catalog properties(
@@ -182,7 +271,7 @@ CREATE CATALOG mysql_catalog properties(
);
```
-Then, perform a join query between Iceberg tables and MySQL tables using SQL:
+Then perform a join query between the Iceberg table and the MySQL table using
SQL:
```sql
SELECT * FROM
@@ -190,16 +279,16 @@ iceberg_catalog.iceberg_db.table1 tbl1 JOIN
mysql_catalog.mysql_db.dim_table tbl
ON tbl1.id = tbl2.id;
```
-### Data Import
+### Data Ingestion
-You can import data from a data source into Doris using the `INSERT` command.
+You can import data from data sources into Doris using the `INSERT` command.
```sql
INSERT INTO internal.doris_db.tbl1
SELECT * FROM iceberg_catalog.iceberg_db.table1;
```
-You can also use the `CTAS (Create Table As Select)` statement to create an
internal Doris table from an external data source and import the data:
+You can also use the `CTAS (Create Table As Select)` statement to create a
Doris internal table from an external data source and import the data:
```sql
CREATE TABLE internal.doris_db.tbl1
@@ -210,15 +299,17 @@ SELECT * FROM iceberg_catalog.iceberg_db.table1;
### Data Write-Back
-Doris supports writing data back to external data sources using the `INSERT`
statement. For more details, refer to:
+Doris supports writing data back to external data sources directly using the
`INSERT` statement. For details, refer to:
* [Hive Catalog](./catalogs/hive-catalog.mdx)
+
* [Iceberg Catalog](./catalogs/iceberg-catalog.mdx)
+
* [JDBC Catalog](./catalogs/jdbc-catalog-overview.md)
-## Refreshing Catalog
+## Refreshing Data Catalogs
-Catalogs created in Doris act as "proxy" to access the metadata services of
corresponding data sources. Doris caches some metadata to improve access
performance and reduce frequent cross-network requests. However, the cache has
a validity period, and without refreshing, you cannot access the latest
metadata. Therefore, Doris provides several ways to refresh catalogs.
+Data catalogs created in Doris act as "proxies" to access the metadata service
of the corresponding data source. Doris caches some metadata. Caching can
improve metadata access performance and avoid frequent cross-network requests.
However, caching also has timeliness issues — if the cache is not refreshed,
the latest metadata cannot be accessed. Therefore, Doris provides multiple ways
to refresh data catalogs.
```sql
-- Refresh catalog
@@ -231,13 +322,13 @@ REFRESH DATABASE catalog_name.db_name;
REFRESH TABLE catalog_name.db_name.table_name;
```
-Doris also supports disabling metadata caching to access the latest metadata
in real-time.
+Doris also supports disabling metadata caching to enable real-time access to
the latest metadata.
For detailed information and configuration of metadata caching, please refer
to: [Metadata Cache](./meta-cache.md)
-## Modifying Catalog
+## Modifying Data Catalogs
-You can modify the properties or name of a catalog using the `ALTER CATALOG`
statement:
+You can modify the properties or name of a data catalog using `ALTER CATALOG`:
```sql
-- Rename a catalog
@@ -250,19 +341,16 @@ ALTER CATALOG iceberg_catalog SET PROPERTIES ('key1' =
'value1' [, 'key' = 'valu
ALTER CATALOG iceberg_catalog MODIFY COMMENT 'my iceberg catalog';
```
-## Deleting Catalog
+## Dropping Data Catalogs
-You can delete a specified external catalog using the `DROP CATALOG` statement.
+You can drop a specified external data catalog using `DROP CATALOG`.
```sql
DROP CATALOG [IF EXISTS] iceberg_catalog;
```
-Deleting an external catalog from Doris does not remove the actual data; it
only deletes the mapping relationship stored in Doris.
+Dropping an external data catalog from Doris does not delete the actual data.
It only removes the data catalog mapping stored in Doris.
## Permission Management
-The permission management for databases and tables in an external catalog is
the same as for internal tables. For details, refer to the [Authentication and
Authorization](../admin-manual/auth/authentication-and-authorization.md)
documentation.
-
-
-
+Permission management for databases and tables in external data catalogs is
the same as for internal tables. For details, refer to the [Authentication and
Authorization](../admin-manual/auth/authentication-and-authorization.md)
documentation.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]