morningman commented on code in PR #21567:
URL: https://github.com/apache/doris/pull/21567#discussion_r1261346895
##########
docs/zh-CN/docs/lakehouse/multi-catalog/statistics.md:
##########
@@ -0,0 +1,234 @@
+---
+{
+ "title": "外表统计信息",
+ "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# 外表统计信息
+
+外表统计信息的收集方式和收集内容与内表基本一致,详细信息可以参考[内表统计信息](../../query-acceleration/statistics.md)。目前支持对Hive,Iceberg和Hudi等外部表的收集。
+外表暂不支持的功能包括
Review Comment:
段落之间留空行
##########
docs/zh-CN/docs/lakehouse/multi-catalog/statistics.md:
##########
@@ -0,0 +1,234 @@
+---
Review Comment:
Need add to the sidebar
##########
docs/zh-CN/docs/lakehouse/multi-catalog/statistics.md:
##########
@@ -0,0 +1,234 @@
+---
+{
+ "title": "外表统计信息",
+ "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# 外表统计信息
+
+外表统计信息的收集方式和收集内容与内表基本一致,详细信息可以参考[内表统计信息](../../query-acceleration/statistics.md)。目前支持对Hive,Iceberg和Hudi等外部表的收集。
+外表暂不支持的功能包括
+
+1. 暂不支持直方图收集
+2. 暂不支持分区的增量收集和更新
+3. 暂不支持自动收集(with auto),用户可以使用周期性收集(with period)来代替
+4. 暂不支持抽样收集
+
+下面主要介绍一下外表统计信息收集的示例和实现原理。
+
+## 使用示例
+
+由于Stats
Collector的工作对用户透明,用户在使用中无需关注。这里主要展示在Doris中通过执行analyze命令收集外表统计信息的相关示例。除了上文提到的外表暂不支持的4个功能,其余和内表使用方式相同。下面以hive.tpch100数据库为例进行展示。tpch100数据库中包含lineitem,orders,region等8张表。
+
+### 信息收集
+
+外表支持手动一次性收集和周期性收集两种收集方式。
+
+#### 手动一次性收集
+
+- 收集region表的表信息以及全部列的信息:
+```
+mysql> ANALYZE TABLE hive.tpch100.region;
++--------------+-------------------------+------------+--------------------------------+--------+
+| Catalog_Name | DB_Name | Table_Name | Columns
| Job_Id |
++--------------+-------------------------+------------+--------------------------------+--------+
+| hive | default_cluster:tpch100 | region |
[r_regionkey,r_comment,r_name] | 124182 |
++--------------+-------------------------+------------+--------------------------------+--------+
+1 row in set (0.02 sec)
+```
+此操作是异步执行,会在后台创建收集任务,可以通过job_id查看任务进度。
+
+- 收集tpch100数据库所有表的信息
+
+```
+mysql> ANALYZE DATABASE hive.tpch100;
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+| Catalog_Name | DB_Name | Table_Name | Columns
| Job_Id
|
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+| hive | tpch100 | partsupp |
[ps_suppkey,ps_availqty,ps_comment,ps_partkey,ps_supplycost]
| 124192 |
+| hive | tpch100 | orders |
[o_orderstatus,o_clerk,o_orderdate,o_shippriority,o_custkey,o_totalprice,o_orderkey,o_comment,o_orderpriority]
| 124199 |
+| hive | tpch100 | lineitem |
[l_returnflag,l_receiptdate,l_tax,l_shipmode,l_suppkey,l_shipdate,l_commitdate,l_partkey,l_orderkey,l_quantity,l_linestatus,l_comment,l_extendedprice,l_linenumber,l_discount,l_shipinstruct]
| 124210 |
+| hive | tpch100 | part |
[p_partkey,p_container,p_name,p_comment,p_brand,p_type,p_retailprice,p_mfgr,p_size]
| 124228 |
+| hive | tpch100 | customer |
[c_custkey,c_phone,c_acctbal,c_mktsegment,c_address,c_nationkey,c_name,c_comment]
| 124239 |
+| hive | tpch100 | supplier |
[s_comment,s_phone,s_nationkey,s_name,s_address,s_acctbal,s_suppkey]
| 124249 |
+| hive | tpch100 | nation |
[n_comment,n_nationkey,n_regionkey,n_name]
| 124258 |
+| hive | tpch100 | region | [r_regionkey,r_comment,r_name]
| 124264
|
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+8 rows in set (0.29 sec)
+```
+此操作会批量提交tpch100数据库下所有表的收集任务,也是异步执行,会给每个表创建一个job_id,也可以通过job_id查看每张表的任务进度。
+
+- 同步收集
+
+可以使用with sync同步收集表或数据库的统计信息。这时不会创建后台任务,客户端在收集完成之前会block住,直到收集任务执行完成再返回。
+```
+mysql> analyze table hive.tpch100.orders with sync;
+Query OK, 0 rows affected (33.19 sec)
+```
+
+#### 周期性收集
+
+使用with period可以设置周期性的执行收集任务:
+analyze table hive.tpch100.orders with period 86400;
Review Comment:
```suggestion
`analyze table hive.tpch100.orders with period 86400;`
```
段落之间留空行
##########
docs/zh-CN/docs/lakehouse/multi-catalog/statistics.md:
##########
@@ -0,0 +1,234 @@
+---
+{
+ "title": "外表统计信息",
+ "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# 外表统计信息
+
+外表统计信息的收集方式和收集内容与内表基本一致,详细信息可以参考[内表统计信息](../../query-acceleration/statistics.md)。目前支持对Hive,Iceberg和Hudi等外部表的收集。
+外表暂不支持的功能包括
+
+1. 暂不支持直方图收集
+2. 暂不支持分区的增量收集和更新
+3. 暂不支持自动收集(with auto),用户可以使用周期性收集(with period)来代替
+4. 暂不支持抽样收集
+
+下面主要介绍一下外表统计信息收集的示例和实现原理。
+
+## 使用示例
+
+由于Stats
Collector的工作对用户透明,用户在使用中无需关注。这里主要展示在Doris中通过执行analyze命令收集外表统计信息的相关示例。除了上文提到的外表暂不支持的4个功能,其余和内表使用方式相同。下面以hive.tpch100数据库为例进行展示。tpch100数据库中包含lineitem,orders,region等8张表。
+
+### 信息收集
+
+外表支持手动一次性收集和周期性收集两种收集方式。
+
+#### 手动一次性收集
+
+- 收集region表的表信息以及全部列的信息:
+```
+mysql> ANALYZE TABLE hive.tpch100.region;
++--------------+-------------------------+------------+--------------------------------+--------+
+| Catalog_Name | DB_Name | Table_Name | Columns
| Job_Id |
++--------------+-------------------------+------------+--------------------------------+--------+
+| hive | default_cluster:tpch100 | region |
[r_regionkey,r_comment,r_name] | 124182 |
++--------------+-------------------------+------------+--------------------------------+--------+
+1 row in set (0.02 sec)
+```
+此操作是异步执行,会在后台创建收集任务,可以通过job_id查看任务进度。
Review Comment:
这里给出如何使用 job id 查看进度的示例。虽然后面有,但是这里给出,用户看起来比较连贯
##########
docs/zh-CN/docs/lakehouse/multi-catalog/statistics.md:
##########
@@ -0,0 +1,234 @@
+---
+{
+ "title": "外表统计信息",
+ "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# 外表统计信息
+
+外表统计信息的收集方式和收集内容与内表基本一致,详细信息可以参考[内表统计信息](../../query-acceleration/statistics.md)。目前支持对Hive,Iceberg和Hudi等外部表的收集。
+外表暂不支持的功能包括
+
+1. 暂不支持直方图收集
+2. 暂不支持分区的增量收集和更新
+3. 暂不支持自动收集(with auto),用户可以使用周期性收集(with period)来代替
+4. 暂不支持抽样收集
+
+下面主要介绍一下外表统计信息收集的示例和实现原理。
+
+## 使用示例
+
+由于Stats
Collector的工作对用户透明,用户在使用中无需关注。这里主要展示在Doris中通过执行analyze命令收集外表统计信息的相关示例。除了上文提到的外表暂不支持的4个功能,其余和内表使用方式相同。下面以hive.tpch100数据库为例进行展示。tpch100数据库中包含lineitem,orders,region等8张表。
+
+### 信息收集
+
+外表支持手动一次性收集和周期性收集两种收集方式。
+
+#### 手动一次性收集
+
+- 收集region表的表信息以及全部列的信息:
+```
+mysql> ANALYZE TABLE hive.tpch100.region;
++--------------+-------------------------+------------+--------------------------------+--------+
+| Catalog_Name | DB_Name | Table_Name | Columns
| Job_Id |
++--------------+-------------------------+------------+--------------------------------+--------+
+| hive | default_cluster:tpch100 | region |
[r_regionkey,r_comment,r_name] | 124182 |
++--------------+-------------------------+------------+--------------------------------+--------+
+1 row in set (0.02 sec)
+```
+此操作是异步执行,会在后台创建收集任务,可以通过job_id查看任务进度。
+
+- 收集tpch100数据库所有表的信息
+
+```
+mysql> ANALYZE DATABASE hive.tpch100;
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+| Catalog_Name | DB_Name | Table_Name | Columns
| Job_Id
|
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+| hive | tpch100 | partsupp |
[ps_suppkey,ps_availqty,ps_comment,ps_partkey,ps_supplycost]
| 124192 |
+| hive | tpch100 | orders |
[o_orderstatus,o_clerk,o_orderdate,o_shippriority,o_custkey,o_totalprice,o_orderkey,o_comment,o_orderpriority]
| 124199 |
+| hive | tpch100 | lineitem |
[l_returnflag,l_receiptdate,l_tax,l_shipmode,l_suppkey,l_shipdate,l_commitdate,l_partkey,l_orderkey,l_quantity,l_linestatus,l_comment,l_extendedprice,l_linenumber,l_discount,l_shipinstruct]
| 124210 |
+| hive | tpch100 | part |
[p_partkey,p_container,p_name,p_comment,p_brand,p_type,p_retailprice,p_mfgr,p_size]
| 124228 |
+| hive | tpch100 | customer |
[c_custkey,c_phone,c_acctbal,c_mktsegment,c_address,c_nationkey,c_name,c_comment]
| 124239 |
+| hive | tpch100 | supplier |
[s_comment,s_phone,s_nationkey,s_name,s_address,s_acctbal,s_suppkey]
| 124249 |
+| hive | tpch100 | nation |
[n_comment,n_nationkey,n_regionkey,n_name]
| 124258 |
+| hive | tpch100 | region | [r_regionkey,r_comment,r_name]
| 124264
|
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+8 rows in set (0.29 sec)
+```
+此操作会批量提交tpch100数据库下所有表的收集任务,也是异步执行,会给每个表创建一个job_id,也可以通过job_id查看每张表的任务进度。
+
+- 同步收集
+
+可以使用with sync同步收集表或数据库的统计信息。这时不会创建后台任务,客户端在收集完成之前会block住,直到收集任务执行完成再返回。
Review Comment:
这里受到 query_timeout 的限制么?如果是,需要说明,在analyze大表时,需要增大超时时间。
##########
docs/zh-CN/docs/lakehouse/multi-catalog/statistics.md:
##########
@@ -0,0 +1,234 @@
+---
+{
+ "title": "外表统计信息",
+ "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# 外表统计信息
+
+外表统计信息的收集方式和收集内容与内表基本一致,详细信息可以参考[内表统计信息](../../query-acceleration/statistics.md)。目前支持对Hive,Iceberg和Hudi等外部表的收集。
+外表暂不支持的功能包括
+
+1. 暂不支持直方图收集
+2. 暂不支持分区的增量收集和更新
+3. 暂不支持自动收集(with auto),用户可以使用周期性收集(with period)来代替
+4. 暂不支持抽样收集
+
+下面主要介绍一下外表统计信息收集的示例和实现原理。
+
+## 使用示例
+
+由于Stats
Collector的工作对用户透明,用户在使用中无需关注。这里主要展示在Doris中通过执行analyze命令收集外表统计信息的相关示例。除了上文提到的外表暂不支持的4个功能,其余和内表使用方式相同。下面以hive.tpch100数据库为例进行展示。tpch100数据库中包含lineitem,orders,region等8张表。
+
+### 信息收集
+
+外表支持手动一次性收集和周期性收集两种收集方式。
+
+#### 手动一次性收集
+
+- 收集region表的表信息以及全部列的信息:
+```
+mysql> ANALYZE TABLE hive.tpch100.region;
++--------------+-------------------------+------------+--------------------------------+--------+
+| Catalog_Name | DB_Name | Table_Name | Columns
| Job_Id |
++--------------+-------------------------+------------+--------------------------------+--------+
+| hive | default_cluster:tpch100 | region |
[r_regionkey,r_comment,r_name] | 124182 |
++--------------+-------------------------+------------+--------------------------------+--------+
+1 row in set (0.02 sec)
+```
+此操作是异步执行,会在后台创建收集任务,可以通过job_id查看任务进度。
+
+- 收集tpch100数据库所有表的信息
+
+```
+mysql> ANALYZE DATABASE hive.tpch100;
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+| Catalog_Name | DB_Name | Table_Name | Columns
| Job_Id
|
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+| hive | tpch100 | partsupp |
[ps_suppkey,ps_availqty,ps_comment,ps_partkey,ps_supplycost]
| 124192 |
+| hive | tpch100 | orders |
[o_orderstatus,o_clerk,o_orderdate,o_shippriority,o_custkey,o_totalprice,o_orderkey,o_comment,o_orderpriority]
| 124199 |
+| hive | tpch100 | lineitem |
[l_returnflag,l_receiptdate,l_tax,l_shipmode,l_suppkey,l_shipdate,l_commitdate,l_partkey,l_orderkey,l_quantity,l_linestatus,l_comment,l_extendedprice,l_linenumber,l_discount,l_shipinstruct]
| 124210 |
+| hive | tpch100 | part |
[p_partkey,p_container,p_name,p_comment,p_brand,p_type,p_retailprice,p_mfgr,p_size]
| 124228 |
+| hive | tpch100 | customer |
[c_custkey,c_phone,c_acctbal,c_mktsegment,c_address,c_nationkey,c_name,c_comment]
| 124239 |
+| hive | tpch100 | supplier |
[s_comment,s_phone,s_nationkey,s_name,s_address,s_acctbal,s_suppkey]
| 124249 |
+| hive | tpch100 | nation |
[n_comment,n_nationkey,n_regionkey,n_name]
| 124258 |
+| hive | tpch100 | region | [r_regionkey,r_comment,r_name]
| 124264
|
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+8 rows in set (0.29 sec)
+```
+此操作会批量提交tpch100数据库下所有表的收集任务,也是异步执行,会给每个表创建一个job_id,也可以通过job_id查看每张表的任务进度。
+
+- 同步收集
+
+可以使用with sync同步收集表或数据库的统计信息。这时不会创建后台任务,客户端在收集完成之前会block住,直到收集任务执行完成再返回。
+```
+mysql> analyze table hive.tpch100.orders with sync;
+Query OK, 0 rows affected (33.19 sec)
+```
+
+#### 周期性收集
+
+使用with period可以设置周期性的执行收集任务:
+analyze table hive.tpch100.orders with period 86400;
+这条语句创建一个周期性收集的任务,周期是1天,每天自动收集和更新orders表的统计信。
+
+### 任务管理
+
+任务管理的方式也和内表相同,主要包括查看job,查看task,删除job等功能。请参考[内表统计信息](../../query-acceleration/statistics.md)任务管理部分。
+
+- 查看所有job状态
+
+```
+mysql> SHOW ANALYZE;
++--------+--------------+-------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------------+---------+----------------------+----------+---------------+
+| job_id | catalog_name | db_name | tbl_name | col_name
| job_type | analysis_type | message | last_exec_time_in_ms |
state | schedule_type |
++--------+--------------+-------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------------+---------+----------------------+----------+---------------+
+| 12152 | hive | tpch100 | partsupp |
[ps_suppkey,ps_availqty,ps_comment,ps_partkey,ps_supplycost]
| MANUAL | FUNDAMENTALS | | 2023-07-11
15:57:16 | FINISHED | ONCE |
+| 12159 | hive | tpch100 | orders |
[o_orderstatus,o_clerk,o_orderdate,o_shippriority,o_custkey,o_totalprice,o_orderkey,o_comment,o_orderpriority]
| MANUAL | FUNDAMENTALS | | 2023-07-11 15:57:24 | FINISHED | ONCE
|
++--------+--------------+-------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------------+---------+----------------------+----------+---------------+
+```
+
+- 查看一个job的所有task状态
+
+```
+mysql> SHOW ANALYZE TASK STATUS 12152;
++---------+--------------------------------------------------------------+---------+----------------------+----------+
+| task_id | col_name |
message | last_exec_time_in_ms | state |
++---------+--------------------------------------------------------------+---------+----------------------+----------+
+| 12153 | ps_availqty |
| 2023-07-11 15:56:58 | FINISHED |
+| 12154 | ps_suppkey |
| 2023-07-11 15:56:57 | FINISHED |
+| 12155 | ps_comment |
| 2023-07-11 15:57:16 | FINISHED |
+| 12156 | ps_supplycost |
| 2023-07-11 15:56:57 | FINISHED |
+| 12157 | ps_partkey |
| 2023-07-11 15:56:58 | FINISHED |
+| 12158 | [ps_suppkey,ps_availqty,ps_comment,ps_partkey,ps_supplycost] |
| 2023-07-11 15:56:57 | FINISHED |
++---------+--------------------------------------------------------------+---------+----------------------+----------+
+```
+
+- 终止未完成的job
+
+```
+KILL ANALYZE [job_id]
+```
+
+- 删除周期性收集job
+
+```
+DROP ANALYZE JOB [JOB_ID]
+```
+
+### 信息查看
+
+信息的查看包括表的统计信息(表的行数)查看和列统计信息查看,请参考[内表统计信息](../../query-acceleration/statistics.md)查看统计信息部分。
+
+#### 表统计信息
+
+```
+mysql> SHOW TABLE STATS hive.tpch100.orders;
++-----------+---------------------+---------------------+
+| row_count | update_time | last_analyze_time |
++-----------+---------------------+---------------------+
+| 150000000 | 2023-07-11 23:01:49 | 2023-07-11 23:01:44 |
++-----------+---------------------+---------------------+
+```
+
+#### 列统计信息
+```
+SHOW COLUMN [cached] hive.tpch100.orders;
+```
+
+查看statistics表中指定table的列统计信息,如果指定cached参数,则展示的是指定表已加载到缓存中的列信息。
+
+```
+mysql> SHOW COLUMN stats hive.tpch100.orders;
++-----------------+-------+--------------+----------+----------------------+---------------+-----------------------+----------------------------+
+| column_name | count | ndv | num_null | data_size |
avg_size_byte | min | max |
++-----------------+-------+--------------+----------+----------------------+---------------+-----------------------+----------------------------+
+| o_orderstatus | 1.5E8 | 3.0 | 0.0 | 1.50000001E8 |
1.0 | 'F' | 'P' |
+| o_clerk | 1.5E8 | 100836.0 | 0.0 | 2.250000015E9 |
15.0 | 'Clerk#000000001' | 'Clerk#000100000' |
+| o_orderdate | 1.5E8 | 2417.0 | 0.0 | 6.00000004E8 |
4.0 | '1992-01-01' | '1998-08-02' |
+| o_shippriority | 1.5E8 | 1.0 | 0.0 | 6.00000004E8 |
4.0 | 0 | 0 |
+| o_custkey | 1.5E8 | 1.0023982E7 | 0.0 | 6.00000004E8 |
4.0 | 1 | 14999999 |
+| o_totalprice | 1.5E8 | 3.4424096E7 | 0.0 | 1.200000008E9 |
8.0 | 811.73 | 591036.15 |
+| o_orderkey | 1.5E8 | 1.51621184E8 | 0.0 | 1.200000008E9 |
8.0 | 1 | 600000000 |
+| o_comment | 1.5E8 | 1.10204136E8 | 0.0 | 7.275038757500258E9 |
48.50025806 | ' Tiresias about the' | 'zzle? unusual requests w' |
+| o_orderpriority | 1.5E8 | 5.0 | 0.0 | 1.2600248124001656E9 |
8.40016536 | '1-URGENT' | '5-LOW' |
++-----------------+-------+--------------+----------+----------------------+---------------+-----------------------+----------------------------+
+```
+
+### 信息修改
+
+修改信息支持用户手动修改列统计信息。可以修改指定列的row_count, ndv, num_nulls, min_value, max_value,
data_size等信息。
+请参考[内表统计信息](../../query-acceleration/statistics.md)修改统计信息部分。
+
+```
+mysql> ALTER TABLE hive.tpch100.orders MODIFY COLUMN o_orderstatus SET STATS
('row_count'='6001215');
+Query OK, 0 rows affected (0.03 sec)
+
+mysql> SHOW COLUMN stats hive.tpch100.orders;
++-----------------+-----------+--------------+----------+----------------------+---------------+-----------------------+----------------------------+
+| column_name | count | ndv | num_null | data_size
| avg_size_byte | min | max |
++-----------------+-----------+--------------+----------+----------------------+---------------+-----------------------+----------------------------+
+| o_orderstatus | 6001215.0 | 0.0 | 0.0 | 0.0
| 0.0 | 'NULL' | 'NULL' |
+| o_clerk | 1.5E8 | 100836.0 | 0.0 | 2.250000015E9
| 15.0 | 'Clerk#000000001' | 'Clerk#000100000' |
+| o_orderdate | 1.5E8 | 2417.0 | 0.0 | 6.00000004E8
| 4.0 | '1992-01-01' | '1998-08-02' |
+| o_shippriority | 1.5E8 | 1.0 | 0.0 | 6.00000004E8
| 4.0 | 0 | 0 |
+| o_custkey | 1.5E8 | 1.0023982E7 | 0.0 | 6.00000004E8
| 4.0 | 1 | 14999999 |
+| o_totalprice | 1.5E8 | 3.4424096E7 | 0.0 | 1.200000008E9
| 8.0 | 811.73 | 591036.15 |
+| o_orderkey | 1.5E8 | 1.51621184E8 | 0.0 | 1.200000008E9
| 8.0 | 1 | 600000000 |
+| o_comment | 1.5E8 | 1.10204136E8 | 0.0 | 7.275038757500258E9
| 48.50025806 | ' Tiresias about the' | 'zzle? unusual requests w' |
+| o_orderpriority | 1.5E8 | 5.0 | 0.0 | 1.2600248124001656E9
| 8.40016536 | '1-URGENT' | '5-LOW' |
++-----------------+-----------+--------------+----------+----------------------+---------------+-----------------------+----------------------------+
+```
+
+### 信息删除
+
+删除外表统计信息支持用户删除一张表的表行数信息列统计信息。如果用户指定了删除的列名,则只删除这些列的信息。如果不指定,则删除整张表所有列的统计信息以及表行数信息。
+请参考[内表统计信息](../../query-acceleration/statistics.md)删除统计信息部分。
+
+- 删除整张表的信息
+
+```
+DROP STATS hive.tpch100.orders
Review Comment:
```suggestion
DROP STATS hive.tpch100.orders
```
##########
docs/zh-CN/docs/lakehouse/multi-catalog/statistics.md:
##########
@@ -0,0 +1,234 @@
+---
+{
+ "title": "外表统计信息",
+ "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# 外表统计信息
+
+外表统计信息的收集方式和收集内容与内表基本一致,详细信息可以参考[内表统计信息](../../query-acceleration/statistics.md)。目前支持对Hive,Iceberg和Hudi等外部表的收集。
+外表暂不支持的功能包括
Review Comment:
段落之间留空行
##########
docs/zh-CN/docs/lakehouse/multi-catalog/statistics.md:
##########
@@ -0,0 +1,234 @@
+---
+{
+ "title": "外表统计信息",
+ "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# 外表统计信息
+
+外表统计信息的收集方式和收集内容与内表基本一致,详细信息可以参考[内表统计信息](../../query-acceleration/statistics.md)。目前支持对Hive,Iceberg和Hudi等外部表的收集。
+外表暂不支持的功能包括
+
+1. 暂不支持直方图收集
+2. 暂不支持分区的增量收集和更新
+3. 暂不支持自动收集(with auto),用户可以使用周期性收集(with period)来代替
+4. 暂不支持抽样收集
+
+下面主要介绍一下外表统计信息收集的示例和实现原理。
+
+## 使用示例
+
+由于Stats
Collector的工作对用户透明,用户在使用中无需关注。这里主要展示在Doris中通过执行analyze命令收集外表统计信息的相关示例。除了上文提到的外表暂不支持的4个功能,其余和内表使用方式相同。下面以hive.tpch100数据库为例进行展示。tpch100数据库中包含lineitem,orders,region等8张表。
+
+### 信息收集
+
+外表支持手动一次性收集和周期性收集两种收集方式。
+
+#### 手动一次性收集
+
+- 收集region表的表信息以及全部列的信息:
+```
+mysql> ANALYZE TABLE hive.tpch100.region;
++--------------+-------------------------+------------+--------------------------------+--------+
+| Catalog_Name | DB_Name | Table_Name | Columns
| Job_Id |
++--------------+-------------------------+------------+--------------------------------+--------+
+| hive | default_cluster:tpch100 | region |
[r_regionkey,r_comment,r_name] | 124182 |
++--------------+-------------------------+------------+--------------------------------+--------+
+1 row in set (0.02 sec)
+```
+此操作是异步执行,会在后台创建收集任务,可以通过job_id查看任务进度。
Review Comment:
这里给出如何使用 job id 查看进度的示例。虽然后面有,但是这里给出,用户看起来比较连贯
##########
docs/zh-CN/docs/lakehouse/multi-catalog/statistics.md:
##########
@@ -0,0 +1,234 @@
+---
+{
+ "title": "外表统计信息",
+ "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# 外表统计信息
+
+外表统计信息的收集方式和收集内容与内表基本一致,详细信息可以参考[内表统计信息](../../query-acceleration/statistics.md)。目前支持对Hive,Iceberg和Hudi等外部表的收集。
+外表暂不支持的功能包括
+
+1. 暂不支持直方图收集
+2. 暂不支持分区的增量收集和更新
+3. 暂不支持自动收集(with auto),用户可以使用周期性收集(with period)来代替
+4. 暂不支持抽样收集
+
+下面主要介绍一下外表统计信息收集的示例和实现原理。
+
+## 使用示例
+
+由于Stats
Collector的工作对用户透明,用户在使用中无需关注。这里主要展示在Doris中通过执行analyze命令收集外表统计信息的相关示例。除了上文提到的外表暂不支持的4个功能,其余和内表使用方式相同。下面以hive.tpch100数据库为例进行展示。tpch100数据库中包含lineitem,orders,region等8张表。
+
+### 信息收集
+
+外表支持手动一次性收集和周期性收集两种收集方式。
+
+#### 手动一次性收集
+
+- 收集region表的表信息以及全部列的信息:
+```
+mysql> ANALYZE TABLE hive.tpch100.region;
++--------------+-------------------------+------------+--------------------------------+--------+
+| Catalog_Name | DB_Name | Table_Name | Columns
| Job_Id |
++--------------+-------------------------+------------+--------------------------------+--------+
+| hive | default_cluster:tpch100 | region |
[r_regionkey,r_comment,r_name] | 124182 |
++--------------+-------------------------+------------+--------------------------------+--------+
+1 row in set (0.02 sec)
+```
+此操作是异步执行,会在后台创建收集任务,可以通过job_id查看任务进度。
+
+- 收集tpch100数据库所有表的信息
+
+```
+mysql> ANALYZE DATABASE hive.tpch100;
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+| Catalog_Name | DB_Name | Table_Name | Columns
| Job_Id
|
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+| hive | tpch100 | partsupp |
[ps_suppkey,ps_availqty,ps_comment,ps_partkey,ps_supplycost]
| 124192 |
+| hive | tpch100 | orders |
[o_orderstatus,o_clerk,o_orderdate,o_shippriority,o_custkey,o_totalprice,o_orderkey,o_comment,o_orderpriority]
| 124199 |
+| hive | tpch100 | lineitem |
[l_returnflag,l_receiptdate,l_tax,l_shipmode,l_suppkey,l_shipdate,l_commitdate,l_partkey,l_orderkey,l_quantity,l_linestatus,l_comment,l_extendedprice,l_linenumber,l_discount,l_shipinstruct]
| 124210 |
+| hive | tpch100 | part |
[p_partkey,p_container,p_name,p_comment,p_brand,p_type,p_retailprice,p_mfgr,p_size]
| 124228 |
+| hive | tpch100 | customer |
[c_custkey,c_phone,c_acctbal,c_mktsegment,c_address,c_nationkey,c_name,c_comment]
| 124239 |
+| hive | tpch100 | supplier |
[s_comment,s_phone,s_nationkey,s_name,s_address,s_acctbal,s_suppkey]
| 124249 |
+| hive | tpch100 | nation |
[n_comment,n_nationkey,n_regionkey,n_name]
| 124258 |
+| hive | tpch100 | region | [r_regionkey,r_comment,r_name]
| 124264
|
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+8 rows in set (0.29 sec)
+```
+此操作会批量提交tpch100数据库下所有表的收集任务,也是异步执行,会给每个表创建一个job_id,也可以通过job_id查看每张表的任务进度。
+
+- 同步收集
+
+可以使用with sync同步收集表或数据库的统计信息。这时不会创建后台任务,客户端在收集完成之前会block住,直到收集任务执行完成再返回。
Review Comment:
这里受到 query_timeout 的限制么?如果是,需要说明,在analyze大表时,需要增大超时时间。
##########
docs/zh-CN/docs/lakehouse/multi-catalog/statistics.md:
##########
@@ -0,0 +1,234 @@
+---
+{
+ "title": "外表统计信息",
+ "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# 外表统计信息
+
+外表统计信息的收集方式和收集内容与内表基本一致,详细信息可以参考[内表统计信息](../../query-acceleration/statistics.md)。目前支持对Hive,Iceberg和Hudi等外部表的收集。
+外表暂不支持的功能包括
+
+1. 暂不支持直方图收集
+2. 暂不支持分区的增量收集和更新
+3. 暂不支持自动收集(with auto),用户可以使用周期性收集(with period)来代替
+4. 暂不支持抽样收集
+
+下面主要介绍一下外表统计信息收集的示例和实现原理。
+
+## 使用示例
+
+由于Stats
Collector的工作对用户透明,用户在使用中无需关注。这里主要展示在Doris中通过执行analyze命令收集外表统计信息的相关示例。除了上文提到的外表暂不支持的4个功能,其余和内表使用方式相同。下面以hive.tpch100数据库为例进行展示。tpch100数据库中包含lineitem,orders,region等8张表。
+
+### 信息收集
+
+外表支持手动一次性收集和周期性收集两种收集方式。
+
+#### 手动一次性收集
+
+- 收集region表的表信息以及全部列的信息:
+```
+mysql> ANALYZE TABLE hive.tpch100.region;
++--------------+-------------------------+------------+--------------------------------+--------+
+| Catalog_Name | DB_Name | Table_Name | Columns
| Job_Id |
++--------------+-------------------------+------------+--------------------------------+--------+
+| hive | default_cluster:tpch100 | region |
[r_regionkey,r_comment,r_name] | 124182 |
++--------------+-------------------------+------------+--------------------------------+--------+
+1 row in set (0.02 sec)
+```
+此操作是异步执行,会在后台创建收集任务,可以通过job_id查看任务进度。
+
+- 收集tpch100数据库所有表的信息
+
+```
+mysql> ANALYZE DATABASE hive.tpch100;
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+| Catalog_Name | DB_Name | Table_Name | Columns
| Job_Id
|
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+| hive | tpch100 | partsupp |
[ps_suppkey,ps_availqty,ps_comment,ps_partkey,ps_supplycost]
| 124192 |
+| hive | tpch100 | orders |
[o_orderstatus,o_clerk,o_orderdate,o_shippriority,o_custkey,o_totalprice,o_orderkey,o_comment,o_orderpriority]
| 124199 |
+| hive | tpch100 | lineitem |
[l_returnflag,l_receiptdate,l_tax,l_shipmode,l_suppkey,l_shipdate,l_commitdate,l_partkey,l_orderkey,l_quantity,l_linestatus,l_comment,l_extendedprice,l_linenumber,l_discount,l_shipinstruct]
| 124210 |
+| hive | tpch100 | part |
[p_partkey,p_container,p_name,p_comment,p_brand,p_type,p_retailprice,p_mfgr,p_size]
| 124228 |
+| hive | tpch100 | customer |
[c_custkey,c_phone,c_acctbal,c_mktsegment,c_address,c_nationkey,c_name,c_comment]
| 124239 |
+| hive | tpch100 | supplier |
[s_comment,s_phone,s_nationkey,s_name,s_address,s_acctbal,s_suppkey]
| 124249 |
+| hive | tpch100 | nation |
[n_comment,n_nationkey,n_regionkey,n_name]
| 124258 |
+| hive | tpch100 | region | [r_regionkey,r_comment,r_name]
| 124264
|
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+8 rows in set (0.29 sec)
+```
+此操作会批量提交tpch100数据库下所有表的收集任务,也是异步执行,会给每个表创建一个job_id,也可以通过job_id查看每张表的任务进度。
+
+- 同步收集
+
+可以使用with sync同步收集表或数据库的统计信息。这时不会创建后台任务,客户端在收集完成之前会block住,直到收集任务执行完成再返回。
+```
+mysql> analyze table hive.tpch100.orders with sync;
+Query OK, 0 rows affected (33.19 sec)
+```
+
+#### 周期性收集
+
+使用with period可以设置周期性的执行收集任务:
+analyze table hive.tpch100.orders with period 86400;
Review Comment:
```suggestion
`analyze table hive.tpch100.orders with period 86400;`
```
段落之间留空行
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]