Jibing-Li commented on code in PR #21567:
URL: https://github.com/apache/doris/pull/21567#discussion_r1261925537
##########
docs/zh-CN/docs/lakehouse/multi-catalog/statistics.md:
##########
@@ -0,0 +1,234 @@
+---
+{
+ "title": "External Table Statistics",
+ "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
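+# External Table Statistics
+
+Statistics for external tables are collected in much the same way, and cover much the same content, as for internal tables; see [Internal Table Statistics](../../query-acceleration/statistics.md) for details. Collection is currently supported for external tables such as Hive, Iceberg, and Hudi.
+The following features are not yet supported for external tables: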
+
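+1. Histogram collection is not yet supported
+2. Incremental collection and update of partitions is not yet supported
+3. Automatic collection (with auto) is not yet supported; periodic collection (with period) can be used instead
+4. Sampled collection is not yet supported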
+
+The rest of this document covers usage examples and the implementation of external table statistics collection.
+
+## Usage Examples
+
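+Because the Stats Collector works transparently, users do not need to be concerned with it. This section shows how to collect external table statistics in Doris with the analyze command. Apart from the four unsupported features listed above, usage is the same as for internal tables. The examples below use the hive.tpch100 database, which contains 8 tables including lineitem, orders, and region.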
+
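+### Collecting Statistics
+
+External tables support two collection modes: manual one-time collection and periodic collection.
+
+#### Manual One-Time Collection
+
+- Collect table-level information and information for all columns of the region table: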
+```
+mysql> ANALYZE TABLE hive.tpch100.region;
++--------------+-------------------------+------------+--------------------------------+--------+
+| Catalog_Name | DB_Name                 | Table_Name | Columns                        | Job_Id |
++--------------+-------------------------+------------+--------------------------------+--------+
+| hive         | default_cluster:tpch100 | region     | [r_regionkey,r_comment,r_name] | 124182 |
++--------------+-------------------------+------------+--------------------------------+--------+
+1 row in set (0.02 sec)
+```
+This operation runs asynchronously: it creates a collection job in the background, and the job's progress can be checked by job_id.
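+
+As a minimal sketch of checking progress, the job_id returned above can be passed to `SHOW ANALYZE`. This statement form comes from the internal table statistics doc and is assumed here to apply unchanged to external tables; output is omitted because it is not shown in this excerpt:
+```
+mysql> SHOW ANALYZE 124182;
+```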
+
+- Collect information for all tables in the tpch100 database:
+
+```
+mysql> ANALYZE DATABASE hive.tpch100;
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+| Catalog_Name | DB_Name | Table_Name | Columns | Job_Id |
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+| hive | tpch100 | partsupp | [ps_suppkey,ps_availqty,ps_comment,ps_partkey,ps_supplycost] | 124192 |
+| hive | tpch100 | orders | [o_orderstatus,o_clerk,o_orderdate,o_shippriority,o_custkey,o_totalprice,o_orderkey,o_comment,o_orderpriority] | 124199 |
+| hive | tpch100 | lineitem | [l_returnflag,l_receiptdate,l_tax,l_shipmode,l_suppkey,l_shipdate,l_commitdate,l_partkey,l_orderkey,l_quantity,l_linestatus,l_comment,l_extendedprice,l_linenumber,l_discount,l_shipinstruct] | 124210 |
+| hive | tpch100 | part | [p_partkey,p_container,p_name,p_comment,p_brand,p_type,p_retailprice,p_mfgr,p_size] | 124228 |
+| hive | tpch100 | customer | [c_custkey,c_phone,c_acctbal,c_mktsegment,c_address,c_nationkey,c_name,c_comment] | 124239 |
+| hive | tpch100 | supplier | [s_comment,s_phone,s_nationkey,s_name,s_address,s_acctbal,s_suppkey] | 124249 |
+| hive | tpch100 | nation | [n_comment,n_nationkey,n_regionkey,n_name] | 124258 |
+| hive | tpch100 | region | [r_regionkey,r_comment,r_name] | 124264 |
++--------------+---------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
+8 rows in set (0.29 sec)
+```
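+This operation submits collection jobs for all tables in the tpch100 database in one batch. It also runs asynchronously and creates one job_id per table; each table's progress can be checked by its job_id.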
+
+- Synchronous collection
+
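+Use with sync to collect statistics for a table or database synchronously. In this mode no background job is created; the client blocks until collection completes and only then returns.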
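+
+A minimal sketch of the synchronous form, assuming `WITH SYNC` is simply appended to the same statements used above (output is omitted because it is not shown in this excerpt):
+```
+mysql> ANALYZE TABLE hive.tpch100.region WITH SYNC;
+mysql> ANALYZE DATABASE hive.tpch100 WITH SYNC;
+```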
Review Comment:
Updated the doc