Re: [PR] doc: update loader doc [incubator-hugegraph-doc]

via GitHub Thu, 22 Jan 2026 01:55:10 -0800


imbajin commented on code in PR #445:
URL: https://github.com/apache/incubator-hugegraph-doc/pull/445#discussion_r2716149862



##########
content/cn/docs/quickstart/toolchain/hugegraph-loader.md:
##########
@@ -807,47 +807,71 @@ schema: 必填
 
 ##### 3.4.1 参数说明
 
-| 参数                        | 默认值       | 是否必传 | 描述信息                          
                                    |
-|---------------------------|-----------|------|-------------------------------------------------------------------|
-| `-f` 或 `--file`           |           | Y    | 配置脚本的路径                       
                                    |
-| `-g` 或 `--graph`          |           | Y    | 图名称                           
                                |
-| `-gs` 或 `--graphspace`    | DEFAULT   |      | 图空间                           
                                 |
-| `-s` 或 `--schema`         |           | Y    | schema 文件路径                   
                                    |
-| `-h` 或 `--host` 或 `-i`   | localhost |      | HugeGraphServer 的地址            
                                   |
-| `-p` 或 `--port`           | 8080      |      | HugeGraphServer 的端口号          
                                    |
-| `--username`              | null      |      | 当 HugeGraphServer 
开启了权限认证时，当前图的 username                          |
-| `--password`              | null      |      | 当 HugeGraphServer 
开启了权限认证时，当前图的 password                          |
-| `--create-graph`          | false     |      | 是否在图不存在时自动创建                  
                            |
-| `--token`                 | null      |      | 当 HugeGraphServer 
开启了权限认证时，当前图的 token                             |
-| `--protocol`              | http      |      | 向服务端发请求的协议，可选 http 或 https    
                                    |
-| `--pd-peers`              |           |      | PD 服务节点地址                     
                                  |
-| `--pd-token`              |           |      | 访问 PD 服务的 token               
                                   |
-| `--meta-endpoints`        |           |      | 元信息存储服务地址                     
                                |
-| `--direct`                | false     |      | 是否直连 HugeGraph-Store          
                                    |
-| `--route-type`            | NODE_PORT |      | 路由选择方式（可选值：NODE_PORT / DDS / 
BOTH）                               |
-| `--cluster`               | hg        |      | 集群名                           
                                    |
-| `--trust-store-file`      |           |      | 请求协议为 https 时，客户端的证书文件路径      
                                    |
-| `--trust-store-password`  |           |      | 请求协议为 https 时，客户端证书密码         
                                    |
-| `--clear-all-data`        | false     |      | 导入数据前是否清除服务端的原有数据             
                                    |
-| `--clear-timeout`         | 240       |      | 导入数据前清除服务端的原有数据的超时时间          
                                    |
-| `--incremental-mode`      | false     |      | 是否使用断点续导模式，仅输入源为 FILE 和 HDFS 
支持该模式，启用该模式能从上一次导入停止的地方开始导入           |
-| `--failure-mode`          | false     |      | 失败模式为 true 
时，会导入之前失败了的数据，一般来说失败数据文件需要在人工更正编辑好后，再次进行导入             |
-| `--batch-insert-threads`  | CPUs      |      | 批量插入线程池大小 (CPUs 是当前 OS 
可用**逻辑核**个数)                             |
-| `--single-insert-threads` | 8         |      | 单条插入线程池的大小                    
                                    |
-| `--max-conn`              | 4 * CPUs  |      | HugeClient 与 HugeGraphServer 
的最大 HTTP 连接数，**调整线程**的时候建议同时调整此项     |
-| `--max-conn-per-route`    | 2 * CPUs  |      | HugeClient 与 HugeGraphServer 
每个路由的最大 HTTP 连接数，**调整线程**的时候建议同时调整此项 |
-| `--batch-size`            | 500       |      | 导入数据时每个批次包含的数据条数              
                                    |
-| `--max-parse-errors`      | 1         |      | 最多允许多少行数据解析错误，达到该值则程序退出       
                                    |
-| `--max-insert-errors`     | 500       |      | 最多允许多少行数据插入错误，达到该值则程序退出       
                                    |
-| `--timeout`               | 60        |      | 插入结果返回的超时时间（秒）                
                                    |
-| `--shutdown-timeout`      | 10        |      | 多线程停止的等待时间（秒）                 
                                    |
-| `--retry-times`           | 0         |      | 发生特定异常时的重试次数                  
                                    |
-| `--retry-interval`        | 10        |      | 重试之前的间隔时间（秒）                  
                                    |
-| `--check-vertex`          | false     |      | 插入边时是否检查边所连接的顶点是否存在           
                                    |
-| `--print-progress`        | true      |      | 是否在控制台实时打印导入条数                
                                    |
-| `--dry-run`               | false     |      | 打开该模式，只解析不导入，通常用于测试           
                                    |
-| `--help`                  | false     |      | 打印帮助信息                        
                                    |                                           
       
-
+| 参数                                      | 默认值         | 是否必传 | 描述信息          
                                                    |
+|-----------------------------------------|-------------|------|-------------------------------------------------------------------|
+| `-f` 或 `--file`                         |             | Y    | 配置脚本的路径       
                                                    |
+| `-g` 或 `--graph`                        |             | Y    | 图名称           
                                                    |
+| `--graphspace`                          | DEFAULT     |      | 图空间           
                                                    |
+| `-s` 或 `--schema`                       |             | Y    | schema 文件路径   
                                                    |
+| `-h` 或 `--host` 或 `-i`                  | localhost   |      | 
HugeGraphServer 的地址                                               |
+| `-p` 或 `--port`                         | 8080        |      | 
HugeGraphServer 的端口号                                              |
+| `--username`                            | null        |      | 当 
HugeGraphServer 开启了权限认证时，当前图的 username                          |
+| `--password`                            | null        |      | 当 
HugeGraphServer 开启了权限认证时，当前图的 password                          |
+| `--create-graph`                        | false       |      | 是否在图不存在时自动创建  
                                                    |
+| `--token`                               | null        |      | 当 
HugeGraphServer 开启了权限认证时，当前图的 token                             |
+| `--protocol`                            | http        |      | 向服务端发请求的协议，可选 
http 或 https                                        |
+| `--pd-peers`                            |             |      | PD 服务节点地址     
                                                    |
+| `--pd-token`                            |             |      | 访问 PD 服务的 
token                                                   |
+| `--meta-endpoints`                      |             |      | 元信息存储服务地址     
                                                    |
+| `--direct`                              | false       |      | 是否直连 
HugeGraph-Store                                              |
+| `--route-type`                          | NODE_PORT   |      | 
路由选择方式（可选值：NODE_PORT / DDS / BOTH）                                |
+| `--cluster`                             | hg          |      | 集群名           
                                                    |
+| `--trust-store-file`                    |             |      | 请求协议为 https 
时，客户端的证书文件路径                                          |
+| `--trust-store-password`                |             |      | 请求协议为 https 
时，客户端证书密码                                             |
+| `--clear-all-data`                      | false       |      | 
导入数据前是否清除服务端的原有数据                                                 |
+| `--clear-timeout`                       | 240         |      | 
导入数据前清除服务端的原有数据的超时时间                                              |
+| `--incremental-mode`                    | false       |      | 
是否使用断点续导模式，仅输入源为 FILE 和 HDFS 支持该模式，启用该模式能从上一次导入停止的地方开始导入          |
+| `--failure-mode`                        | false       |      | 失败模式为 true 
时，会导入之前失败了的数据，一般来说失败数据文件需要在人工更正编辑好后，再次进行导入             |
+| `--batch-insert-threads`                | CPUs        |      | 批量插入线程池大小 
(CPUs 是当前 OS 可用**逻辑核**个数)                               |
+| `--single-insert-threads`               | 8           |      | 单条插入线程池的大小    
                                                    |
+| `--max-conn`                            | 4 * CPUs    |      | HugeClient 与 
HugeGraphServer 的最大 HTTP 连接数，**调整线程**的时候建议同时调整此项     |
+| `--max-conn-per-route`                  | 2 * CPUs    |      | HugeClient 与 
HugeGraphServer 每个路由的最大 HTTP 连接数，**调整线程**的时候建议同时调整此项 |
+| `--batch-size`                          | 500         |      | 
导入数据时每个批次包含的数据条数                                                  |
+| `--max-parse-errors`                    | 1           |      | 
最多允许多少行数据解析错误，达到该值则程序退出                                           |
+| `--max-insert-errors`                   | 500         |      | 
最多允许多少行数据插入错误，达到该值则程序退出                                           |
+| `--timeout`                             | 60          |      | 
插入结果返回的超时时间（秒）                                                    |
+| `--shutdown-timeout`                    | 10          |      | 多线程停止的等待时间（秒） 
                                                    |
+| `--retry-times`                         | 0           |      | 发生特定异常时的重试次数  
                                                    |
+| `--retry-interval`                      | 10          |      | 重试之前的间隔时间（秒）  
                                                    |
+| `--check-vertex`                        | false       |      | 
插入边时是否检查边所连接的顶点是否存在                                               |
+| `--print-progress`                      | true        |      | 
是否在控制台实时打印导入条数                                                    |
+| `--dry-run`                             | false       |      | 
打开该模式，只解析不导入，通常用于测试                                               |
+| `--help` 或 `-help`                      | false       |      | 打印帮助信息        
                                                    |                           
                       
+| `--parallel-count` 或 `--parallel-count` | max(2,CPUS) |      | 并行读取数据文件最大线程数 
                                        |
+| `--start-file`                          | 0           |      | 
用于部分（分片）导入的起始文件索引                                 |
+| `--end-file`                            | -1          |      | 用于部分导入的截止文件索引 
                                        |
+| `--scatter-sources`                     | false       |      | 
分散（并行）读取多个数据源以优化 I/O 性能                           |

Review Comment:
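   ⚠️ **Parameter description is not clear enough - usability issue**
   
   The description of `--scatter-sources`, "分散（并行）读取多个数据源以优化 I/O 性能" ("read multiple data sources in a scattered (parallel) manner to optimize I/O performance"), is not intuitive for new users.
   
   Suggested additions:
   1. In which scenarios should this parameter be enabled?
   2. How does it differ from `--parallel-count`, and how should the two be used together?
   3. What is its concrete effect on performance (for example: "suited to importing several large data files at the same time")?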



##########
content/cn/docs/quickstart/toolchain/hugegraph-loader.md:
##########
@@ -807,47 +807,71 @@ schema: 必填
 
 ##### 3.4.1 参数说明
 
-| 参数                        | 默认值       | 是否必传 | 描述信息                          
                                    |
-|---------------------------|-----------|------|-------------------------------------------------------------------|
-| `-f` 或 `--file`           |           | Y    | 配置脚本的路径                       
                                    |
-| `-g` 或 `--graph`          |           | Y    | 图名称                           
                                |
-| `-gs` 或 `--graphspace`    | DEFAULT   |      | 图空间                           
                                 |
-| `-s` 或 `--schema`         |           | Y    | schema 文件路径                   
                                    |
-| `-h` 或 `--host` 或 `-i`   | localhost |      | HugeGraphServer 的地址            
                                   |
-| `-p` 或 `--port`           | 8080      |      | HugeGraphServer 的端口号          
                                    |
-| `--username`              | null      |      | 当 HugeGraphServer 
开启了权限认证时，当前图的 username                          |
-| `--password`              | null      |      | 当 HugeGraphServer 
开启了权限认证时，当前图的 password                          |
-| `--create-graph`          | false     |      | 是否在图不存在时自动创建                  
                            |
-| `--token`                 | null      |      | 当 HugeGraphServer 
开启了权限认证时，当前图的 token                             |
-| `--protocol`              | http      |      | 向服务端发请求的协议，可选 http 或 https    
                                    |
-| `--pd-peers`              |           |      | PD 服务节点地址                     
                                  |
-| `--pd-token`              |           |      | 访问 PD 服务的 token               
                                   |
-| `--meta-endpoints`        |           |      | 元信息存储服务地址                     
                                |
-| `--direct`                | false     |      | 是否直连 HugeGraph-Store          
                                    |
-| `--route-type`            | NODE_PORT |      | 路由选择方式（可选值：NODE_PORT / DDS / 
BOTH）                               |
-| `--cluster`               | hg        |      | 集群名                           
                                    |
-| `--trust-store-file`      |           |      | 请求协议为 https 时，客户端的证书文件路径      
                                    |
-| `--trust-store-password`  |           |      | 请求协议为 https 时，客户端证书密码         
                                    |
-| `--clear-all-data`        | false     |      | 导入数据前是否清除服务端的原有数据             
                                    |
-| `--clear-timeout`         | 240       |      | 导入数据前清除服务端的原有数据的超时时间          
                                    |
-| `--incremental-mode`      | false     |      | 是否使用断点续导模式，仅输入源为 FILE 和 HDFS 
支持该模式，启用该模式能从上一次导入停止的地方开始导入           |
-| `--failure-mode`          | false     |      | 失败模式为 true 
时，会导入之前失败了的数据，一般来说失败数据文件需要在人工更正编辑好后，再次进行导入             |
-| `--batch-insert-threads`  | CPUs      |      | 批量插入线程池大小 (CPUs 是当前 OS 
可用**逻辑核**个数)                             |
-| `--single-insert-threads` | 8         |      | 单条插入线程池的大小                    
                                    |
-| `--max-conn`              | 4 * CPUs  |      | HugeClient 与 HugeGraphServer 
的最大 HTTP 连接数，**调整线程**的时候建议同时调整此项     |
-| `--max-conn-per-route`    | 2 * CPUs  |      | HugeClient 与 HugeGraphServer 
每个路由的最大 HTTP 连接数，**调整线程**的时候建议同时调整此项 |
-| `--batch-size`            | 500       |      | 导入数据时每个批次包含的数据条数              
                                    |
-| `--max-parse-errors`      | 1         |      | 最多允许多少行数据解析错误，达到该值则程序退出       
                                    |
-| `--max-insert-errors`     | 500       |      | 最多允许多少行数据插入错误，达到该值则程序退出       
                                    |
-| `--timeout`               | 60        |      | 插入结果返回的超时时间（秒）                
                                    |
-| `--shutdown-timeout`      | 10        |      | 多线程停止的等待时间（秒）                 
                                    |
-| `--retry-times`           | 0         |      | 发生特定异常时的重试次数                  
                                    |
-| `--retry-interval`        | 10        |      | 重试之前的间隔时间（秒）                  
                                    |
-| `--check-vertex`          | false     |      | 插入边时是否检查边所连接的顶点是否存在           
                                    |
-| `--print-progress`        | true      |      | 是否在控制台实时打印导入条数                
                                    |
-| `--dry-run`               | false     |      | 打开该模式，只解析不导入，通常用于测试           
                                    |
-| `--help`                  | false     |      | 打印帮助信息                        
                                    |                                           
       
-
+| 参数                                      | 默认值         | 是否必传 | 描述信息          
                                                    |
+|-----------------------------------------|-------------|------|-------------------------------------------------------------------|
+| `-f` 或 `--file`                         |             | Y    | 配置脚本的路径       
                                                    |
+| `-g` 或 `--graph`                        |             | Y    | 图名称           
                                                    |
+| `--graphspace`                          | DEFAULT     |      | 图空间           
                                                    |
+| `-s` 或 `--schema`                       |             | Y    | schema 文件路径   
                                                    |
+| `-h` 或 `--host` 或 `-i`                  | localhost   |      | 
HugeGraphServer 的地址                                               |
+| `-p` 或 `--port`                         | 8080        |      | 
HugeGraphServer 的端口号                                              |
+| `--username`                            | null        |      | 当 
HugeGraphServer 开启了权限认证时，当前图的 username                          |
+| `--password`                            | null        |      | 当 
HugeGraphServer 开启了权限认证时，当前图的 password                          |
+| `--create-graph`                        | false       |      | 是否在图不存在时自动创建  
                                                    |
+| `--token`                               | null        |      | 当 
HugeGraphServer 开启了权限认证时，当前图的 token                             |
+| `--protocol`                            | http        |      | 向服务端发请求的协议，可选 
http 或 https                                        |
+| `--pd-peers`                            |             |      | PD 服务节点地址     
                                                    |
+| `--pd-token`                            |             |      | 访问 PD 服务的 
token                                                   |
+| `--meta-endpoints`                      |             |      | 元信息存储服务地址     
                                                    |
+| `--direct`                              | false       |      | 是否直连 
HugeGraph-Store                                              |
+| `--route-type`                          | NODE_PORT   |      | 
路由选择方式（可选值：NODE_PORT / DDS / BOTH）                                |
+| `--cluster`                             | hg          |      | 集群名           
                                                    |
+| `--trust-store-file`                    |             |      | 请求协议为 https 
时，客户端的证书文件路径                                          |
+| `--trust-store-password`                |             |      | 请求协议为 https 
时，客户端证书密码                                             |
+| `--clear-all-data`                      | false       |      | 
导入数据前是否清除服务端的原有数据                                                 |
+| `--clear-timeout`                       | 240         |      | 
导入数据前清除服务端的原有数据的超时时间                                              |
+| `--incremental-mode`                    | false       |      | 
是否使用断点续导模式，仅输入源为 FILE 和 HDFS 支持该模式，启用该模式能从上一次导入停止的地方开始导入          |
+| `--failure-mode`                        | false       |      | 失败模式为 true 
时，会导入之前失败了的数据，一般来说失败数据文件需要在人工更正编辑好后，再次进行导入             |
+| `--batch-insert-threads`                | CPUs        |      | 批量插入线程池大小 
(CPUs 是当前 OS 可用**逻辑核**个数)                               |
+| `--single-insert-threads`               | 8           |      | 单条插入线程池的大小    
                                                    |
+| `--max-conn`                            | 4 * CPUs    |      | HugeClient 与 
HugeGraphServer 的最大 HTTP 连接数，**调整线程**的时候建议同时调整此项     |
+| `--max-conn-per-route`                  | 2 * CPUs    |      | HugeClient 与 
HugeGraphServer 每个路由的最大 HTTP 连接数，**调整线程**的时候建议同时调整此项 |
+| `--batch-size`                          | 500         |      | 
导入数据时每个批次包含的数据条数                                                  |
+| `--max-parse-errors`                    | 1           |      | 
最多允许多少行数据解析错误，达到该值则程序退出                                           |
+| `--max-insert-errors`                   | 500         |      | 
最多允许多少行数据插入错误，达到该值则程序退出                                           |
+| `--timeout`                             | 60          |      | 
插入结果返回的超时时间（秒）                                                    |
+| `--shutdown-timeout`                    | 10          |      | 多线程停止的等待时间（秒） 
                                                    |
+| `--retry-times`                         | 0           |      | 发生特定异常时的重试次数  
                                                    |
+| `--retry-interval`                      | 10          |      | 重试之前的间隔时间（秒）  
                                                    |
+| `--check-vertex`                        | false       |      | 
插入边时是否检查边所连接的顶点是否存在                                               |
+| `--print-progress`                      | true        |      | 
是否在控制台实时打印导入条数                                                    |
+| `--dry-run`                             | false       |      | 
打开该模式，只解析不导入，通常用于测试                                               |
+| `--help` 或 `-help`                      | false       |      | 打印帮助信息        
                                                    |                           
                       
+| `--parallel-count` 或 `--parallel-count` | max(2,CPUS) |      | 并行读取数据文件最大线程数 
                                        |
+| `--start-file`                          | 0           |      | 
用于部分（分片）导入的起始文件索引                                 |
+| `--end-file`                            | -1          |      | 用于部分导入的截止文件索引 
                                        |
+| `--scatter-sources`                     | false       |      | 
分散（并行）读取多个数据源以优化 I/O 性能                           |
+| `--cdc-flush-interval`                  | 30000       |      | Flink CDC 
的数据刷新间隔                                           |
+| `--cdc-sink-parallelism`                | 1           |      | Flink CDC 
写入端（Sink）的并行度                                   |
+| `--max-read-errors`                     | 1           |      | 
程序退出前允许的最大读取错误行数                                   |
+| `--max-read-lines`                      | -1L         |      | 
最大读取行数限制；一旦达到此行数，导入任务将停止                   |
+| `--test-mode`                           | false       |      | 是否开启测试模式      
                                             |
+| `--use-prefilter`                       | false       |      | 是否预先过滤顶点      
                                             |
+| `--short-id`                            |             |      | 将自定义 ID 
映射为更短的 ID                                        |
+| `--vertex-edge-limit`                   | -1L         |      | 单个顶点的最大边数限制   
                                          |
+| `--sink-type`                           | true        |      | 
数据接收端（Sink）存储类型开关                                     |

Review Comment:
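   ⚠️ **Parameter description is vague - usability issue**
   
   The default value of `--sink-type` is `true`, but the description "数据接收端（Sink）存储类型开关" ("switch for the storage type of the data sink") leaves users unsure about:
   1. Which storage type does `true` stand for?
   2. Which storage type does `false` stand for?
   3. What storage types are available?
   
   Consider improving the description, for example: "whether to use [storage type A]; when false, [storage type B] is used", or list all available storage types.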



##########
content/cn/docs/quickstart/toolchain/hugegraph-loader.md:
##########
@@ -807,47 +807,71 @@ schema: 必填
 
 ##### 3.4.1 参数说明
 
-| 参数                        | 默认值       | 是否必传 | 描述信息                          
                                    |
-|---------------------------|-----------|------|-------------------------------------------------------------------|
-| `-f` 或 `--file`           |           | Y    | 配置脚本的路径                       
                                    |
-| `-g` 或 `--graph`          |           | Y    | 图名称                           
                                |
-| `-gs` 或 `--graphspace`    | DEFAULT   |      | 图空间                           
                                 |
-| `-s` 或 `--schema`         |           | Y    | schema 文件路径                   
                                    |
-| `-h` 或 `--host` 或 `-i`   | localhost |      | HugeGraphServer 的地址            
                                   |
-| `-p` 或 `--port`           | 8080      |      | HugeGraphServer 的端口号          
                                    |
-| `--username`              | null      |      | 当 HugeGraphServer 
开启了权限认证时，当前图的 username                          |
-| `--password`              | null      |      | 当 HugeGraphServer 
开启了权限认证时，当前图的 password                          |
-| `--create-graph`          | false     |      | 是否在图不存在时自动创建                  
                            |
-| `--token`                 | null      |      | 当 HugeGraphServer 
开启了权限认证时，当前图的 token                             |
-| `--protocol`              | http      |      | 向服务端发请求的协议，可选 http 或 https    
                                    |
-| `--pd-peers`              |           |      | PD 服务节点地址                     
                                  |
-| `--pd-token`              |           |      | 访问 PD 服务的 token               
                                   |
-| `--meta-endpoints`        |           |      | 元信息存储服务地址                     
                                |
-| `--direct`                | false     |      | 是否直连 HugeGraph-Store          
                                    |
-| `--route-type`            | NODE_PORT |      | 路由选择方式（可选值：NODE_PORT / DDS / 
BOTH）                               |
-| `--cluster`               | hg        |      | 集群名                           
                                    |
-| `--trust-store-file`      |           |      | 请求协议为 https 时，客户端的证书文件路径      
                                    |
-| `--trust-store-password`  |           |      | 请求协议为 https 时，客户端证书密码         
                                    |
-| `--clear-all-data`        | false     |      | 导入数据前是否清除服务端的原有数据             
                                    |
-| `--clear-timeout`         | 240       |      | 导入数据前清除服务端的原有数据的超时时间          
                                    |
-| `--incremental-mode`      | false     |      | 是否使用断点续导模式，仅输入源为 FILE 和 HDFS 
支持该模式，启用该模式能从上一次导入停止的地方开始导入           |
-| `--failure-mode`          | false     |      | 失败模式为 true 
时，会导入之前失败了的数据，一般来说失败数据文件需要在人工更正编辑好后，再次进行导入             |
-| `--batch-insert-threads`  | CPUs      |      | 批量插入线程池大小 (CPUs 是当前 OS 
可用**逻辑核**个数)                             |
-| `--single-insert-threads` | 8         |      | 单条插入线程池的大小                    
                                    |
-| `--max-conn`              | 4 * CPUs  |      | HugeClient 与 HugeGraphServer 
的最大 HTTP 连接数，**调整线程**的时候建议同时调整此项     |
-| `--max-conn-per-route`    | 2 * CPUs  |      | HugeClient 与 HugeGraphServer 
每个路由的最大 HTTP 连接数，**调整线程**的时候建议同时调整此项 |
-| `--batch-size`            | 500       |      | 导入数据时每个批次包含的数据条数              
                                    |
-| `--max-parse-errors`      | 1         |      | 最多允许多少行数据解析错误，达到该值则程序退出       
                                    |
-| `--max-insert-errors`     | 500       |      | 最多允许多少行数据插入错误，达到该值则程序退出       
                                    |
-| `--timeout`               | 60        |      | 插入结果返回的超时时间（秒）                
                                    |
-| `--shutdown-timeout`      | 10        |      | 多线程停止的等待时间（秒）                 
                                    |
-| `--retry-times`           | 0         |      | 发生特定异常时的重试次数                  
                                    |
-| `--retry-interval`        | 10        |      | 重试之前的间隔时间（秒）                  
                                    |
-| `--check-vertex`          | false     |      | 插入边时是否检查边所连接的顶点是否存在           
                                    |
-| `--print-progress`        | true      |      | 是否在控制台实时打印导入条数                
                                    |
-| `--dry-run`               | false     |      | 打开该模式，只解析不导入，通常用于测试           
                                    |
-| `--help`                  | false     |      | 打印帮助信息                        
                                    |                                           
       
-
+| 参数                                      | 默认值         | 是否必传 | 描述信息          
                                                    |
+|-----------------------------------------|-------------|------|-------------------------------------------------------------------|
+| `-f` 或 `--file`                         |             | Y    | 配置脚本的路径       
                                                    |
+| `-g` 或 `--graph`                        |             | Y    | 图名称           
                                                    |
+| `--graphspace`                          | DEFAULT     |      | 图空间           
                                                    |
+| `-s` 或 `--schema`                       |             | Y    | schema 文件路径   
                                                    |
+| `-h` 或 `--host` 或 `-i`                  | localhost   |      | 
HugeGraphServer 的地址                                               |
+| `-p` 或 `--port`                         | 8080        |      | 
HugeGraphServer 的端口号                                              |
+| `--username`                            | null        |      | 当 
HugeGraphServer 开启了权限认证时，当前图的 username                          |
+| `--password`                            | null        |      | 当 
HugeGraphServer 开启了权限认证时，当前图的 password                          |
+| `--create-graph`                        | false       |      | 是否在图不存在时自动创建  
                                                    |
+| `--token`                               | null        |      | 当 
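HugeGraphServer has authentication enabled, the token for the current graph |
+| `--protocol`                            | http        |      | Protocol used for requests to the server; either http or https |
+| `--pd-peers`                            |             |      | PD service node addresses |
+| `--pd-token`                            |             |      | Token for accessing the PD service |
+| `--meta-endpoints`                      |             |      | Addresses of the metadata storage service |
+| `--direct`                              | false       |      | Whether to connect directly to HugeGraph-Store |
+| `--route-type`                          | NODE_PORT   |      | Routing mode (one of: NODE_PORT / DDS / BOTH) |
+| `--cluster`                             | hg          |      | Cluster name |
+| `--trust-store-file`                    |             |      | Path to the client certificate file when the request protocol is https |
+| `--trust-store-password`                |             |      | Client certificate password when the request protocol is https |
+| `--clear-all-data`                      | false       |      | Whether to clear the existing data on the server before importing |
+| `--clear-timeout`                       | 240         |      | Timeout for clearing the existing data on the server before importing |
+| `--incremental-mode`                    | false       |      | Whether to use resumable import mode; only FILE and HDFS input sources support it. When enabled, the import resumes from where the previous import stopped |
+| `--failure-mode`                        | false       |      | When true, imports the data that previously failed; in general the failure data files need to be corrected by hand before they are imported again |
+| `--batch-insert-threads`                | CPUs        |      | Size of the batch-insert thread pool (CPUs is the number of **logical cores** available to the current OS) |
+| `--single-insert-threads`               | 8           |      | Size of the single-insert thread pool |
+| `--max-conn`                            | 4 * CPUs    |      | Maximum number of HTTP connections between HugeClient and HugeGraphServer; adjust this as well when **tuning the thread counts** |
+| `--max-conn-per-route`                  | 2 * CPUs    |      | Maximum number of HTTP connections per route between HugeClient and HugeGraphServer; adjust this as well when **tuning the thread counts** |
+| `--batch-size`                          | 500         |      | Number of records in each batch during import |
+| `--max-parse-errors`                    | 1           |      | Maximum number of rows allowed to fail parsing; the program exits when this value is reached |
+| `--max-insert-errors`                   | 500         |      | Maximum number of rows allowed to fail insertion; the program exits when this value is reached |
+| `--timeout`                             | 60          |      | Timeout for the insert result to return (seconds) |
+| `--shutdown-timeout`                    | 10          |      | Time to wait for the worker threads to stop (seconds) |
+| `--retry-times`                         | 0           |      | Number of retries when specific exceptions occur |
+| `--retry-interval`                      | 10          |      | Interval before each retry (seconds) |
+| `--check-vertex`                        | false       |      | Whether to check that the vertices an edge connects exist when inserting edges |
+| `--print-progress`                      | true        |      | Whether to print the imported record count to the console in real time |
+| `--dry-run`                             | false       |      | When enabled, only parses and does not import; typically used for testing |
+| `--help` or `-help`                      | false       |      | Print help information |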
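+| `--parallel-count` or `--parallel-count` | max(2,CPUS) |      | Maximum number of threads for reading data files in parallel |
+| `--start-file`                          | 0           |      | Index of the first file for a partial (sharded) import |
+| `--end-file`                            | -1          |      | Index of the last file for a partial import |
+| `--scatter-sources`                     | false       |      | Read multiple data sources in a scattered (parallel) manner to optimize I/O performance |
+| `--cdc-flush-interval`                  | 30000       |      | Data flush interval for Flink CDC |
+| `--cdc-sink-parallelism`                | 1           |      | Parallelism of the Flink CDC sink |
+| `--max-read-errors`                     | 1           |      | Maximum number of read-error rows allowed before the program exits |
+| `--max-read-lines`                      | -1L         |      | Limit on the number of lines read; the import task stops once this count is reached |
+| `--test-mode`                           | false       |      | Whether to enable test mode |
+| `--use-prefilter`                       | false       |      | Whether to pre-filter vertices |
+| `--short-id`                            |             |      | Map custom IDs to shorter IDs |
+| `--vertex-edge-limit`                   | -1L         |      | Maximum number of edges for a single vertex |
+| `--sink-type`                           | true        |      | Switch for the storage type of the data sink |
+| `--vertex-partitions`                   | 64          |      | Number of pre-split partitions for the HBase vertex table |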

Review Comment:
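   ⚠️ **Missing usage-scenario notes - usability issue**
   
   The newly added HBase-related parameters (lines 863-869):
   - `--vertex-partitions`
   - `--edge-partitions`
   - `--vertex-table-name`
   - `--edge-table-name`
   - `--hbase-zk-quorum`
   - `--hbase-zk-port`
   - `--hbase-zk-parent`
   
   Consider adding the following before the parameter table or at the start of the "3.4.1 参数说明" (Parameter description) section:
   1. These parameters apply to the HBase backend storage scenario
   2. They need to be used together with the `--backend` parameter
   3. Alternatively, add a separate subsection that covers parameter configuration for the different storage backends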
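
To make point 2 concrete, here is a minimal Python sketch of the kind of pre-flight check a wrapper script could run before invoking the loader. It is an illustration only: the function name `hbase_flags_without_backend` is hypothetical, the flag names are copied from the list above, and the rule that HBase flags must be accompanied by `--backend` is the reviewer's suggestion, not confirmed loader behavior.

```python
# HBase-specific flags, copied from the review comment above.
HBASE_FLAGS = {
    "--vertex-partitions",
    "--edge-partitions",
    "--vertex-table-name",
    "--edge-table-name",
    "--hbase-zk-quorum",
    "--hbase-zk-port",
    "--hbase-zk-parent",
}


def hbase_flags_without_backend(argv):
    """Return the HBase-specific flags in argv when --backend is absent.

    Assumption: HBase flags are only meaningful together with --backend,
    as the review comment suggests. Only the presence of --backend is
    checked; its value is not inspected.
    """
    # Accept both "--flag value" and "--flag=value" spellings.
    flags = {arg.split("=", 1)[0] for arg in argv if arg.startswith("--")}
    if "--backend" in flags:
        return []
    return sorted(flags & HBASE_FLAGS)
```

An empty result means either no HBase flags were passed or `--backend` is present; a non-empty result lists the flags a user would most likely want explained in the documentation.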



##########
content/cn/docs/quickstart/toolchain/hugegraph-loader.md:
##########
@@ -807,47 +807,71 @@ schema: required
 
 ##### 3.4.1 Parameter description
 
-| Parameter                 | Default   | Required | Description |
-|---------------------------|-----------|------|-------------------------------------------------------------------|
-| `-f` or `--file`           |           | Y    | Path to the configuration script |
-| `-g` or `--graph`          |           | Y    | Graph name |
-| `-gs` or `--graphspace`    | DEFAULT   |      | Graph space |
-| `-s` or `--schema`         |           | Y    | Path to the schema file |
-| `-h` or `--host` or `-i`   | localhost |      | Address of HugeGraphServer |
-| `-p` or `--port`           | 8080      |      | Port of HugeGraphServer |
-| `--username`              | null      |      | Username for the current graph when HugeGraphServer has authentication enabled |
-| `--password`              | null      |      | Password for the current graph when HugeGraphServer has authentication enabled |
-| `--create-graph`          | false     |      | Whether to create the graph automatically if it does not exist |
-| `--token`                 | null      |      | Token for the current graph when HugeGraphServer has authentication enabled |
-| `--protocol`              | http      |      | Protocol used for requests to the server; either http or https |
-| `--pd-peers`              |           |      | PD service node addresses |
-| `--pd-token`              |           |      | Token for accessing the PD service |
-| `--meta-endpoints`        |           |      | Addresses of the metadata storage service |
-| `--direct`                | false     |      | Whether to connect directly to HugeGraph-Store |
-| `--route-type`            | NODE_PORT |      | Routing mode (one of: NODE_PORT / DDS / BOTH) |
-| `--cluster`               | hg        |      | Cluster name |
-| `--trust-store-file`      |           |      | Path to the client certificate file when the request protocol is https |
-| `--trust-store-password`  |           |      | Client certificate password when the request protocol is https |
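-| `--clear-all-data`        | false     |      | Whether to clear the existing data on the server before importing |
-| `--clear-timeout`         | 240       |      | Timeout for clearing the existing data on the server before importing |
-| `--incremental-mode`      | false     |      | Whether to use resumable import mode; only FILE and HDFS input sources support it. When enabled, the import resumes from where the previous import stopped |
-| `--failure-mode`          | false     |      | When true, imports the data that previously failed; in general the failure data files need to be corrected by hand before they are imported again |
-| `--batch-insert-threads`  | CPUs      |      | Size of the batch-insert thread pool (CPUs is the number of **logical cores** available to the current OS) |
-| `--single-insert-threads` | 8         |      | Size of the single-insert thread pool |
-| `--max-conn`              | 4 * CPUs  |      | Maximum number of HTTP connections between HugeClient and HugeGraphServer; adjust this as well when **tuning the thread counts** |
-| `--max-conn-per-route`    | 2 * CPUs  |      | Maximum number of HTTP connections per route between HugeClient and HugeGraphServer; adjust this as well when **tuning the thread counts** |
-| `--batch-size`            | 500       |      | Number of records in each batch during import |
-| `--max-parse-errors`      | 1         |      | Maximum number of rows allowed to fail parsing; the program exits when this value is reached |
-| `--max-insert-errors`     | 500       |      | Maximum number of rows allowed to fail insertion; the program exits when this value is reached |
-| `--timeout`               | 60        |      | Timeout for the insert result to return (seconds) |
-| `--shutdown-timeout`      | 10        |      | Time to wait for the worker threads to stop (seconds) |
-| `--retry-times`           | 0         |      | Number of retries when specific exceptions occur |
-| `--retry-interval`        | 10        |      | Interval before each retry (seconds) |
-| `--check-vertex`          | false     |      | Whether to check that the vertices an edge connects exist when inserting edges |
-| `--print-progress`        | true      |      | Whether to print the imported record count to the console in real time |
-| `--dry-run`               | false     |      | When enabled, only parses and does not import; typically used for testing |
-| `--help`                  | false     |      | Print help information |
-
+| Parameter                               | Default     | Required | Description |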

Review Comment:
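   🧹 **Suggestion: group the parameters - improves usability**
   
   The parameter table now contains 49 parameters, which is a lot for a new user to take in. Consider grouping them by function, for example:
   
   **Basic parameters** (required/commonly used):
   - `-f`, `-g`, `-s`, `-h`, `-p`, etc.
   
   **Performance-tuning parameters**:
   - `--batch-insert-threads`, `--max-conn`, `--batch-size`, `--parallel-count`, etc.
   
   **Advanced-feature parameters**:
   - `--incremental-mode`, `--failure-mode`, `--dry-run`, etc.
   
   **HBase backend parameters**:
   - `--vertex-partitions`, `--hbase-zk-quorum`, etc.
   
   **CDC-related parameters**:
   - `--cdc-flush-interval`, `--cdc-sink-parallelism`, etc.
   
   This would help users quickly find the parameters they need.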
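
The grouping proposed above could be prototyped with a short script before the table is restructured by hand. The following Python sketch is one possible way to do it, not part of the PR: the helper names `render_table` and `render_grouped` are hypothetical, the group membership is limited to the flags the comment names explicitly, and the "Other" catch-all group is an assumption for flags the comment does not place anywhere.

```python
# Group titles and member flags follow the review comment. Each "etc." in the
# comment is left unfilled; unlisted flags fall into an assumed "Other" group.
GROUPS = {
    "Basic": ["-f", "-g", "-s", "-h", "-p"],
    "Performance tuning": ["--batch-insert-threads", "--max-conn",
                           "--batch-size", "--parallel-count"],
    "Advanced features": ["--incremental-mode", "--failure-mode", "--dry-run"],
    "HBase backend": ["--vertex-partitions", "--hbase-zk-quorum"],
    "CDC": ["--cdc-flush-interval", "--cdc-sink-parallelism"],
}


def render_table(title, flags, rows):
    """Render one titled Markdown table for the given flags."""
    lines = [f"**{title}**", "",
             "| Parameter | Default | Description |",
             "|---|---|---|"]
    for flag in flags:
        default, description = rows[flag]
        lines.append(f"| `{flag}` | {default} | {description} |")
    return "\n".join(lines)


def render_grouped(rows, groups=GROUPS):
    """Split a flat {flag: (default, description)} mapping into grouped tables.

    Groups with no matching flags are skipped; flags that belong to no
    group are collected under "Other" in their original order.
    """
    placed = set()
    sections = []
    for title, flags in groups.items():
        present = [flag for flag in flags if flag in rows]
        if present:
            sections.append(render_table(title, present, rows))
            placed.update(present)
    leftovers = [flag for flag in rows if flag not in placed]
    if leftovers:
        sections.append(render_table("Other", leftovers, rows))
    return "\n\n".join(sections)
```

Calling `render_grouped` with rows parsed from the existing table yields one small table per group, which makes it easy to see which flags still need a category before the documentation is edited.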



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

