yihua commented on a change in pull request #926: [HUDI-278] Translate Administering page
URL: https://github.com/apache/incubator-hudi/pull/926#discussion_r328912072
##########
File path: docs/admin_guide.cn.md
##########
@@ -374,71 +365,69 @@ Compaction successfully repaired
```
-## Metrics {#metrics}
+## 指标 {#metrics}
-Once the Hudi Client is configured with the right datasetname and environment for metrics, it produces the following graphite metrics, that aid in debugging hudi datasets
+为Hudi Client配置正确的数据集名称和指标环境后,它将生成以下graphite指标,以帮助调试hudi数据集
- - **Commit Duration** - This is amount of time it took to successfully commit a batch of records
- - **Rollback Duration** - Similarly, amount of time taken to undo partial data left over by a failed commit (happens everytime automatically after a failing write)
- - **File Level metrics** - Shows the amount of new files added, versions, deleted (cleaned) in each commit
- - **Record Level Metrics** - Total records inserted/updated etc per commit
- - **Partition Level metrics** - number of partitions upserted (super useful to understand sudden spikes in commit duration)
+ - **提交持续时间** - 这是成功提交一批记录所花费的时间
+ - **回滚持续时间** - 同样,撤消失败的提交所剩余的部分数据所花费的时间(每次写入失败后都会自动发生)
+ - **文件级别指标** - 显示每次提交中新增的文件数量、版本数以及删除(清除)的文件数量
+ - **记录级别指标** - 每次提交插入/更新的记录总数
+ - **分区级别指标** - 更新的分区数量(对于了解提交持续时间的突然峰值非常有用)
-These metrics can then be plotted on a standard tool like grafana. Below is a sample commit duration chart.
+然后可以将这些指标绘制在grafana等标准工具上。以下是提交持续时间图表示例。
<figure>
<img class="docimage" src="/images/hudi_commit_duration.png" alt="hudi_commit_duration.png" style="max-width: 1000px" />
</figure>
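As a companion to the metrics list above, here is a minimal sketch of the writer-side options that turn on Graphite reporting. The `hoodie.metrics.*` key names follow Hudi's metrics configuration, but treat the exact names and defaults as assumptions to verify against the Hudi version in use; `stock_ticks` is a hypothetical table name.

```python
# Sketch: writer options that enable Hudi's Graphite metrics reporter.
# Key names are assumptions based on Hudi's metrics configuration;
# verify them against the Hudi version actually deployed.
def graphite_metrics_options(table_name, host="localhost", port=4756):
    """Build the option map a Spark DataFrame writer would pass to Hudi."""
    return {
        "hoodie.table.name": table_name,           # dataset name used in metric paths
        "hoodie.metrics.on": "true",               # turn metrics reporting on
        "hoodie.metrics.reporter.type": "GRAPHITE",
        "hoodie.metrics.graphite.host": host,      # carbon/Graphite endpoint
        "hoodie.metrics.graphite.port": str(port),
    }

opts = graphite_metrics_options("stock_ticks")
```

With options like these set on the writer, the commit-duration, rollback-duration and file/record/partition level metrics above are published under the configured dataset name, ready to be charted in grafana.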
-## Troubleshooting Failures {#troubleshooting}
-
-Section below generally aids in debugging Hudi failures. Off the bat, the following metadata is added to every record to help triage issues easily using standard Hadoop SQL engines (Hive/Presto/Spark)
-
- - **_hoodie_record_key** - Treated as a primary key within each DFS partition, basis of all updates/inserts
- - **_hoodie_commit_time** - Last commit that touched this record
- - **_hoodie_file_name** - Actual file name containing the record (super useful to triage duplicates)
- - **_hoodie_partition_path** - Path from basePath that identifies the partition containing this record
+## 故障排除 {#troubleshooting}
-Note that as of now, Hudi assumes the application passes in the same deterministic partitionpath for a given recordKey. i.e the uniqueness of record key is only enforced within each partition
+以下部分通常有助于调试Hudi故障。首先,每条记录中都添加了以下元数据,以帮助使用标准Hadoop SQL引擎(Hive/Presto/Spark)轻松定位问题。
+ - **_hoodie_record_key** - 作为每个DFS分区内的主键,是所有更新/插入的基础
+ - **_hoodie_commit_time** - 最后一次修改该记录的提交
+ - **_hoodie_file_name** - 包含记录的实际文件名(对分类重复非常有用)
+ - **_hoodie_partition_path** - 从basePath开始的路径,用于标识包含此记录的分区
-#### Missing records
+请注意,到目前为止,Hudi假定应用程序为给定的recordKey传递相同的确定性分区路径。即recordKey的唯一性仅在每个分区内强制保证。
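To make the triage workflow concrete, here is a small self-contained sketch that groups rows by the metadata columns above to locate duplicates. The rows are hypothetical stand-ins for what a Hive/Presto/Spark query over a Hudi dataset would return.

```python
# Sketch: using the Hudi metadata columns to triage duplicates.
# The rows below are hypothetical query results; real ones come from
# a Hive/Presto/Spark query over the dataset.
from collections import defaultdict

rows = [
    {"_hoodie_record_key": "k1", "_hoodie_partition_path": "2019/09/01",
     "_hoodie_file_name": "f1.parquet", "_hoodie_commit_time": "20190901120000"},
    {"_hoodie_record_key": "k1", "_hoodie_partition_path": "2019/09/01",
     "_hoodie_file_name": "f2.parquet", "_hoodie_commit_time": "20190901130000"},
    {"_hoodie_record_key": "k2", "_hoodie_partition_path": "2019/09/02",
     "_hoodie_file_name": "f3.parquet", "_hoodie_commit_time": "20190902120000"},
]

# Collect the files seen per (partition path, record key); more than one
# file for the same key within a partition indicates a duplicate.
files_per_key = defaultdict(set)
for r in rows:
    key = (r["_hoodie_partition_path"], r["_hoodie_record_key"])
    files_per_key[key].add(r["_hoodie_file_name"])

duplicates = {k: sorted(v) for k, v in files_per_key.items() if len(v) > 1}
# duplicates now names the exact physical files to inspect.
```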
-Please check if there were any write errors using the admin commands above, during the window at which the record could have been written.
-If you do find errors, then the record was not actually written by Hudi, but handed back to the application to decide what to do with it.
+#### 缺失记录
-#### Duplicates
+请在可以写入记录的窗口中,使用上面的admin命令检查是否存在任何写入错误。
+如果确实发现错误,那么记录实际上不是由Hudi写入的,而是交还给应用程序来决定如何处理。
-First of all, please confirm if you do indeed have duplicates **AFTER** ensuring the query is accessing the Hudi datasets [properly](sql_queries.html) .
+#### 重复
- - If confirmed, please use the metadata fields above, to identify the physical files & partition files containing the records .
- - If duplicates span files across partitionpath, then this means your application is generating different partitionPaths for same recordKey, Please fix your app
- - if duplicates span multiple files within the same partitionpath, please engage with mailing list. This should not happen. You can use the `records deduplicate` command to fix your data.
+首先,请在确保查询[正确](sql_queries.html)访问Hudi数据集**之后**,再确认是否确实存在重复。
-#### Spark failures {#spark-ui}
+ - 如果确认,请使用上面的元数据字段来标识包含记录的物理文件和分区文件。
+ - 如果重复的记录存在于不同分区路径下的文件中,则意味着您的应用程序正在为同一recordKey生成不同的分区路径,请修复您的应用程序。
+ - 如果重复跨越同一分区路径中的多个文件,请使用邮件列表。这不应该发生。您可以使用`records deduplicate`命令修复数据。
Review comment:
“如果重复跨越同一分区路径中的多个文件” => “如果重复的记录存在于同一分区路径下的多个文件” (i.e. "if the duplicate records exist in multiple files under the same partition path")
“请使用邮件列表” => “...汇报这个问题” (i.e. "... report this issue")
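The duplicate checklist in the diff boils down to one test on the partition paths seen for a record key. A small sketch over hypothetical data, to make the two outcomes explicit:

```python
# Sketch: classifying duplicates per the checklist above.
# Input: (record_key, partition_path) pairs for rows that share a record
# key; the data here is hypothetical.
from collections import defaultdict

dupes = [
    ("k1", "2019/09/01"), ("k1", "2019/09/02"),   # same key, different partitions
    ("k2", "2019/09/03"), ("k2", "2019/09/03"),   # same key, same partition
]

partitions_per_key = defaultdict(list)
for key, path in dupes:
    partitions_per_key[key].append(path)

def classify(paths):
    # Different partitionPaths for one recordKey -> the application is at
    # fault; the same partitionPath -> unexpected, report it on the mailing
    # list (and consider `records deduplicate` to repair the data).
    return "fix application" if len(set(paths)) > 1 else "report to mailing list"

verdicts = {key: classify(paths) for key, paths in partitions_per_key.items()}
```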