leesf commented on a change in pull request #1006: [HUDI-276] Translate the Configurations page into Chinese URL: https://github.com/apache/incubator-hudi/pull/1006#discussion_r344466101
########## File path: docs/configurations.cn.md ##########

```diff
@@ -1,48 +1,46 @@
 ---
-title: Configurations
+title: 配置
 keywords: garbage collection, hudi, jvm, configs, tuning
 sidebar: mydoc_sidebar
 permalink: configurations.html
 toc: true
-summary: "Here we list all possible configurations and what they mean"
+summary: 在这里,我们列出了所有可能的配置及其含义。
 ---
-This page covers the different ways of configuring your job to write/read Hudi datasets.
-At a high level, you can control behaviour at few levels.
-
-- **[Spark Datasource Configs](#spark-datasource)** : These configs control the Hudi Spark Datasource, providing ability to define keys/partitioning, pick out the write operation, specify how to merge records or choosing view type to read.
-- **[WriteClient Configs](#writeclient-configs)** : Internally, the Hudi datasource uses a RDD based `HoodieWriteClient` api to actually perform writes to storage. These configs provide deep control over lower level aspects like
- file sizing, compression, parallelism, compaction, write schema, cleaning etc. Although Hudi provides sane defaults, from time-time these configs may need to be tweaked to optimize for specific workloads.
-- **[RecordPayload Config](#PAYLOAD_CLASS_OPT_KEY)** : This is the lowest level of customization offered by Hudi. Record payloads define how to produce new values to upsert based on incoming new record and
- stored old record. Hudi provides default implementations such as `OverwriteWithLatestAvroPayload` which simply update storage with the latest/last-written record.
- This can be overridden to a custom class extending `HoodieRecordPayload` class, on both datasource and WriteClient levels.
+该页面介绍了几种配置写入或读取Hudi数据集的作业的方法。
+简而言之,您可以在几个级别上控制行为。
+
+- **[Spark Datasource 配置](#spark-datasource)** : 这些配置控制Hudi Spark Datasource,提供如下功能:
+  定义键和分区、选择写操作、指定如何合并记录或选择要读取的视图类型。
+- **[WriteClient 配置](#writeclient-configs)** : 在内部,Hudi数据源使用基于RDD的`HoodieWriteClient` API
+  真正执行对存储的写入。 这些配置可对文件大小、压缩(compression)、并行性、压缩(compaction)、写入模式、清理等底层方面进行完全控制。
+  尽管Hudi提供了合理的默认设置,但在不同情形下,可能需要对这些配置进行调整以针对特定的工作负载进行优化。
+- **[RecordPayload 配置](#PAYLOAD_CLASS_OPT_KEY)** : 这是Hudi提供的最底层的定制。
+  RecordPayload定义了如何根据传入的新记录和存储的旧记录来产生新值以进行插入更新。
+  Hudi提供了诸如`OverwriteWithLatestAvroPayload`的默认实现,该实现仅使用最新或最后写入的记录来更新存储。
+  在数据源和WriteClient级别,都可以将其重写为扩展`HoodieRecordPayload`类的自定义类。

-### Talking to Cloud Storage
+### 与云存储连接

-Immaterial of whether RDD/WriteClient APIs or Datasource is used, the following information helps configure access
-to cloud stores.
+无论使用RDD/WriteClient API还是Datasource,以下信息有助于配置对云存储的访问。
```

Review comment: 有助于 -> 都有助于? It reads more smoothly that way.
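For context on the page under review, the three configuration levels it describes (Spark Datasource, WriteClient, and RecordPayload) can be sketched roughly as below. This is an illustrative sketch only, not part of the PR: the config key names and the payload class package follow the Hudi documentation of this era but vary between releases, so treat the exact strings as assumptions to verify against your Hudi version.

```python
# Illustrative sketch of the three Hudi config levels described in the page
# under review. Key names and class paths are assumptions; verify against
# the actual Hudi release before use.

# Spark Datasource level: define keys/partitioning, pick the write operation.
datasource_opts = {
    "hoodie.datasource.write.recordkey.field": "uuid",        # record key field
    "hoodie.datasource.write.partitionpath.field": "region",  # partitioning field
    "hoodie.datasource.write.operation": "upsert",            # write operation
    # RecordPayload level (lowest level): how an incoming record is merged with
    # the stored one. Package of the default payload class differs by release.
    "hoodie.datasource.write.payload.class":
        "org.apache.hudi.OverwriteWithLatestAvroPayload",
}

# WriteClient level: lower-level knobs such as parallelism and file sizing.
writeclient_opts = {
    "hoodie.upsert.shuffle.parallelism": "200",
    "hoodie.parquet.max.file.size": str(120 * 1024 * 1024),  # target ~120 MB files
}

# In a real job these would all be passed to the datasource, e.g.
# df.write.format("hudi").options(**all_opts).save(base_path)
all_opts = {**datasource_opts, **writeclient_opts}
```

The point of the sketch is only the layering: datasource options sit on top, WriteClient options tune the underlying `HoodieWriteClient`, and the payload class is the deepest customization hook.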
