yihua commented on a change in pull request #1006: [HUDI-276] Translate the Configurations page into Chinese
URL: https://github.com/apache/incubator-hudi/pull/1006#discussion_r344858247
 
 

 ##########
 File path: docs/configurations.cn.md
 ##########
 @@ -51,385 +49,419 @@ inputDF.write()
 .save(basePath);
 ```
 
-Options useful for writing datasets via `write.format.option(...)`
+用于通过`write.format.option(...)`写入数据集的选项
 
 ##### TABLE_NAME_OPT_KEY {#TABLE_NAME_OPT_KEY}
-  Property: `hoodie.datasource.write.table.name` [Required]<br/>
-  <span style="color:grey">Hive table name, to register the dataset 
into.</span>
+  属性:`hoodie.datasource.write.table.name` [必须]<br/>
+  <span style="color:grey">Hive表名,用于将数据集注册到其中。</span>
   
 ##### OPERATION_OPT_KEY {#OPERATION_OPT_KEY}
-  Property: `hoodie.datasource.write.operation`, Default: `upsert`<br/>
-  <span style="color:grey">whether to do upsert, insert or bulkinsert for the 
write operation. Use `bulkinsert` to load new data into a table, and there on 
use `upsert`/`insert`. 
-  bulk insert uses a disk based write path to scale to load large inputs 
without need to cache it.</span>
+  属性:`hoodie.datasource.write.operation`, 默认值:`upsert`<br/>
+  <span style="color:grey">是否为写操作进行插入更新、插入或批量插入。使用`bulkinsert`将新数据加载到表中,之后使用`upsert`或`insert`。
+  批量插入使用基于磁盘的写入路径来扩展以加载大量输入,而无需对其进行缓存。</span>
   
 ##### STORAGE_TYPE_OPT_KEY {#STORAGE_TYPE_OPT_KEY}
-  Property: `hoodie.datasource.write.storage.type`, Default: `COPY_ON_WRITE` 
<br/>
-  <span style="color:grey">The storage type for the underlying data, for this 
write. This can't change between writes.</span>
+  属性:`hoodie.datasource.write.storage.type`, 默认值:`COPY_ON_WRITE` <br/>
+  <span style="color:grey">此写入的基础数据的存储类型。两次写入之间不能改变。</span>
   
 ##### PRECOMBINE_FIELD_OPT_KEY {#PRECOMBINE_FIELD_OPT_KEY}
-  Property: `hoodie.datasource.write.precombine.field`, Default: `ts` <br/>
-  <span style="color:grey">Field used in preCombining before actual write. 
When two records have the same key value,
-we will pick the one with the largest value for the precombine field, 
determined by Object.compareTo(..)</span>
+  属性:`hoodie.datasource.write.precombine.field`, 默认值:`ts` <br/>
+  <span style="color:grey">实际写入之前在preCombining中使用的字段。
+  当两个记录具有相同的键值时,我们将选取precombine字段值最大的记录,该值由Object.compareTo(..)确定。</span>
 
 ##### PAYLOAD_CLASS_OPT_KEY {#PAYLOAD_CLASS_OPT_KEY}
-  Property: `hoodie.datasource.write.payload.class`, Default: 
`org.apache.hudi.OverwriteWithLatestAvroPayload` <br/>
-  <span style="color:grey">Payload class used. Override this, if you like to 
roll your own merge logic, when upserting/inserting. 
-  This will render any value set for `PRECOMBINE_FIELD_OPT_VAL` 
in-effective</span>
+  属性:`hoodie.datasource.write.payload.class`, 
默认值:`org.apache.hudi.OverwriteWithLatestAvroPayload` <br/>
+  <span style="color:grey">使用的有效载荷类。如果您想在插入更新或插入时使用自己的合并逻辑,请重写该类。
+  这将使为`PRECOMBINE_FIELD_OPT_VAL`设置的任何值无效</span>
   
 ##### RECORDKEY_FIELD_OPT_KEY {#RECORDKEY_FIELD_OPT_KEY}
-  Property: `hoodie.datasource.write.recordkey.field`, Default: `uuid` <br/>
-  <span style="color:grey">Record key field. Value to be used as the 
`recordKey` component of `HoodieKey`. Actual value
-will be obtained by invoking .toString() on the field value. Nested fields can 
be specified using
-the dot notation eg: `a.b.c`</span>
+  属性:`hoodie.datasource.write.recordkey.field`, 默认值:`uuid` <br/>
+  <span style="color:grey">记录键字段。用作`HoodieKey`中`recordKey`部分的值。
+  实际值将通过在字段值上调用.toString()来获得。可以使用点符号指定嵌套字段,例如:`a.b.c`</span>
 
 ##### PARTITIONPATH_FIELD_OPT_KEY {#PARTITIONPATH_FIELD_OPT_KEY}
-  Property: `hoodie.datasource.write.partitionpath.field`, Default: 
`partitionpath` <br/>
-  <span style="color:grey">Partition path field. Value to be used at the 
`partitionPath` component of `HoodieKey`.
-Actual value ontained by invoking .toString()</span>
+  属性:`hoodie.datasource.write.partitionpath.field`, 默认值:`partitionpath` <br/>
+  <span style="color:grey">分区路径字段。用作`HoodieKey`中`partitionPath`部分的值。
+  通过调用.toString()获得实际的值</span>
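
As an illustrative sketch only (not part of the translated page), the key-related option keys above would typically be set on the Spark DataFrame writer, continuing the snippet at the top of the page; the table name, field values and `basePath` here are placeholder assumptions:

```
inputDF.write()
  .format("org.apache.hudi")
  // write options documented above; the values are examples only
  .option("hoodie.datasource.write.table.name", "hudi_table")
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.recordkey.field", "uuid")
  .option("hoodie.datasource.write.partitionpath.field", "partitionpath")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .mode(SaveMode.Append)
  .save(basePath);
```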
 
 ##### KEYGENERATOR_CLASS_OPT_KEY {#KEYGENERATOR_CLASS_OPT_KEY}
-  Property: `hoodie.datasource.write.keygenerator.class`, Default: 
`org.apache.hudi.SimpleKeyGenerator` <br/>
-  <span style="color:grey">Key generator class, that implements will extract 
the key out of incoming `Row` object</span>
+  属性:`hoodie.datasource.write.keygenerator.class`, 
默认值:`org.apache.hudi.SimpleKeyGenerator` <br/>
+  <span style="color:grey">键生成器类,实现从输入的`Row`对象中提取键</span>
   
 ##### COMMIT_METADATA_KEYPREFIX_OPT_KEY {#COMMIT_METADATA_KEYPREFIX_OPT_KEY}
-  Property: `hoodie.datasource.write.commitmeta.key.prefix`, Default: `_` <br/>
-  <span style="color:grey">Option keys beginning with this prefix, are 
automatically added to the commit/deltacommit metadata.
-This is useful to store checkpointing information, in a consistent way with 
the hudi timeline</span>
+  属性:`hoodie.datasource.write.commitmeta.key.prefix`, 默认值:`_` <br/>
+  <span style="color:grey">以该前缀开头的选项键会自动添加到提交/增量提交的元数据中。
+  这对于以与hudi时间轴一致的方式存储检查点信息很有用</span>
 
 ##### INSERT_DROP_DUPS_OPT_KEY {#INSERT_DROP_DUPS_OPT_KEY}
-  Property: `hoodie.datasource.write.insert.drop.duplicates`, Default: `false` 
<br/>
-  <span style="color:grey">If set to true, filters out all duplicate records 
from incoming dataframe, during insert operations. </span>
+  属性:`hoodie.datasource.write.insert.drop.duplicates`, 默认值:`false` <br/>
+  <span style="color:grey">如果设置为true,则在插入操作期间从传入数据帧中过滤掉所有重复记录。</span>
   
 ##### HIVE_SYNC_ENABLED_OPT_KEY {#HIVE_SYNC_ENABLED_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.enable`, Default: `false` <br/>
-  <span style="color:grey">When set to true, register/sync the dataset to 
Apache Hive metastore</span>
+  属性:`hoodie.datasource.hive_sync.enable`, 默认值:`false` <br/>
+  <span style="color:grey">设置为true时,将数据集注册并同步到Apache Hive Metastore</span>
   
 ##### HIVE_DATABASE_OPT_KEY {#HIVE_DATABASE_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.database`, Default: `default` <br/>
-  <span style="color:grey">database to sync to</span>
+  属性:`hoodie.datasource.hive_sync.database`, 默认值:`default` <br/>
+  <span style="color:grey">要同步到的数据库</span>
   
 ##### HIVE_TABLE_OPT_KEY {#HIVE_TABLE_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.table`, [Required] <br/>
-  <span style="color:grey">table to sync to</span>
+  属性:`hoodie.datasource.hive_sync.table`, [必须] <br/>
+  <span style="color:grey">要同步到的表</span>
   
 ##### HIVE_USER_OPT_KEY {#HIVE_USER_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.username`, Default: `hive` <br/>
-  <span style="color:grey">hive user name to use</span>
+  属性:`hoodie.datasource.hive_sync.username`, 默认值:`hive` <br/>
+  <span style="color:grey">要使用的Hive用户名</span>
   
 ##### HIVE_PASS_OPT_KEY {#HIVE_PASS_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.password`, Default: `hive` <br/>
-  <span style="color:grey">hive password to use</span>
+  属性:`hoodie.datasource.hive_sync.password`, 默认值:`hive` <br/>
+  <span style="color:grey">要使用的Hive密码</span>
   
 ##### HIVE_URL_OPT_KEY {#HIVE_URL_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.jdbcurl`, Default: 
`jdbc:hive2://localhost:10000` <br/>
+  属性:`hoodie.datasource.hive_sync.jdbcurl`, 默认值:`jdbc:hive2://localhost:10000` 
<br/>
   <span style="color:grey">Hive metastore url</span>
   
 ##### HIVE_PARTITION_FIELDS_OPT_KEY {#HIVE_PARTITION_FIELDS_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.partition_fields`, Default: ` ` <br/>
-  <span style="color:grey">field in the dataset to use for determining hive 
partition columns.</span>
+  属性:`hoodie.datasource.hive_sync.partition_fields`, 默认值:` ` <br/>
+  <span style="color:grey">数据集中用于确定Hive分区的字段。</span>
   
 ##### HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY {#HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.partition_extractor_class`, Default: 
`org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor` <br/>
-  <span style="color:grey">Class used to extract partition field values into 
hive partition columns.</span>
+  属性:`hoodie.datasource.hive_sync.partition_extractor_class`, 
默认值:`org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor` <br/>
+  <span style="color:grey">用于将分区字段值提取到Hive分区列中的类。</span>
   
 ##### HIVE_ASSUME_DATE_PARTITION_OPT_KEY {#HIVE_ASSUME_DATE_PARTITION_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.assume_date_partitioning`, Default: 
`false` <br/>
-  <span style="color:grey">Assume partitioning is yyyy/mm/dd</span>
+  属性:`hoodie.datasource.hive_sync.assume_date_partitioning`, 默认值:`false` <br/>
+  <span style="color:grey">假设分区格式是yyyy/mm/dd</span>
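
Another hedged sketch (not from the page itself): the Hive sync options above chained onto the same writer; the database/table names and JDBC URL below simply reuse the defaults listed above and are placeholders:

```
inputDF.write()
  .format("org.apache.hudi")
  // ... write options as in the earlier sketch ...
  .option("hoodie.datasource.hive_sync.enable", "true")
  .option("hoodie.datasource.hive_sync.database", "default")
  .option("hoodie.datasource.hive_sync.table", "hudi_table")
  .option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://localhost:10000")
  .option("hoodie.datasource.hive_sync.partition_fields", "partitionpath")
  .option("hoodie.datasource.hive_sync.partition_extractor_class",
          "org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor")
  .mode(SaveMode.Append)
  .save(basePath);
```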
 
-#### Read Options
+#### 读选项
 
-Options useful for reading datasets via `read.format.option(...)`
+用于通过`read.format.option(...)`读取数据集的选项
 
 ##### VIEW_TYPE_OPT_KEY {#VIEW_TYPE_OPT_KEY}
-Property: `hoodie.datasource.view.type`, Default: `read_optimized` <br/>
-<span style="color:grey">Whether data needs to be read, in incremental mode 
(new data since an instantTime)
-(or) Read Optimized mode (obtain latest view, based on columnar data)
-(or) Real time mode (obtain latest view, based on row & columnar data)</span>
+属性:`hoodie.datasource.view.type`, 默认值:`read_optimized` <br/>
+<span style="color:grey">读取数据的模式(视图类型):增量模式(自InstantTime以来的新数据)
+(或)读优化模式(基于列数据获取最新视图)
+(或)实时模式(基于行和列数据获取最新视图)</span>
 
 ##### BEGIN_INSTANTTIME_OPT_KEY {#BEGIN_INSTANTTIME_OPT_KEY} 
-Property: `hoodie.datasource.read.begin.instanttime`, [Required in incremental 
mode] <br/>
-<span style="color:grey">Instant time to start incrementally pulling data 
from. The instanttime here need not
-necessarily correspond to an instant on the timeline. New data written with an
- `instant_time > BEGIN_INSTANTTIME` are fetched out. For e.g: '20170901080000' 
will get
- all new data written after Sep 1, 2017 08:00AM.</span>
+属性:`hoodie.datasource.read.begin.instanttime`, [在增量模式下必须] <br/>
+<span style="color:grey">开始增量提取数据的即时时间。这里的instanttime不必一定与时间轴上的即时相对应。
+取出以`instant_time > BEGIN_INSTANTTIME`写入的新数据。
+例如:'20170901080000'将获取2017年9月1日08:00 AM之后写入的所有新数据。</span>
  
 ##### END_INSTANTTIME_OPT_KEY {#END_INSTANTTIME_OPT_KEY}
-Property: `hoodie.datasource.read.end.instanttime`, Default: latest instant 
(i.e fetches all new data since begin instant time) <br/>
-<span style="color:grey"> Instant time to limit incrementally fetched data to. 
New data written with an
-`instant_time <= END_INSTANTTIME` are fetched out.</span>
+属性:`hoodie.datasource.read.end.instanttime`, 默认值:最新即时(即提取自开始即时时间以来的所有新数据) <br/>
+<span style="color:grey">限制增量提取数据的即时时间上限。取出以`instant_time <= END_INSTANTTIME`写入的新数据。</span>
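
A hedged sketch of an incremental pull using the two read options above; the `incremental` view-type value, the instant-time literals and `spark`/`basePath` are illustrative assumptions rather than text taken from the page:

```
Dataset<Row> incrementalDF = spark.read()
  .format("org.apache.hudi")
  // incremental view, bounded by begin/end instant times (placeholders)
  .option("hoodie.datasource.view.type", "incremental")
  .option("hoodie.datasource.read.begin.instanttime", "20170901080000")
  .option("hoodie.datasource.read.end.instanttime", "20170902080000")
  .load(basePath);
```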
 
 
-### WriteClient Configs {#writeclient-configs}
+### WriteClient 配置 {#writeclient-configs}
 
-Jobs programming directly against the RDD level apis can build a 
`HoodieWriteConfig` object and pass it in to the `HoodieWriteClient` 
constructor. 
-HoodieWriteConfig can be built using a builder pattern as below. 
+直接基于RDD级别API进行编程的作业可以构建一个`HoodieWriteConfig`对象,并将其传递给`HoodieWriteClient`构造函数。
+HoodieWriteConfig可以使用以下构建器模式构建。
 
 ```
 HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder()
         .withPath(basePath)
         .forTable(tableName)
         .withSchema(schemaStr)
-        .withProps(props) // pass raw k,v pairs from a property file.
+        .withProps(props) // 从属性文件传递原始k、v对。
        .withCompactionConfig(HoodieCompactionConfig.newBuilder().withXXX(...).build())
         .withIndexConfig(HoodieIndexConfig.newBuilder().withXXX(...).build())
         ...
         .build();
 ```
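
As a hedged continuation of the snippet above, the built config would then be handed to the client constructor mentioned in the text; `jsc` (a `JavaSparkContext`) and the payload type are assumed here:

```
// pass the built config into the write client (sketch only)
HoodieWriteClient<OverwriteWithLatestAvroPayload> writeClient =
        new HoodieWriteClient<>(jsc, cfg);
```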
 
-Following subsections go over different aspects of write configs, explaining 
most important configs with their property names, default values.
+以下各节介绍了写配置的不同方面,并解释了最重要的配置及其属性名称和默认值。
 
 ##### withPath(hoodie_base_path) {#withPath}
-Property: `hoodie.base.path` [Required] <br/>
-<span style="color:grey">Base DFS path under which all the data partitions are 
created. Always prefix it explicitly with the storage scheme (e.g hdfs://, 
s3:// etc). Hudi stores all the main meta-data about commits, savepoints, 
cleaning audit logs etc in .hoodie directory under the base directory. </span>
+属性:`hoodie.base.path` [必须] <br/>
+<span style="color:grey">创建所有数据分区所依据的基本DFS路径。
+始终在前缀中明确指明存储方式(例如hdfs://,s3://等)。
+Hudi将有关提交、保存点、清理审核日志等的所有主要元数据存储在基本目录下的.hoodie目录中。</span>
 
 ##### withSchema(schema_str) {#withSchema} 
-Property: `hoodie.avro.schema` [Required]<br/>
-<span style="color:grey">This is the current reader avro schema for the 
dataset. This is a string of the entire schema. HoodieWriteClient uses this 
schema to pass on to implementations of HoodieRecordPayload to convert from the 
source format to avro record. This is also used when re-writing records during 
an update. </span>
+属性:`hoodie.avro.schema` [必须]<br/>
+<span style="color:grey">这是数据集的当前读取器的avro模式(schema)。
+这是整个模式的字符串。HoodieWriteClient使用此模式传递到HoodieRecordPayload的实现,以从源格式转换为avro记录。
+在更新过程中重写记录时也使用此模式。</span>
 
 ##### forTable(table_name) {#forTable} 
-Property: `hoodie.table.name` [Required] <br/>
- <span style="color:grey">Table name for the dataset, will be used for 
registering with Hive. Needs to be same across runs.</span>
+属性:`hoodie.table.name` [必须] <br/>
+ <span style="color:grey">数据集的表名,将用于在Hive中注册。每次运行需要相同。</span>
 
 ##### withBulkInsertParallelism(bulk_insert_parallelism = 1500) {#withBulkInsertParallelism} 
-Property: `hoodie.bulkinsert.shuffle.parallelism`<br/>
-<span style="color:grey">Bulk insert is meant to be used for large initial 
imports and this parallelism determines the initial number of files in your 
dataset. Tune this to achieve a desired optimal size during initial 
import.</span>
+属性:`hoodie.bulkinsert.shuffle.parallelism`<br/>
+<span style="color:grey">批量插入旨在用于较大的初始导入,而此处的并行度决定了数据集中文件的初始数量。
+  调整此值以在初始导入期间达到理想的文件大小。</span>
 
 ##### withParallelism(insert_shuffle_parallelism = 1500, upsert_shuffle_parallelism = 1500) {#withParallelism} 
-Property: `hoodie.insert.shuffle.parallelism`, 
`hoodie.upsert.shuffle.parallelism`<br/>
-<span style="color:grey">Once data has been initially imported, this 
parallelism controls initial parallelism for reading input records. Ensure this 
value is high enough say: 1 partition for 1 GB of input data</span>
+属性:`hoodie.insert.shuffle.parallelism`, 
`hoodie.upsert.shuffle.parallelism`<br/>
+<span style="color:grey">最初导入数据后,此并行度将控制用于读取输入记录的初始并行度。
+确保此值足够高,例如:1个分区用于1 GB的输入数据</span>
 
 ##### combineInput(on_insert = false, on_update=true) {#combineInput} 
-Property: `hoodie.combine.before.insert`, `hoodie.combine.before.upsert`<br/>
-<span style="color:grey">Flag which first combines the input RDD and merges 
multiple partial records into a single record before inserting or updating in 
DFS</span>
+属性:`hoodie.combine.before.insert`, `hoodie.combine.before.upsert`<br/>
+<span style="color:grey">在DFS中插入或更新之前先组合输入RDD并将多个部分记录合并为单个记录的标志</span>
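
A hedged sketch of the parallelism and combine knobs above, set on the same builder shown earlier; the values are only illustrative:

```
HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder()
        .withPath(basePath)
        .forTable(tableName)
        .withSchema(schemaStr)
        .withBulkInsertParallelism(1500) // hoodie.bulkinsert.shuffle.parallelism
        .withParallelism(1500, 1500)     // insert / upsert shuffle parallelism
        .combineInput(false, true)       // combine before insert / upsert
        .build();
```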
 
 ##### withWriteStatusStorageLevel(level = MEMORY_AND_DISK_SER) {#withWriteStatusStorageLevel} 
-Property: `hoodie.write.status.storage.level`<br/>
-<span style="color:grey">HoodieWriteClient.insert and HoodieWriteClient.upsert 
returns a persisted RDD[WriteStatus], this is because the Client can choose to 
inspect the WriteStatus and choose and commit or not based on the failures. 
This is a configuration for the storage level for this RDD </span>
+属性:`hoodie.write.status.storage.level`<br/>
+<span style="color:grey">HoodieWriteClient.insert和HoodieWriteClient.upsert返回一个持久的RDD[WriteStatus],
+这是因为客户端可以选择检查WriteStatus并根据失败选择是否提交。这是此RDD的存储级别的配置</span>
 
 ##### withAutoCommit(autoCommit = true) {#withAutoCommit} 
-Property: `hoodie.auto.commit`<br/>
-<span style="color:grey">Should HoodieWriteClient autoCommit after insert and 
upsert. The client can choose to turn off auto-commit and commit on a "defined 
success condition"</span>
+属性:`hoodie.auto.commit`<br/>
+<span style="color:grey">插入和插入更新后,HoodieWriteClient是否应该自动提交。
+客户端可以选择关闭自动提交,并在"定义的成功条件"下提交</span>
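
A hedged sketch of the "defined success condition" idea above, with auto-commit turned off via `withAutoCommit(false)`; `writeClient`, `records` and `commitTime` are assumed to exist, and the exact API may differ by version:

```
JavaRDD<WriteStatus> statuses = writeClient.upsert(records, commitTime);
// inspect the returned WriteStatus RDD and only commit when nothing failed
if (statuses.filter(WriteStatus::hasErrors).isEmpty()) {
  writeClient.commit(commitTime, statuses);
}
```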
 
 ##### withAssumeDatePartitioning(assumeDatePartitioning = false) {#withAssumeDatePartitioning} 
-Property: `hoodie.assume.date.partitioning`<br/>
-<span style="color:grey">Should HoodieWriteClient assume the data is 
partitioned by dates, i.e three levels from base path. This is a stop-gap to 
support tables created by versions < 0.3.1. Will be removed eventually </span>
+属性:`hoodie.assume.date.partitioning`<br/>
+<span style="color:grey">HoodieWriteClient是否应该假设数据按日期划分,即从基本路径划分为三个级别。
+这是支持<0.3.1版本创建的表的一个补丁。最终将被删除</span>
 
 ##### withConsistencyCheckEnabled(enabled = false) {#withConsistencyCheckEnabled} 
-Property: `hoodie.consistency.check.enabled`<br/>
-<span style="color:grey">Should HoodieWriteClient perform additional checks to 
ensure written files' are listable on the underlying filesystem/storage. Set 
this to true, to workaround S3's eventual consistency model and ensure all data 
written as a part of a commit is faithfully available for queries. </span>
+属性:`hoodie.consistency.check.enabled`<br/>
+<span style="color:grey">HoodieWriteClient是否应该执行其他检查,以确保写入的文件在基础文件系统/存储上可列出。
+  将其设置为true可以规避S3最终一致性模型带来的问题,并确保作为提交的一部分写入的所有数据均能忠实地用于查询。</span>
 
 Review comment:
   有道理 (makes sense)
