leesf commented on a change in pull request #958: [HUDI-273] Translate the Writing Data page into Chinese documentation
URL: https://github.com/apache/incubator-hudi/pull/958#discussion_r335245974
##########
File path: docs/writing_data.cn.md
##########
@@ -182,39 +182,41 @@ Usage: <main class> [options]
Hive username
```
-## Deletes
+## 删除数据
-Hudi supports implementing two types of deletes on data stored in Hudi datasets, by enabling the user to specify a different record payload implementation.
+通过允许用户指定不同的数据记录负载实现,Hudi支持对存储在Hudi数据集中的数据执行两种类型的删除。
- - **Soft Deletes** : With soft deletes, user wants to retain the key but just null out the values for all other fields.
- This can be simply achieved by ensuring the appropriate fields are nullable in the dataset schema and simply upserting the dataset after setting these fields to null.
- - **Hard Deletes** : A stronger form of delete is to physically remove any trace of the record from the dataset. This can be achieved by issuing an upsert with a custom payload implementation
- via either DataSource or DeltaStreamer which always returns Optional.Empty as the combined value. Hudi ships with a built-in `org.apache.hudi.EmptyHoodieRecordPayload` class that does exactly this.
+ - **Soft Deletes(软删除)** :使用软删除时,用户希望保留键,但仅使所有其他字段的值都为空。
+ 通过确保适当的字段在数据集模式中可以为空,并在将这些字段设置为null之后直接对数据集执行upsert,即可轻松实现这一点。
+ - **Hard Deletes(硬删除)** :这种更强形式的删除是从数据集中彻底删除记录在存储上的任何痕迹。
+ 这可以通过DataSource或DeltaStreamer发出一个带有自定义负载实现的upsert来实现,该负载实现总是返回Optional.Empty作为组合值。Hudi附带了一个内置的`org.apache.hudi.EmptyHoodieRecordPayload`类,它实现的正是这一功能。
```
- deleteDF // dataframe containing just records to be deleted
+ deleteDF // 仅包含要删除的记录的数据帧
   .write().format("org.apache.hudi")
-  .option(...) // Add HUDI options like record-key, partition-path and others as needed for your setup
-  // specify record_key, partition_key, precombine_fieldkey & usual params
+  .option(...) // 根据设置需要添加HUDI参数,例如记录键、分区路径和其他参数
+  // 指定record_key,partition_key,precombine_fieldkey和常规参数
   .option(DataSourceWriteOptions.PAYLOAD_CLASS_OPT_KEY, "org.apache.hudi.EmptyHoodieRecordPayload")
```
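As a side note for reviewers of this hunk: the hard-delete mechanism hinges on the payload's combine step always yielding an empty value, so the merged record simply disappears from the rewritten file group. The following is a minimal Python sketch of that semantics only; the names (`empty_payload_combine`, `merge_file_group`) are hypothetical and this is not Hudi's actual implementation.

```python
from typing import Optional

# Hypothetical sketch (not Hudi's code) of how a payload whose combine step
# always returns "empty" turns an upsert into a hard delete during merge.

def empty_payload_combine(current_value: Optional[dict], new_value: dict) -> Optional[dict]:
    """Mimics EmptyHoodieRecordPayload: the combined value is always empty."""
    return None  # no value survives the merge

def merge_file_group(existing: dict, delete_payloads: dict) -> dict:
    """Rewrite a 'file group' (modeled as key -> value dict), applying delete payloads."""
    merged = dict(existing)
    for key, payload in delete_payloads.items():
        combined = empty_payload_combine(merged.get(key), payload)
        if combined is None:
            merged.pop(key, None)  # hard delete: no trace of the record remains
        else:
            merged[key] = combined
    return merged

existing = {"k1": {"name": "a"}, "k2": {"name": "b"}}
result = merge_file_group(existing, {"k1": {}})  # "deleteDF" carries only keys to delete
```

Here `result` retains only `k2`; `k1` is physically removed, which is why the upsert in the snippet above needs nothing beyond the keys of the records to delete.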
-
-## Storage Management
-
-Hudi also performs several key storage management functions on the data stored in a Hudi dataset. A key aspect of storing data on DFS is managing file sizes and counts
-and reclaiming storage space. For e.g HDFS is infamous for its handling of small files, which exerts memory/RPC pressure on the Name Node and can potentially destabilize
-the entire cluster. In general, query engines provide much better performance on adequately sized columnar files, since they can effectively amortize cost of obtaining
-column statistics etc. Even on some cloud data stores, there is often cost to listing directories with large number of small files.
-
-Here are some ways to efficiently manage the storage of your Hudi datasets.
-
- - The [small file handling feature](configurations.html#compactionSmallFileSize) in Hudi, profiles incoming workload
-   and distributes inserts to existing file groups instead of creating new file groups, which can lead to small files.
- - Cleaner can be [configured](configurations.html#retainCommits) to clean up older file slices, more or less aggressively depending on maximum time for queries to run & lookback needed for incremental pull
- - User can also tune the size of the [base/parquet file](configurations.html#limitFileSize), [log files](configurations.html#logFileMaxSize) & expected [compression ratio](configurations.html#parquetCompressionRatio),
-   such that sufficient number of inserts are grouped into the same file group, resulting in well sized base files ultimately.
- - Intelligently tuning the [bulk insert parallelism](configurations.html#withBulkInsertParallelism), can again result in nicely sized initial file groups. It is in fact critical to get this right, since the file groups
-   once created cannot be deleted, but simply expanded as explained before.
- - For workloads with heavy updates, the [merge-on-read storage](concepts.html#merge-on-read-storage) provides a nice mechanism for ingesting quickly into smaller files and then later merging them into larger base files via compaction.
+## 存储管理
+
+Hudi还对存储在Hudi数据集中的数据执行几个关键的存储管理功能。在DFS上存储数据的关键方面是管理文件大小和数量以及回收存储空间。
+例如,HDFS在处理小文件上性能很差,这会对名称节点的内存及RPC施加很大的压力,并可能破坏整个集群的稳定性。
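For readers of this hunk, the small file handling feature being translated can be pictured as a packing step: incoming inserts are routed to existing file groups that are still below the small-file threshold before any new file group is opened. The sketch below is a simplified illustration under assumed names and a simplified packing policy; it is not Hudi's actual algorithm, and the thresholds mirror the `compactionSmallFileSize` / `limitFileSize` configs only loosely.

```python
# Simplified illustration (assumed names/policy, not Hudi's actual algorithm)
# of small file handling: top up existing small file groups toward a target
# size before spilling inserts into new file groups. Sizes are in MB.

SMALL_FILE_LIMIT = 100   # below this, a base file counts as "small"
MAX_FILE_SIZE = 120      # target size for base files

def assign_inserts(file_group_sizes, insert_mb):
    """Return a plan: list of (file_group_index, mb_assigned); -1 means a new file group."""
    plan = []
    remaining = insert_mb
    # First, route inserts into existing small file groups, up to the target size.
    for gid, size in enumerate(file_group_sizes):
        if remaining <= 0:
            break
        if size < SMALL_FILE_LIMIT:
            take = min(MAX_FILE_SIZE - size, remaining)
            plan.append((gid, take))
            remaining -= take
    # Whatever is left spills into new file groups of the target size.
    while remaining > 0:
        take = min(MAX_FILE_SIZE, remaining)
        plan.append((-1, take))
        remaining -= take
    return plan
```

For example, with existing base files of 50 MB and 110 MB and 200 MB of inserts, the 50 MB file is topped up to the target before any new file group is created, which is how the feature avoids accumulating small files.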
Review comment:
名称节点 -> Name Node, could be better to not translate?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services