yihua commented on a change in pull request #884: [HUDI-240] Translate Use Cases page
URL: https://github.com/apache/incubator-hudi/pull/884#discussion_r324035236

 File path: docs/use_cases.cn.md
 @@ -4,73 +4,65 @@ keywords: hudi, data ingestion, etl, real time, use cases
 sidebar: mydoc_sidebar
 permalink: use_cases.html
 toc: false
-summary: "Following are some sample use-cases for Hudi, which illustrate the benefits in terms of faster processing & increased efficiency"
+summary: "下面展示一些使用Hudi的示例,示例说明了加快处理速度和提高效率的好处"
-## Near Real-Time Ingestion
+## 近实时摄取
-Ingesting data from external sources like (event logs, databases, external sources) into a [Hadoop Data Lake](http://martinfowler.com/bliki/DataLake.html) is a well known problem.
-In most (if not all) Hadoop deployments, it is unfortunately solved in a piecemeal fashion, using a medley of ingestion tools,
-even though this data is arguably the most valuable for the entire 
-For RDBMS ingestion, Hudi provides __faster loads via Upserts__, as opposed costly & inefficient bulk loads. For e.g, you can read the MySQL BIN log or [Sqoop Incremental 
 and apply them to an
-equivalent Hudi table on DFS. This would be much faster/efficient than a [bulk 
-or [complicated handcrafted merge 
-For NoSQL datastores like [Cassandra](http://cassandra.apache.org/) / [Voldemort](http://www.project-voldemort.com/voldemort/) / [HBase](https://hbase.apache.org/), even moderately big installations store billions of rows.
-It goes without saying that __full bulk loads are simply infeasible__ and more efficient approaches are needed if ingestion is to keep up with the typically high update volumes.
+对于NoSQL数据存储,如[Cassandra](http://cassandra.apache.org/) / [Voldemort](http://www.project-voldemort.com/voldemort/) / 
+毫无疑问, __全量加载不可行__ 如果摄取需要跟上较高的更新量,那么则需要更有效的方法。
-Even for immutable data sources like [Kafka](kafka.apache.org) , Hudi helps __enforces a minimum file size on HDFS__, which improves NameNode health by solving one of the [age old problems in Hadoop land](https://blog.cloudera.com/blog/2009/02/the-small-files-problem/) in a holistic way. This is all the more important for event streams, since typically its higher volume (eg: click streams) and if not managed well, can cause serious damage to your Hadoop cluster.
+即使对于像[Kafka](kafka.apache.org)这样的不可变数据源,Hudi也可以 __强制在HDFS上使用最小文件大小__, 
-Across all sources, Hudi adds the much needed ability to atomically publish new data to consumers via notion of commits, shielding them from partial ingestion failures
+## 近实时分析
-## Near Real-time Analytics
 Review comment:
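   Should “[相比于这样安装Hadoop]” (“relative to Hadoop installations like this”) come after “完美的” (“perfect”)?
   “这需要” (“this requires”) => “这种情况需要” (“this situation requires”)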

This is an automated message from the Apache Git Service.