[GitHub] [incubator-hudi] SuXingLee commented on a change in pull request #1069: [doc][chinese] Update and fix errors in chinese documentation

GitBox Mon, 02 Dec 2019 19:06:29 -0800

SuXingLee commented on a change in pull request #1069: [doc][chinese] Update 
and fix errors in chinese documentation
URL: https://github.com/apache/incubator-hudi/pull/1069#discussion_r352965889


 ##########
 File path: content/cn/use_cases.html
 ##########
 @@ -341,24 +341,24 @@ <h2 id="近实时摄取">近实时摄取</h2>
 <p>将外部源(如事件日志、数据库、外部源)的数据摄取到<a 
href="http://martinfowler.com/bliki/DataLake.html">Hadoop数据湖</a>是一个众所周知的问题。
 尽管这些数据对整个组织来说是最有价值的，但不幸的是，在大多数(如果不是全部)Hadoop部署中都使用零散的方式解决，即使用多个不同的摄取工具。</p>
 
-<p>对于RDBMS摄取，Hudi提供__通过更新插入达到更快加载__，而不是昂贵且低效的批量加载。例如，您可以读取MySQL BIN日志或<a 
href="https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports">Sqoop增量导入</a>并将其应用于
+<p>对于RDBMS摄取，Hudi提供<strong>通过Upserts提供了更快加载</strong>，而不是昂贵且低效的批量加载。例如，您可以读取MySQL
 Binlog或<a 
href="https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports">Sqoop增量导入</a>并将其应用于
 DFS上的等效Hudi表。这比<a 
href="https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1770457">批量合并任务</a>及<a
 
href="http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/">复杂的手工合并工作流</a>更快/更有效率。</p>
 
 <p>对于NoSQL数据存储，如<a href="http://cassandra.apache.org/">Cassandra</a> / <a 
href="http://www.project-voldemort.com/voldemort/">Voldemort</a> / <a 
href="https://hbase.apache.org/">HBase</a>，即使是中等规模大小也会存储数十亿行。
 毫无疑问， <strong>全量加载不可行</strong>，如果摄取需要跟上较高的更新量，那么则需要更有效的方法。</p>
 
-<p>即使对于像<a href="kafka.apache.org">Kafka</a>这样的不可变数据源，Hudi也可以 
<strong>强制在HDFS上使用最小文件大小</strong>, 这采取了综合方式解决<a 
href="https://blog.cloudera.com/blog/2009/02/the-small-files-problem/">Hadoop中的一个老问题</a>来改善NameNode的健康状况。这对事件流来说更为重要，因为它通常具有较高容量(例如：点击流)，如果管理不当，可能会对Hadoop群集造成严重损害。</p>
+<p>即使对于像<a href="kafka.apache.org">Kafka</a>这样的不可变数据源，Hudi也可以 
<strong>强制在HDFS上使用最小文件大小</strong>, 这采取了综合方式解决<a 
href="https://blog.cloudera.com/blog/2009/02/the-small-files-problem/">HDFS小文件问题</a>来改善NameNode的健康状况。这对事件流来说更为重要，因为它通常具有较高容量(例如：点击流)，如果管理不当，可能会对Hadoop集群造成严重损害。</p>
 
-<p>在所有源中，通过<code 
class="highlighter-rouge">提交</code>这一概念，Hudi增加了以原子方式向消费者发布新数据的功能，这种功能十分必要。</p>
+<p>在所有源中，通过<code 
class="highlighter-rouge">commits</code>这一概念，Hudi增加了以原子方式向消费者发布新数据的功能，这种功能十分必要。</p>
 
 <h2 id="近实时分析">近实时分析</h2>
 
 <p>通常，实时<a 
href="https://en.wikipedia.org/wiki/Data_mart">数据集市</a>由专业(实时)数据分析存储提供支持，例如<a 
href="http://druid.io/">Druid</a>或<a 
href="http://www.memsql.com/">Memsql</a>或<a 
href="http://opentsdb.net/">OpenTSDB</a>。
 这对于较小规模的数据量来说绝对是完美的(<a 
href="https://blog.twitter.com/2015/hadoop-filesystem-at-twitter">相比于这样安装Hadoop</a>)，这种情况需要在亚秒级响应查询，例如系统监控或交互式实时分析。
 但是，由于Hadoop上的数据太陈旧了，通常这些系统会被滥用于非交互式查询，这导致利用率不足和硬件/许可证成本的浪费。</p>
 
-<p>另一方面，Hadoop上的交互式SQL解决方案(如Presto和SparkSQL)表现出色，在__几秒钟内完成查询__。
-通过将__数据新鲜度提高到几分钟__，Hudi可以提供一个更有效的替代方案，并支持存储在DFS中的__数量级更大的数据集__的实时分析。
+<p>另一方面，Hadoop上的交互式SQL解决方案(如Presto和SparkSQL)表现出色，在<strong>几秒钟内完成查询</strong>。
+通过将<strong>数据新鲜度提高到几分钟</strong>，Hudi可以提供一个更有效的替代方案，并支持存储在DFS中的<strong>数量级更大的数据集</strong>的实时分析。
 
 Review comment:
   In Markdown, `__` marks bold text, but in HTML we should use
`<strong></strong>` tags instead.
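
   For illustration only, the substitution this change makes by hand could be sketched as below. This is not part of the PR: the helper name `md_bold_to_html` and the regular expression are assumptions, and the pattern only handles simple, non-nested `__...__` spans on a single line.

```python
import re

# Hypothetical helper, not part of the PR: matches a simple `__bold__` span.
# The non-greedy group keeps two bold spans on one line from being merged.
BOLD = re.compile(r"__(.+?)__")


def md_bold_to_html(text: str) -> str:
    """Replace Markdown-style `__bold__` spans with HTML <strong> tags."""
    return BOLD.sub(r"<strong>\1</strong>", text)


if __name__ == "__main__":
    line = "在__几秒钟内完成查询__。"
    print(md_bold_to_html(line))
```

   Identifiers that legitimately contain double underscores would also be rewritten by this pattern, so the output of a sketch like this would still need a manual check.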
