yihua commented on a change in pull request #925: [HUDI-256] Translate Comparison page
URL: https://github.com/apache/incubator-hudi/pull/925#discussion_r328342066
##########
File path: docs/comparison.cn.md
##########
@@ -6,53 +6,45 @@ permalink: comparison.html
 toc: false
 ---
 
-Apache Hudi fills a big void for processing data on top of DFS, and thus mostly co-exists nicely with these technologies. However,
-it would be useful to understand how Hudi fits into the current big data ecosystem, contrasting it with a few related systems
-and bring out the different tradeoffs these systems have accepted in their design.
+Apache Hudi填补了在DFS上处理数据的巨大空白,并可以和这些技术很好地共存。然而,
+了解Hudi如何适应当前的大数据生态系统,并将其与一些相关系统进行对比,了解这些系统在设计中做的不同权衡将非常有用。
 
 ## Kudu
 
-[Apache Kudu](https://kudu.apache.org) is a storage system that has similar goals as Hudi, which is to bring real-time analytics on petabytes of data via first
-class support for `upserts`. A key differentiator is that Kudu also attempts to serve as a datastore for OLTP workloads, something that Hudi does not aspire to be.
-Consequently, Kudu does not support incremental pulling (as of early 2017), something Hudi does to enable incremental processing use cases.
+[Apache Kudu](https://kudu.apache.org)是一个与Hudi具有相似目标的存储系统,该系统通过对`upserts`支持来对PB级数据进行实时分析。
+一个关键的区别是Kudu还试图充当OLTP工作负载的数据存储,而Hudi并不希望这样做。
+因此,Kudu不支持增量拉取(截至2017年初),而Hudi支持以便进行增量处理。
+Kudu与分布式文件系统抽象和HDFS完全不同,它自己的一组存储服务器通过RAFT相互通信。
+另一方面,Hudi旨在与底层Hadoop兼容文件系统(HDFS,S3或Ceph)一起使用,并且没有自己的存储服务器群,而是依靠Apache Spark来完成繁重的工作。
+因此,Hudi可以像其他Spark作业一样轻松扩展,而Kudu则需要硬件和运营支持,特别是HBase或Vertica等数据存储系统。
+到目前为止,我们还没有针对Kudu做任何正面的基准测试(鉴于RTTable正在进行中)。
+但是,如果我们要使用[CERN](https://db-blog.web.cern.ch/blog/zbigniew-baranowski/2017-01-performance-comparison-different-file-formats-and -存储引擎),
+我们希望Hudi定位于能吸纳parquet的卓越性能。

Review comment:
“我们希望Hudi定位于能吸纳parquet的卓越性能” => “我们预期Hudi在摄取parquet上有更卓越的性能”

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
