This is an automated email from the ASF dual-hosted git repository.

lamberken pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 183aac0  [MINOR] fix typo in comparison document (#1588)
183aac0 is described below

commit 183aac0cbc186b14b557ebe3b678c320bd6fef91
Author: wanglisheng81 <37138788+wanglishen...@users.noreply.github.com>
AuthorDate: Wed May 6 16:08:03 2020 +0800

    [MINOR] fix typo in comparison document (#1588)
---
 docs/_docs/1_5_comparison.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/_docs/1_5_comparison.md b/docs/_docs/1_5_comparison.md
index 78f2be2..32b73c6 100644
--- a/docs/_docs/1_5_comparison.md
+++ b/docs/_docs/1_5_comparison.md
@@ -18,7 +18,7 @@ Consequently, Kudu does not support incremental pulling (as of early 2017), some
 Kudu diverges from a distributed file system abstraction and HDFS altogether, with its own set of storage servers talking to each other via RAFT.
 Hudi, on the other hand, is designed to work with an underlying Hadoop compatible filesystem (HDFS,S3 or Ceph) and does not have its own fleet of storage servers,
-instead relying on Apache Spark to do the heavy-lifting. Thu, Hudi can be scaled easily, just like other Spark jobs, while Kudu would require hardware
+instead relying on Apache Spark to do the heavy-lifting. Thus, Hudi can be scaled easily, just like other Spark jobs, while Kudu would require hardware
 & operational support, typical to datastores like HBase or Vertica. We have not at this point, done any head to head benchmarks against Kudu (given RTTable is WIP). But, if we were to go with results shared by [CERN](https://db-blog.web.cern.ch/blog/zbigniew-baranowski/2017-01-performance-comparison-different-file-formats-and-storage-engines) , we expect Hudi to positioned at something that ingests parquet with superior performance.