[hudi] branch asf-site updated: Travis CI build asf-site

vinoth Tue, 08 Dec 2020 04:55:46 -0800

This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new faaaa93  Travis CI build asf-site
faaaa93 is described below

commit faaaa9352eebd849cfeec9920220e31cb5b53c82
Author: CI <[email protected]>
AuthorDate: Tue Dec 8 12:55:24 2020 +0000

    Travis CI build asf-site
---
 content/activity.html                              |  24 ++
 .../blog/2020-12-01-t3go-architecture-alluxio.png  | Bin 0 -> 123624 bytes
 .../images/blog/2020-12-01-t3go-architecture.png   | Bin 0 -> 72891 bytes
 .../images/blog/2020-12-01-t3go-microbenchmark.png | Bin 0 -> 56321 bytes
 content/assets/js/lunr/lunr-store.js               |   5 +
 content/blog.html                                  |  24 ++
 .../index.html                                     | 346 +++++++++++++++++++++
 content/cn/activity.html                           |  24 ++
 content/docs/powered_by.html                       |  16 +-
 content/sitemap.xml                                |   4 +
 10 files changed, 442 insertions(+), 1 deletion(-)

diff --git a/content/activity.html b/content/activity.html
index 7d354cd..f2783f2 100644
--- a/content/activity.html
+++ b/content/activity.html
@@ -191,6 +191,30 @@
     
     <h2 class="archive__item-title" itemprop="headline">
       
+        <a href="/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/" 
rel="permalink">Building High-Performance Data Lake Using Apache Hudi and 
Alluxio at T3Go
+</a>
+      
+    </h2>
+    <!-- Look the author details up from the site config. -->
+    
+    <!-- Output author details if some exist. -->
+    <div class="archive__item-meta"><a href="https://www.t3go.cn/">Trevor 
Zhang, Vino Yang</a> posted on <time datetime="2020-12-01">December 1, 
2020</time></div>
+ 
+    <p class="archive__item-excerpt" itemprop="description">How T3Go’s 
high-performance data lake using Apache Hudi and Alluxio shortened the time for 
data ingestion into the lake by up to a factor of 2. Data analysts using 
Presto, Hudi, and Alluxio in conjunction to query data on the lake saw queries 
speed up by 10 times.
+</p>
+  </article>
+</div>
+
+        
+        
+
+
+
+<div class="list__item">
+  <article class="archive__item" itemscope 
itemtype="https://schema.org/CreativeWork">
+    
+    <h2 class="archive__item-title" itemprop="headline">
+      
         <a href="/blog/hudi-meets-aws-emr-and-aws-dms/" rel="permalink">Apply 
record level changes from relational databases to Amazon S3 data lake using 
Apache Hudi on Amazon EMR and AWS Database Migration Service
 </a>
       
diff --git 
a/content/assets/images/blog/2020-12-01-t3go-architecture-alluxio.png 
b/content/assets/images/blog/2020-12-01-t3go-architecture-alluxio.png
new file mode 100644
index 0000000..b3a393b
Binary files /dev/null and 
b/content/assets/images/blog/2020-12-01-t3go-architecture-alluxio.png differ
diff --git a/content/assets/images/blog/2020-12-01-t3go-architecture.png 
b/content/assets/images/blog/2020-12-01-t3go-architecture.png
new file mode 100644
index 0000000..53dd660
Binary files /dev/null and 
b/content/assets/images/blog/2020-12-01-t3go-architecture.png differ
diff --git a/content/assets/images/blog/2020-12-01-t3go-microbenchmark.png 
b/content/assets/images/blog/2020-12-01-t3go-microbenchmark.png
new file mode 100644
index 0000000..dd77ed6
Binary files /dev/null and 
b/content/assets/images/blog/2020-12-01-t3go-microbenchmark.png differ
diff --git a/content/assets/js/lunr/lunr-store.js 
b/content/assets/js/lunr/lunr-store.js
index f2c4121..b1db17b 100644
--- a/content/assets/js/lunr/lunr-store.js
+++ b/content/assets/js/lunr/lunr-store.js
@@ -1203,4 +1203,9 @@ var store = [{
         "excerpt":"This blog published by AWS shows how to build a CDC 
pipeline that captures data from an Amazon Relational Database Service (Amazon 
RDS) for MySQL database using AWS Database Migration Service (AWS DMS) and 
applies those changes to a dataset in Amazon S3 using Apache Hudi on Amazon 
EMR.  ","categories": ["blog"],
         "tags": [],
         "url": "https://hudi.apache.org/blog/hudi-meets-aws-emr-and-aws-dms/",
+        "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{
+        "title": "Building High-Performance Data Lake Using Apache Hudi and 
Alluxio at T3Go",
+        "excerpt":"Building High-Performance Data Lake Using Apache Hudi and 
Alluxio at T3Go T3Go is China’s first platform for smart travel based on the 
Internet of Vehicles. In this article, Trevor Zhang and Vino Yang from T3Go 
describe the evolution of their data lake architecture, built on cloud-native 
or open-source technologies including...","categories": ["blog"],
+        "tags": [],
+        "url": 
"https://hudi.apache.org/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/",
         "teaser":"https://hudi.apache.org/assets/images/500x300.png"},]
diff --git a/content/blog.html b/content/blog.html
index 3a4e4b9..3253983 100644
--- a/content/blog.html
+++ b/content/blog.html
@@ -189,6 +189,30 @@
     
     <h2 class="archive__item-title" itemprop="headline">
       
+        <a href="/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/" 
rel="permalink">Building High-Performance Data Lake Using Apache Hudi and 
Alluxio at T3Go
+</a>
+      
+    </h2>
+    <!-- Look the author details up from the site config. -->
+    
+    <!-- Output author details if some exist. -->
+    <div class="archive__item-meta"><a href="https://www.t3go.cn/">Trevor 
Zhang, Vino Yang</a> posted on <time datetime="2020-12-01">December 1, 
2020</time></div>
+ 
+    <p class="archive__item-excerpt" itemprop="description">How T3Go’s 
high-performance data lake using Apache Hudi and Alluxio shortened the time for 
data ingestion into the lake by up to a factor of 2. Data analysts using 
Presto, Hudi, and Alluxio in conjunction to query data on the lake saw queries 
speed up by 10 times.
+</p>
+  </article>
+</div>
+
+        
+        
+
+
+
+<div class="list__item">
+  <article class="archive__item" itemscope 
itemtype="https://schema.org/CreativeWork">
+    
+    <h2 class="archive__item-title" itemprop="headline">
+      
         <a href="/blog/hudi-meets-aws-emr-and-aws-dms/" rel="permalink">Apply 
record level changes from relational databases to Amazon S3 data lake using 
Apache Hudi on Amazon EMR and AWS Database Migration Service
 </a>
       
diff --git 
a/content/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/index.html 
b/content/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/index.html
new file mode 100644
index 0000000..b15cba9
--- /dev/null
+++ b/content/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/index.html
@@ -0,0 +1,346 @@
+<!doctype html>
+<html lang="en" class="no-js">
+  <head>
+    <meta charset="utf-8">
+
+<!-- begin _includes/seo.html --><title>Building High-Performance Data Lake 
Using Apache Hudi and Alluxio at T3Go - Apache Hudi</title>
+<meta name="description" content="How T3Go’s high-performance data lake using 
Apache Hudi and Alluxio shortened the time for data ingestion into the lake by 
up to a factor of 2. Data analysts using Presto, Hudi, and Alluxio in 
conjunction to query data on the lake saw queries speed up by 10 times.">
+
+<meta property="og:type" content="article">
+<meta property="og:locale" content="en_US">
+<meta property="og:site_name" content="">
+<meta property="og:title" content="Building High-Performance Data Lake Using 
Apache Hudi and Alluxio at T3Go">
+<meta property="og:url" 
content="https://hudi.apache.org/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/">
+
+
+  <meta property="og:description" content="How T3Go’s high-performance data 
lake using Apache Hudi and Alluxio shortened the time for data ingestion into 
the lake by up to a factor of 2. Data analysts using Presto, Hudi, and Alluxio 
in conjunction to query data on the lake saw queries speed up by 10 
times.">
+
+
+
+
+
+
+
+
+
+
+
+<!-- end _includes/seo.html -->
+
+
+<!--<link href="/feed.xml" type="application/atom+xml" rel="alternate" title=" 
Feed">-->
+
+<!-- https://t.co/dKP3o1e -->
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+<script>
+  document.documentElement.className = 
document.documentElement.className.replace(/\bno-js\b/g, '') + ' js ';
+</script>
+
+<!-- For all browsers -->
+<link rel="stylesheet" href="/assets/css/main.css">
+
+<!--[if IE]>
+  <style>
+    /* old IE unsupported flexbox fixes */
+    .greedy-nav .site-title {
+      padding-right: 3em;
+    }
+    .greedy-nav button {
+      position: absolute;
+      top: 0;
+      right: 0;
+      height: 100%;
+    }
+  </style>
+<![endif]-->
+
+
+
+<link rel="icon" type="image/x-icon" href="/assets/images/favicon.ico">
+<link rel="stylesheet" href="/assets/css/font-awesome.min.css">
+<script src="/assets/js/jquery.min.js"></script>
+
+    
+<script src="/assets/js/main.min.js"></script>
+
+  </head>
+
+  <body class="layout--single">
+    <!--[if lt IE 9]>
+<div class="notice--danger align-center" style="margin: 0;">You are using an 
<strong>outdated</strong> browser. Please <a 
href="https://browsehappy.com/">upgrade your browser</a> to improve your 
experience.</div>
+<![endif]-->
+
+    <div class="masthead">
+  <div class="masthead__inner-wrap" id="masthead__inner-wrap">
+    <div class="masthead__menu">
+      <nav id="site-nav" class="greedy-nav">
+        
+          <a class="site-logo" href="/">
+              <div style="width: 150px; height: 40px">
+              </div>
+          </a>
+        
+        <a class="site-title" href="/">
+          
+        </a>
+        <ul class="visible-links"><li class="masthead__menu-item">
+              <a href="/docs/quick-start-guide.html" target="_self" 
>Documentation</a>
+            </li><li class="masthead__menu-item">
+              <a href="/community.html" target="_self" >Community</a>
+            </li><li class="masthead__menu-item">
+              <a href="/blog.html" target="_self" >Blog</a>
+            </li><li class="masthead__menu-item">
+              <a href="https://cwiki.apache.org/confluence/display/HUDI/FAQ" 
target="_blank" >FAQ</a>
+            </li><li class="masthead__menu-item">
+              <a href="/releases.html" target="_self" >Releases</a>
+            </li></ul>
+        <button class="greedy-nav__toggle hidden" type="button">
+          <span class="visually-hidden">Toggle menu</span>
+          <div class="navicon"></div>
+        </button>
+        <ul class="hidden-links hidden"></ul>
+      </nav>
+    </div>
+  </div>
+</div>
+<!--
+<p class="notice--warning" style="margin: 0 !important; text-align: center 
!important;"><strong>Note:</strong> This site is work in progress, if you 
notice any issues, please <a target="_blank" 
href="https://github.com/apache/hudi/issues">Report on Issue</a>.
+  Click <a href="/"> here</a> back to old site.</p>
+-->
+
+    <div class="initial-content">
+      <div id="main" role="main">
+  
+
+  <div class="sidebar sticky">
+
+  
+    <div itemscope itemtype="https://schema.org/Person">
+
+  <div class="author__content">
+    
+      <h3 class="author__name" itemprop="name">Quick Links</h3>
+    
+    
+      <div class="author__bio" itemprop="description">
+        <p>Hudi <em>ingests</em> &amp; <em>manages</em> storage of large 
analytical datasets over DFS.</p>
+
+      </div>
+    
+  </div>
+
+  <div class="author__urls-wrapper">
+    <ul class="author__urls social-icons">
+      
+        
+          <li><a href="/docs/quick-start-guide" target="_self" rel="nofollow 
noopener noreferrer"><i class="fa fa-book" aria-hidden="true"></i> 
Documentation</a></li>
+
+          
+        
+          <li><a href="https://cwiki.apache.org/confluence/display/HUDI" 
target="_blank" rel="nofollow noopener noreferrer"><i class="fa fa-wikipedia-w" 
aria-hidden="true"></i> Technical Wiki</a></li>
+
+          
+        
+          <li><a href="/contributing" target="_self" rel="nofollow noopener 
noreferrer"><i class="fa fa-thumbs-o-up" aria-hidden="true"></i> Contribution 
Guide</a></li>
+
+          
+        
+          <li><a 
href="https://join.slack.com/t/apache-hudi/shared_invite/enQtODYyNDAxNzc5MTg2LTE5OTBlYmVhYjM0N2ZhOTJjOWM4YzBmMWU2MjZjMGE4NDc5ZDFiOGQ2N2VkYTVkNzU3ZDQ4OTI1NmFmYWQ0NzE"
 target="_blank" rel="nofollow noopener noreferrer"><i class="fa fa-slack" 
aria-hidden="true"></i> Join on Slack</a></li>
+
+          
+        
+          <li><a href="https://github.com/apache/hudi" target="_blank" 
rel="nofollow noopener noreferrer"><i class="fa fa-github" 
aria-hidden="true"></i> Fork on GitHub</a></li>
+
+          
+        
+          <li><a href="https://issues.apache.org/jira/projects/HUDI/summary" 
target="_blank" rel="nofollow noopener noreferrer"><i class="fa fa-navicon" 
aria-hidden="true"></i> Report Issues</a></li>
+
+          
+        
+          <li><a href="/security" target="_self" rel="nofollow noopener 
noreferrer"><i class="fa fa-navicon" aria-hidden="true"></i> Report Security 
Issues</a></li>
+
+          
+        
+      
+    </ul>
+  </div>
+</div>
+
+  
+
+  
+  </div>
+
+
+  <article class="page" itemscope itemtype="https://schema.org/CreativeWork">
+    <!-- Look the author details up from the site config. -->
+    
+
+    <div class="page__inner-wrap">
+      
+        <header>
+          <h1 id="page-title" class="page__title" itemprop="headline">Building 
High-Performance Data Lake Using Apache Hudi and Alluxio at T3Go
+</h1>
+          <!-- Output author details if some exist. -->
+          <div class="page__author"><a href="https://www.t3go.cn/">Trevor 
Zhang, Vino Yang</a> posted on <time datetime="2020-12-01">December 1, 
2020</time></div>
+        </header>
+      
+
+      <section class="page__content" itemprop="text">
+        
+          <style>
+            .page {
+              padding-right: 0 !important;
+            }
+          </style>
+        
+        <h1 
id="building-high-performance-data-lake-using-apache-hudi-and-alluxio-at-t3go">Building
 High-Performance Data Lake Using Apache Hudi and Alluxio at T3Go</h1>
+<p><a href="https://www.t3go.cn/">T3Go</a> is China’s first platform for 
smart travel based on the Internet of Vehicles. In this article, Trevor Zhang 
and Vino Yang from T3Go describe the evolution of their data lake architecture, 
built on cloud-native or open-source technologies including Alibaba OSS, Apache 
Hudi, and Alluxio. Today, their data lake stores petabytes of data, supporting 
hundreds of pipelines and tens of thousands of tasks daily. It is essential for 
business units at T3G [...]
+
+<p>In this blog, you will see how we slashed data ingestion time by half using 
Hudi and Alluxio. Furthermore, data analysts using Presto, Hudi, and Alluxio 
saw the queries speed up by 10 times. We built our data lake based on data 
orchestration for multiple stages of our data pipeline, including ingestion and 
analytics.</p>
+
+<h1 id="i-t3go-data-lake-overview">I. T3Go data lake Overview</h1>
+
+<p>Prior to the data lake, different business units within T3Go managed their 
own data processing solutions, utilizing different storage systems, ETL tools, 
and data processing frameworks. Each unit's data became siloed from the 
others, significantly increasing cost and complexity. Due to the rapid business 
expansion of T3Go, this inefficiency became our engineering bottleneck.</p>
+
+<p>We moved to a unified data lake solution based on Alibaba OSS, an object 
store similar to AWS S3, to provide a centralized location to store structured 
and unstructured data, following the design principles of  <em>Multi-cluster 
Shared-data Architecture</em>; all the applications access OSS storage as the 
source of truth, as opposed to different data silos. This architecture allows 
us to store the data as-is, without having to first structure the data, and run 
different types of analy [...]
+
+<h1 id="ii-efficient-near-real-time-analytics-using-hudi">II. Efficient Near 
Real-time Analytics Using Hudi</h1>
+
+<p>Our business in smart travel drives the need to process and analyze data in 
a near real-time manner. With a traditional data warehouse, we faced the 
following challenges:</p>
+
+<ol>
+  <li>High overhead when updating due to long-tail latency</li>
+  <li>High cost of order analysis due to the long window of a business 
session</li>
+  <li>Reduced query accuracy due to late or ad-hoc updates</li>
+  <li>Unreliability in data ingestion pipeline</li>
+  <li>Data lost in the distributed data pipeline that cannot be reconciled</li>
+  <li>High latency to access data storage</li>
+</ol>
+
+<p>As a result, we adopted Apache Hudi on top of OSS to address these issues. 
The following diagram outlines the architecture:</p>
+
+<p><img src="/assets/images/blog/2020-12-01-t3go-architecture.png" 
alt="architecture" /></p>
+
+<h2 id="enable-near-real-time-data-ingestion-and-analysis">Enable Near real 
time data ingestion and analysis</h2>
+
+<p>With Hudi, our data lake supports multiple data sources including Kafka, 
MySQL binlog, GIS, and other business logs in near real time. As a result, more 
than 60% of the company’s data is stored in the data lake and this proportion 
continues to increase.</p>
+
+<p>We are also able to cut the data ingestion time down to a few minutes 
by introducing Apache Hudi into the data pipeline. Combined with big data 
interactive query and analysis frameworks such as Presto and SparkSQL, this 
enables real-time data analysis and insights.</p>
+
+<h2 id="enable-incremental-processing-pipeline">Enable Incremental processing 
pipeline</h2>
+
+<p>With the help of Hudi, it is possible to provide incremental changes to the 
downstream derived table when the upstream table updates frequently. Even with 
a large number of interdependent tables, we can quickly run partial data 
updates. This also effectively avoids updating the full partitions of cold 
tables in the traditional Hive data warehouse.</p>
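+
+<p>As a rough illustration of the incremental read described above, the sketch 
below builds the reader options for a Hudi incremental query in PySpark. The 
helper name <code>incremental_read_options</code> and the instant times are 
hypothetical, and the option keys assume a Hudi release that uses 
<code>hoodie.datasource.query.type</code>; this is not T3Go's pipeline code.</p>

```python
from typing import Dict, Optional


def incremental_read_options(
    begin_instant: str, end_instant: Optional[str] = None
) -> Dict[str, str]:
    """Build Spark reader options for a Hudi incremental query.

    begin_instant / end_instant are Hudi commit times (for example
    "20201201000000"); only commits after begin_instant are returned.
    """
    if not begin_instant:
        raise ValueError("begin_instant is required for an incremental query")

    options = {
        # Ask Hudi for changed records only, not a full snapshot.
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    if end_instant is not None:
        # Optional upper bound on the commit range.
        options["hoodie.datasource.read.end.instanttime"] = end_instant
    return options


# Assumed usage with an existing SparkSession named `spark` (not created here):
# changes = (spark.read.format("hudi")
#            .options(**incremental_read_options("20201201000000"))
#            .load(base_path))
```

+<p>A downstream job would typically persist the last commit time it processed 
and pass it as the next <code>begin_instant</code>, so each run only touches 
records that changed since the previous run.</p>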
+
+<h2 id="accessing-data-using-hudi-as-a-unified-format">Accessing Data using 
Hudi as a unified format</h2>
+
+<p>Traditional data warehouses often deploy Hadoop to store data and provide 
batch analysis. Kafka is used separately to distribute Hadoop data to other 
data processing frameworks, resulting in duplicated data. Hudi helps 
effectively solve this problem; we always use Spark pipelines to insert new 
updates into the Hudi tables, then incrementally read the updates from the Hudi 
tables. In other words, Hudi tables are used as the unified storage format to 
access data.</p>
+
+<h1 id="iii-efficient-data-caching-using-alluxio">III. Efficient Data Caching 
Using Alluxio</h1>
+
+<p>In the early version of our data lake without Alluxio, data received from 
Kafka in real time was processed by Spark and then written to the OSS data lake 
using Hudi DeltaStreamer tasks. With this architecture, Spark often suffered 
high network latency when writing to OSS directly. Since all data was in OSS 
storage, OLAP queries on Hudi data could also be slow due to lack of data 
locality.</p>
+
+<p>To address the latency issue, we deployed Alluxio as a data orchestration 
layer, co-located with computing engines such as Spark and Presto, and used 
Alluxio to accelerate read and write on the data lake as shown in the following 
diagram:</p>
+
+<p><img src="/assets/images/blog/2020-12-01-t3go-architecture-alluxio.png" 
alt="architecture-alluxio" /></p>
+
+<p>Data in formats such as Hudi, Parquet, ORC, and JSON is stored mostly on 
OSS, which holds 95% of the data. Computing engines such as Flink, Spark, 
Kylin, and Presto are each deployed in isolated clusters. When each 
engine accesses OSS, Alluxio acts as a virtual distributed storage system to 
accelerate data access, being co-located with each of the computing clusters.</p>
+
+<p>Specifically, here are a few applications leveraging Alluxio in the T3Go 
data lake.</p>
+
+<h2 id="data-lake-ingestion">Data lake ingestion</h2>
+
+<p>We mount the corresponding OSS path to the Alluxio file system and set 
Hudi’s <em>“<strong>target-base-path</strong>”</em> parameter value to use 
the alluxio:// scheme in place of the oss:// scheme. Spark pipelines with Hudi 
continuously ingest data to Alluxio. After data is written to Alluxio, it is 
asynchronously persisted from the Alluxio cache to the remote OSS every minute. 
These modifications allow Spark to write to a local Alluxio node instead of 
writing to remote OSS, significan [...]
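+
+<p>The scheme switch can be pictured as a simple path translation. The sketch 
below assumes a single OSS prefix mounted at one Alluxio mount point and maps an 
oss:// URI to its alluxio:// counterpart. The function name, bucket, mount 
point, and master address are illustrative assumptions, not values from the 
T3Go deployment.</p>

```python
def to_alluxio_path(oss_uri: str, oss_root: str, mount_point: str, master: str) -> str:
    """Translate an OSS URI into the equivalent Alluxio URI.

    oss_root    -- the OSS prefix that was mounted (hypothetical, e.g. "oss://lake")
    mount_point -- where it is mounted in the Alluxio namespace (e.g. "/oss")
    master      -- Alluxio master host:port (assumed, e.g. "alluxio-master:19998")
    """
    root = oss_root.rstrip("/")
    # Only paths under the mounted prefix have an Alluxio equivalent.
    if oss_uri != root and not oss_uri.startswith(root + "/"):
        raise ValueError(f"{oss_uri} is not under the mounted prefix {root}")

    relative = oss_uri[len(root):]  # "" or "/sub/path"
    return f"alluxio://{master}{mount_point.rstrip('/')}{relative}"


# Hypothetical base path for a Hudi table written through Alluxio.
target_base_path = to_alluxio_path(
    "oss://lake/hudi/orders", "oss://lake", "/oss", "alluxio-master:19998"
)
```

+<p>The resulting URI is what would be supplied as the 
<code>target-base-path</code> value, so the writer talks to Alluxio while the 
data still ends up in the mounted OSS location.</p>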
+
+<h2 id="data-analysis-on-the-lake">Data analysis on the lake</h2>
+
+<p>We use Presto as an ad-hoc query engine to analyze the Hudi tables in the 
lake, co-locating Alluxio workers on each Presto worker node. When Presto and 
Alluxio services are co-located and running, Alluxio caches the input data 
locally on the Presto worker, which greatly benefits Presto for subsequent 
retrievals. On a cache hit, Presto can read from the local Alluxio worker 
storage at memory speed without any additional data transfer over the 
network.</p>
+
+<h2 id="concurrent-accesses-across-multiple-storage-systems">Concurrent 
accesses across multiple storage systems</h2>
+
+<p>In order to ensure the accuracy of training samples, our machine learning 
team often synchronizes desensitized data in production to an offline machine 
learning environment. During synchronization, the data flows across multiple 
file systems, from production OSS to an offline HDFS followed by another 
offline Machine Learning HDFS.</p>
+
+<p>This data migration process is not only inefficient but also error-prone 
for modelers because multiple storage systems with varying configurations 
are involved. Alluxio helps in this specific scenario by mounting the 
destination storage systems under the same filesystem to be accessed by their 
corresponding logical paths in the Alluxio namespace. By decoupling the physical 
storage, this allows applications with different APIs to access and transfer 
data seamlessly. This data access lay [...]
+
+<h2 id="microbenchmark">Microbenchmark</h2>
+
+<p>Overall, we observed the following improvements with Alluxio:</p>
+
+<ol>
+  <li>It supports a hierarchical and transparent caching mechanism</li>
+  <li>It supports a cache promote mode when reading</li>
+  <li>It supports an asynchronous write mode</li>
+  <li>It supports an LRU recycling strategy</li>
+  <li>It has pin and TTL features</li>
+</ol>
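+
+<p>To make the LRU recycling strategy in the list above concrete, here is a 
generic least-recently-used cache sketch. It is a textbook structure for 
illustration only; the class name and capacity are hypothetical, and it is not 
how Alluxio implements eviction internally.</p>

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # a read makes the entry most recent
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def keys(self):
        return list(self._items)


cache = LRUCache(capacity=2)
cache.put("block-a", b"...")
cache.put("block-b", b"...")
cache.get("block-a")          # block-a is now the most recently used
cache.put("block-c", b"...")  # evicts block-b, the least recently used
```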
+
+<p>After comparison and verification, we chose to use Spark SQL as the query 
engine. Our performance testing queries the Hudi table, comparing Alluxio + OSS 
together against OSS directly as well as HDFS.</p>
+
+<p><img src="/assets/images/blog/2020-12-01-t3go-microbenchmark.png" 
alt="microbench" /></p>
+
+<p>In the stress test shown above, once the data volume exceeds a certain 
magnitude (2400W, i.e. 24 million, reading W as ten thousand), the query speed 
using Alluxio+OSS surpasses the HDFS query speed of the hybrid deployment. Once 
the data volume exceeds 1E (100 million, reading E as one hundred million), the 
query speed starts to double. At 6E (600 million), it is up to 12 times higher 
than querying native OSS and 8 times higher than querying native HDFS. 
The improvement depends on the machine configuration.</p>
+
+<p>Based on our performance benchmarking, we found that the performance can be 
improved by over 10 times with the help of Alluxio. Furthermore, the larger the 
data scale, the more prominent the performance improvement.</p>
+
+<h1 id="iv-next-step">IV. Next Step</h1>
+
+<p>As T3Go’s data lake ecosystem expands, we will continue facing the critical 
scenario of compute and storage segregation. With T3Go’s growing data 
processing needs, our team plans to deploy Alluxio on a larger scale to 
accelerate our data lake storage.</p>
+
+<p>In addition to the deployment of Alluxio on the data lake computing engine, 
which currently is mainly SparkSQL, we plan to add a layer of Alluxio to the 
OLAP cluster using Apache Kylin and an ad-hoc cluster using Presto. The goal is 
to have Alluxio cover all computing scenarios, with Alluxio interconnecting 
these scenarios to improve the read and write efficiency of the data lake 
and the surrounding lake ecosystem.</p>
+
+<h1 id="v-conclusion">V. Conclusion</h1>
+
+<p>As mentioned earlier, Hudi and Alluxio cover all scenarios of Hudi’s near 
real-time ingestion, near real-time analysis, incremental processing, and data 
distribution on DFS, among many others, and play the role of a powerful 
accelerator for data ingestion and data analysis on the lake. With Hudi and 
Alluxio together, <strong>our R&amp;D engineers shortened the time for data 
ingestion into the lake by up to a factor of 2. Data analysts using Presto, 
Hudi, and Alluxio in conjunction t [...]
+
+      </section>
+
+      <a href="#masthead__inner-wrap" class="back-to-top">Back to top 
&uarr;</a>
+
+
+      
+
+    </div>
+
+  </article>
+
+</div>
+
+    </div>
+
+    <div class="page__footer">
+      <footer>
+        
+<div class="row">
+  <div class="col-lg-12 footer">
+    <p>
+      <table class="table-apache-info">
+        <tr>
+          <td>
+            <a class="footer-link-img" href="https://apache.org">
+              <img width="250px" src="/assets/images/asf_logo.svg" alt="The 
Apache Software Foundation">
+            </a>
+          </td>
+          <td>
+            <a style="float: right" 
href="https://www.apache.org/events/current-event.html">
+              <img 
src="https://www.apache.org/events/current-event-234x60.png" />
+            </a>
+          </td>
+        </tr>
+      </table>
+    </p>
+    <p>
+      <a href="https://www.apache.org/licenses/">License</a> | <a 
href="https://www.apache.org/security/">Security</a> | <a 
href="https://www.apache.org/foundation/thanks.html">Thanks</a> | <a 
href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
+    </p>
+    <p>
+      Copyright &copy; <span id="copyright-year">2019</span> <a 
href="https://apache.org">The Apache Software Foundation</a>, Licensed under 
the <a href="https://www.apache.org/licenses/LICENSE-2.0"> Apache License, 
Version 2.0</a>.
+      Hudi, Apache and the Apache feather logo are trademarks of The Apache 
Software Foundation. <a href="/docs/privacy">Privacy Policy</a>
+    </p>
+  </div>
+</div>
+      </footer>
+    </div>
+
+
+  </body>
+</html>
\ No newline at end of file
diff --git a/content/cn/activity.html b/content/cn/activity.html
index 222d170..c87c2fe 100644
--- a/content/cn/activity.html
+++ b/content/cn/activity.html
@@ -191,6 +191,30 @@
     
     <h2 class="archive__item-title" itemprop="headline">
       
+        <a href="/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/" 
rel="permalink">Building High-Performance Data Lake Using Apache Hudi and 
Alluxio at T3Go
+</a>
+      
+    </h2>
+    <!-- Look the author details up from the site config. -->
+    
+    <!-- Output author details if some exist. -->
+    <div class="archive__item-meta"><a href="https://www.t3go.cn/">Trevor 
Zhang, Vino Yang</a> posted on <time datetime="2020-12-01">December 1, 
2020</time></div>
+ 
+    <p class="archive__item-excerpt" itemprop="description">How T3Go’s 
high-performance data lake using Apache Hudi and Alluxio shortened the time for 
data ingestion into the lake by up to a factor of 2. Data analysts using 
Presto, Hudi, and Alluxio in conjunction to query data on the lake saw queries 
speed up by 10 times.
+</p>
+  </article>
+</div>
+
+        
+        
+
+
+
+<div class="list__item">
+  <article class="archive__item" itemscope 
itemtype="https://schema.org/CreativeWork">
+    
+    <h2 class="archive__item-title" itemprop="headline">
+      
         <a href="/blog/hudi-meets-aws-emr-and-aws-dms/" rel="permalink">Apply 
record level changes from relational databases to Amazon S3 data lake using 
Apache Hudi on Amazon EMR and AWS Database Migration Service
 </a>
       
diff --git a/content/docs/powered_by.html b/content/docs/powered_by.html
index 04e64dd..df48680 100644
--- a/content/docs/powered_by.html
+++ b/content/docs/powered_by.html
@@ -486,17 +486,27 @@ December 2019, AWS re:Invent 2019, Las Vegas, NV, USA</p>
     <p><a href="https://youtu.be/nA3rwOdmm3A">“PrestoDB and Apache Hudi”</a> - 
By Bhavani Sudha Saktheeswaran and Brandon Scheller, Aug 2020, PrestoDB 
Community Meetup.</p>
   </li>
   <li>
+    <p><a href="https://www.youtube.com/watch?v=hNxrsjhI-9w">“DC_THURS : 
Apache Hudi w/ Nishith Agarwal &amp; Vinoth Chandar”</a>, Aug 2020, Online 
discussion/Q&amp;A with DataCouncil Founder</p>
+  </li>
+  <li>
     <p><a href="https://www.youtube.com/watch?v=lsFSM2Z4kPs">“Panel Discussion 
on Presto Ecosystem”</a> - By Vinoth Chandar, Sep 2020, PrestoCon <a 
href="https://prestocon2020.sched.com/event/dgyw">“panel”</a>.</p>
   </li>
   <li>
     <p><a 
href="https://docs.google.com/presentation/d/1y-ryRwCdTbqQHGr_bn3lxM_B8L1L5nsZOIXlJsDl_wU/edit?usp=sharing">“Next
 Generation Data lakes using Apache Hudi”</a> - By Balaji Varadarajan and 
Sivabalan Narayanan, Sep 2020, <a 
href="https://www.apachecon.com/">“ApacheCon”</a></p>
   </li>
   <li>
+    <p><a 
href="https://www.dbta.com/DataSummit/Fall2020/Agenda.aspx">“Building 
Large-Scale, Transactional Data Lakes using Apache Hudi”</a> - By Nishith 
Agarwal, Data Summit 2020</p>
+  </li>
+  <li>
     <p><a 
href="https://drive.google.com/file/d/1ULVPkjynaw-07wsutLcZm-4rVXf8E8N8/view?usp=sharing">“Landing
 practice of Apache Hudi in T3go”</a> - By VinoYang and XianghuWang, November 
2020, Qcon.</p>
-    <h2 id="articles">Articles</h2>
+  </li>
+  <li>
+    <p><a href="https://www.meetup.com/UberEvents/events/274924537/">“Meetup 
talk by Nishith Agarwal”</a> - Uber Data Platforms Meetup, Dec 2020</p>
   </li>
 </ol>
 
+<h2 id="articles">Articles</h2>
+
 <p>You can check out <a href="https://hudi.apache.org/blog.html">our blog 
pages</a> for content written by our committers/contributors.</p>
 
 <ol>
@@ -512,6 +522,10 @@ December 2019, AWS re:Invent 2019, Las Vegas, NV, USA</p>
   <li><a 
href="https://towardsdatascience.com/data-lake-change-data-capture-cdc-using-apache-hudi-on-amazon-emr-part-2-process-65e4662d7b4b">“Data
 Lake Change Capture using Apache Hudi &amp; Amazon AMS/EMR”</a> - Towards 
DataScience article, Oct 20</li>
   <li><a 
href="https://aws.amazon.com/blogs/apn/how-nclouds-helps-accelerate-data-delivery-with-apache-hudi-on-amazon-emr/">“How
 nClouds Helps Accelerate Data Delivery with Apache Hudi on Amazon EMR”</a> - 
published by nClouds in partnership with AWS</li>
   <li><a 
href="https://aws.amazon.com/blogs/big-data/apply-record-level-changes-from-relational-databases-to-amazon-s3-data-lake-using-apache-hudi-on-amazon-emr-and-aws-database-migration-service/">“Apply
 record level changes from relational databases to Amazon S3 data lake using 
Apache Hudi on Amazon EMR and AWS Database Migration Service”</a> - AWS 
blog</li>
+  <li><a 
href="https://www.dbta.com/Editorial/News-Flashes/Architecting-Data-Lakes-for-the-Modern-Enterprise-at-Data-Summit-Connect-Fall-2020-143512.aspx">“Architecting
 Data Lakes for the Modern Enterprise at Data Summit Connect Fall 2020”</a></li>
+  <li><a 
href="https://www.analyticsinsight.net/can-big-data-solutions-be-affordable/">“Can
 Big Data Solutions Be Affordable?”</a></li>
+  <li><a 
href="https://www.alluxio.io/blog/building-high-performance-data-lake-using-apache-hudi-and-alluxio-at-t3go/">“Building
 High-Performance Data Lake Using Apache Hudi and Alluxio at T3Go”</a></li>
+  <li><a 
href="https://towardsdatascience.com/data-lake-change-data-capture-cdc-using-apache-hudi-on-amazon-emr-part-2-process-65e4662d7b4b">“Data
 Lake Change Capture using Apache Hudi &amp; Amazon AMS/EMR Part 2”</a></li>
 </ol>
 
 <h2 id="powered-by">Powered by</h2>
diff --git a/content/sitemap.xml b/content/sitemap.xml
index db44ee0..432cf35 100644
--- a/content/sitemap.xml
+++ b/content/sitemap.xml
@@ -965,6 +965,10 @@
 <lastmod>2020-10-19T00:00:00-04:00</lastmod>
 </url>
 <url>
+<loc>https://hudi.apache.org/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/</loc>
+<lastmod>2020-12-01T00:00:00-05:00</lastmod>
+</url>
+<url>
 <loc>https://hudi.apache.org/cn/activity</loc>
 <lastmod>2019-12-30T14:59:57-05:00</lastmod>
 </url>
