This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new faaaa93 Travis CI build asf-site
faaaa93 is described below
commit faaaa9352eebd849cfeec9920220e31cb5b53c82
Author: CI <[email protected]>
AuthorDate: Tue Dec 8 12:55:24 2020 +0000
Travis CI build asf-site
---
content/activity.html | 24 ++
.../blog/2020-12-01-t3go-architecture-alluxio.png | Bin 0 -> 123624 bytes
.../images/blog/2020-12-01-t3go-architecture.png | Bin 0 -> 72891 bytes
.../images/blog/2020-12-01-t3go-microbenchmark.png | Bin 0 -> 56321 bytes
content/assets/js/lunr/lunr-store.js | 5 +
content/blog.html | 24 ++
.../index.html | 346 +++++++++++++++++++++
content/cn/activity.html | 24 ++
content/docs/powered_by.html | 16 +-
content/sitemap.xml | 4 +
10 files changed, 442 insertions(+), 1 deletion(-)
diff --git a/content/activity.html b/content/activity.html
index 7d354cd..f2783f2 100644
--- a/content/activity.html
+++ b/content/activity.html
@@ -191,6 +191,30 @@
<h2 class="archive__item-title" itemprop="headline">
+ <a href="/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/"
rel="permalink">Building High-Performance Data Lake Using Apache Hudi and
Alluxio at T3Go
+</a>
+
+ </h2>
+ <!-- Look the author details up from the site config. -->
+
+ <!-- Output author details if some exist. -->
+ <div class="archive__item-meta"><a href="https://www.t3go.cn/">Trevor
Zhang, Vino Yang</a> posted on <time datetime="2020-12-01">December 1,
2020</time></div>
+
+ <p class="archive__item-excerpt" itemprop="description">How T3Go’s
high-performance data lake using Apache Hudi and Alluxio shortened the time for
data ingestion into the lake by up to a factor of 2. Data analysts using
Presto, Hudi, and Alluxio in conjunction to query data on the lake saw queries
speed up by a factor of 10.
+</p>
+ </article>
+</div>
+
+
+
+
+
+
+<div class="list__item">
+ <article class="archive__item" itemscope
itemtype="https://schema.org/CreativeWork">
+
+ <h2 class="archive__item-title" itemprop="headline">
+
<a href="/blog/hudi-meets-aws-emr-and-aws-dms/" rel="permalink">Apply
record level changes from relational databases to Amazon S3 data lake using
Apache Hudi on Amazon EMR and AWS Database Migration Service
</a>
diff --git
a/content/assets/images/blog/2020-12-01-t3go-architecture-alluxio.png
b/content/assets/images/blog/2020-12-01-t3go-architecture-alluxio.png
new file mode 100644
index 0000000..b3a393b
Binary files /dev/null and
b/content/assets/images/blog/2020-12-01-t3go-architecture-alluxio.png differ
diff --git a/content/assets/images/blog/2020-12-01-t3go-architecture.png
b/content/assets/images/blog/2020-12-01-t3go-architecture.png
new file mode 100644
index 0000000..53dd660
Binary files /dev/null and
b/content/assets/images/blog/2020-12-01-t3go-architecture.png differ
diff --git a/content/assets/images/blog/2020-12-01-t3go-microbenchmark.png
b/content/assets/images/blog/2020-12-01-t3go-microbenchmark.png
new file mode 100644
index 0000000..dd77ed6
Binary files /dev/null and
b/content/assets/images/blog/2020-12-01-t3go-microbenchmark.png differ
diff --git a/content/assets/js/lunr/lunr-store.js
b/content/assets/js/lunr/lunr-store.js
index f2c4121..b1db17b 100644
--- a/content/assets/js/lunr/lunr-store.js
+++ b/content/assets/js/lunr/lunr-store.js
@@ -1203,4 +1203,9 @@ var store = [{
"excerpt":"This blog published by AWS shows how to build a CDC
pipeline that captures data from an Amazon Relational Database Service (Amazon
RDS) for MySQL database using AWS Database Migration Service (AWS DMS) and
applies those changes to a dataset in Amazon S3 using Apache Hudi on Amazon
EMR. ","categories": ["blog"],
"tags": [],
"url": "https://hudi.apache.org/blog/hudi-meets-aws-emr-and-aws-dms/",
+ "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{
+ "title": "Building High-Performance Data Lake Using Apache Hudi and
Alluxio at T3Go",
+ "excerpt":"Building High-Performance Data Lake Using Apache Hudi and
Alluxio at T3Go T3Go is China’s first platform for smart travel based on the
Internet of Vehicles. In this article, Trevor Zhang and Vino Yang from T3Go
describe the evolution of their data lake architecture, built on cloud-native
or open-source technologies including...","categories": ["blog"],
+ "tags": [],
+ "url":
"https://hudi.apache.org/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/",
"teaser":"https://hudi.apache.org/assets/images/500x300.png"},]
diff --git a/content/blog.html b/content/blog.html
index 3a4e4b9..3253983 100644
--- a/content/blog.html
+++ b/content/blog.html
@@ -189,6 +189,30 @@
<h2 class="archive__item-title" itemprop="headline">
+ <a href="/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/"
rel="permalink">Building High-Performance Data Lake Using Apache Hudi and
Alluxio at T3Go
+</a>
+
+ </h2>
+ <!-- Look the author details up from the site config. -->
+
+ <!-- Output author details if some exist. -->
+ <div class="archive__item-meta"><a href="https://www.t3go.cn/">Trevor
Zhang, Vino Yang</a> posted on <time datetime="2020-12-01">December 1,
2020</time></div>
+
+ <p class="archive__item-excerpt" itemprop="description">How T3Go’s
high-performance data lake using Apache Hudi and Alluxio shortened the time for
data ingestion into the lake by up to a factor of 2. Data analysts using
Presto, Hudi, and Alluxio in conjunction to query data on the lake saw queries
speed up by a factor of 10.
+</p>
+ </article>
+</div>
+
+
+
+
+
+
+<div class="list__item">
+ <article class="archive__item" itemscope
itemtype="https://schema.org/CreativeWork">
+
+ <h2 class="archive__item-title" itemprop="headline">
+
<a href="/blog/hudi-meets-aws-emr-and-aws-dms/" rel="permalink">Apply
record level changes from relational databases to Amazon S3 data lake using
Apache Hudi on Amazon EMR and AWS Database Migration Service
</a>
diff --git
a/content/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/index.html
b/content/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/index.html
new file mode 100644
index 0000000..b15cba9
--- /dev/null
+++ b/content/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/index.html
@@ -0,0 +1,346 @@
+<!doctype html>
+<html lang="en" class="no-js">
+ <head>
+ <meta charset="utf-8">
+
+<!-- begin _includes/seo.html --><title>Building High-Performance Data Lake
Using Apache Hudi and Alluxio at T3Go - Apache Hudi</title>
+<meta name="description" content="How T3Go’s high-performance data lake using
Apache Hudi and Alluxio shortened the time for data ingestion into the lake by
up to a factor of 2. Data analysts using Presto, Hudi, and Alluxio in
conjunction to query data on the lake saw queries speed up by a factor of 10.">
+
+<meta property="og:type" content="article">
+<meta property="og:locale" content="en_US">
+<meta property="og:site_name" content="">
+<meta property="og:title" content="Building High-Performance Data Lake Using
Apache Hudi and Alluxio at T3Go">
+<meta property="og:url"
content="https://hudi.apache.org/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/">
+
+
+ <meta property="og:description" content="How T3Go’s high-performance data
lake using Apache Hudi and Alluxio shortened the time for data ingestion into
the lake by up to a factor of 2. Data analysts using Presto, Hudi, and Alluxio
in conjunction to query data on the lake saw queries speed up by a factor of
10.">
+
+
+
+
+
+
+
+
+
+
+
+<!-- end _includes/seo.html -->
+
+
+<!--<link href="/feed.xml" type="application/atom+xml" rel="alternate" title="
Feed">-->
+
+<!-- https://t.co/dKP3o1e -->
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+<script>
+ document.documentElement.className =
document.documentElement.className.replace(/\bno-js\b/g, '') + ' js ';
+</script>
+
+<!-- For all browsers -->
+<link rel="stylesheet" href="/assets/css/main.css">
+
+<!--[if IE]>
+ <style>
+ /* old IE unsupported flexbox fixes */
+ .greedy-nav .site-title {
+ padding-right: 3em;
+ }
+ .greedy-nav button {
+ position: absolute;
+ top: 0;
+ right: 0;
+ height: 100%;
+ }
+ </style>
+<![endif]-->
+
+
+
+<link rel="icon" type="image/x-icon" href="/assets/images/favicon.ico">
+<link rel="stylesheet" href="/assets/css/font-awesome.min.css">
+<script src="/assets/js/jquery.min.js"></script>
+
+
+<script src="/assets/js/main.min.js"></script>
+
+ </head>
+
+ <body class="layout--single">
+ <!--[if lt IE 9]>
+<div class="notice--danger align-center" style="margin: 0;">You are using an
<strong>outdated</strong> browser. Please <a
href="https://browsehappy.com/">upgrade your browser</a> to improve your
experience.</div>
+<![endif]-->
+
+ <div class="masthead">
+ <div class="masthead__inner-wrap" id="masthead__inner-wrap">
+ <div class="masthead__menu">
+ <nav id="site-nav" class="greedy-nav">
+
+ <a class="site-logo" href="/">
+ <div style="width: 150px; height: 40px">
+ </div>
+ </a>
+
+ <a class="site-title" href="/">
+
+ </a>
+ <ul class="visible-links"><li class="masthead__menu-item">
+ <a href="/docs/quick-start-guide.html" target="_self"
>Documentation</a>
+ </li><li class="masthead__menu-item">
+ <a href="/community.html" target="_self" >Community</a>
+ </li><li class="masthead__menu-item">
+ <a href="/blog.html" target="_self" >Blog</a>
+ </li><li class="masthead__menu-item">
+ <a href="https://cwiki.apache.org/confluence/display/HUDI/FAQ"
target="_blank" >FAQ</a>
+ </li><li class="masthead__menu-item">
+ <a href="/releases.html" target="_self" >Releases</a>
+ </li></ul>
+ <button class="greedy-nav__toggle hidden" type="button">
+ <span class="visually-hidden">Toggle menu</span>
+ <div class="navicon"></div>
+ </button>
+ <ul class="hidden-links hidden"></ul>
+ </nav>
+ </div>
+ </div>
+</div>
+<!--
+<p class="notice--warning" style="margin: 0 !important; text-align: center
!important;"><strong>Note:</strong> This site is work in progress, if you
notice any issues, please <a target="_blank"
href="https://github.com/apache/hudi/issues">Report on Issue</a>.
+ Click <a href="/"> here</a> back to old site.</p>
+-->
+
+ <div class="initial-content">
+ <div id="main" role="main">
+
+
+ <div class="sidebar sticky">
+
+
+ <div itemscope itemtype="https://schema.org/Person">
+
+ <div class="author__content">
+
+ <h3 class="author__name" itemprop="name">Quick Links</h3>
+
+
+ <div class="author__bio" itemprop="description">
+ <p>Hudi <em>ingests</em> & <em>manages</em> storage of large
analytical datasets over DFS.</p>
+
+ </div>
+
+ </div>
+
+ <div class="author__urls-wrapper">
+ <ul class="author__urls social-icons">
+
+
+ <li><a href="/docs/quick-start-guide" target="_self" rel="nofollow
noopener noreferrer"><i class="fa fa-book" aria-hidden="true"></i>
Documentation</a></li>
+
+
+
+ <li><a href="https://cwiki.apache.org/confluence/display/HUDI"
target="_blank" rel="nofollow noopener noreferrer"><i class="fa fa-wikipedia-w"
aria-hidden="true"></i> Technical Wiki</a></li>
+
+
+
+ <li><a href="/contributing" target="_self" rel="nofollow noopener
noreferrer"><i class="fa fa-thumbs-o-up" aria-hidden="true"></i> Contribution
Guide</a></li>
+
+
+
+ <li><a
href="https://join.slack.com/t/apache-hudi/shared_invite/enQtODYyNDAxNzc5MTg2LTE5OTBlYmVhYjM0N2ZhOTJjOWM4YzBmMWU2MjZjMGE4NDc5ZDFiOGQ2N2VkYTVkNzU3ZDQ4OTI1NmFmYWQ0NzE"
target="_blank" rel="nofollow noopener noreferrer"><i class="fa fa-slack"
aria-hidden="true"></i> Join on Slack</a></li>
+
+
+
+ <li><a href="https://github.com/apache/hudi" target="_blank"
rel="nofollow noopener noreferrer"><i class="fa fa-github"
aria-hidden="true"></i> Fork on GitHub</a></li>
+
+
+
+ <li><a href="https://issues.apache.org/jira/projects/HUDI/summary"
target="_blank" rel="nofollow noopener noreferrer"><i class="fa fa-navicon"
aria-hidden="true"></i> Report Issues</a></li>
+
+
+
+ <li><a href="/security" target="_self" rel="nofollow noopener
noreferrer"><i class="fa fa-navicon" aria-hidden="true"></i> Report Security
Issues</a></li>
+
+
+
+
+ </ul>
+ </div>
+</div>
+
+
+
+
+ </div>
+
+
+ <article class="page" itemscope itemtype="https://schema.org/CreativeWork">
+ <!-- Look the author details up from the site config. -->
+
+
+ <div class="page__inner-wrap">
+
+ <header>
+ <h1 id="page-title" class="page__title" itemprop="headline">Building
High-Performance Data Lake Using Apache Hudi and Alluxio at T3Go
+</h1>
+ <!-- Output author details if some exist. -->
+ <div class="page__author"><a href="https://www.t3go.cn/">Trevor
Zhang, Vino Yang</a> posted on <time datetime="2020-12-01">December 1,
2020</time></div>
+ </header>
+
+
+ <section class="page__content" itemprop="text">
+
+ <style>
+ .page {
+ padding-right: 0 !important;
+ }
+ </style>
+
+ <h1
id="building-high-performance-data-lake-using-apache-hudi-and-alluxio-at-t3go">Building
High-Performance Data Lake Using Apache Hudi and Alluxio at T3Go</h1>
+<p><a href="https://www.t3go.cn/">T3Go</a> is China’s first platform for
smart travel based on the Internet of Vehicles. In this article, Trevor Zhang
and Vino Yang from T3Go describe the evolution of their data lake architecture,
built on cloud-native or open-source technologies including Alibaba OSS, Apache
Hudi, and Alluxio. Today, their data lake stores petabytes of data, supporting
hundreds of pipelines and tens of thousands of tasks daily. It is essential for
business units at T3G [...]
+
+<p>In this blog, you will see how we slashed data ingestion time by half using
Hudi and Alluxio. Furthermore, data analysts using Presto, Hudi, and Alluxio
saw the queries speed up by 10 times. We built our data lake based on data
orchestration for multiple stages of our data pipeline, including ingestion and
analytics.</p>
+
+<h1 id="i-t3go-data-lake-overview">I. T3Go data lake Overview</h1>
+
+<p>Prior to the data lake, different business units within T3Go managed their
own data processing solutions, utilizing different storage systems, ETL tools,
and data processing frameworks. Each unit’s data became siloed from every other
unit, significantly increasing cost and complexity. As T3Go’s business expanded
rapidly, this inefficiency became our engineering bottleneck.</p>
+
+<p>We moved to a unified data lake solution based on Alibaba OSS, an object
store similar to AWS S3, to provide a centralized location to store structured
and unstructured data, following the design principles of <em>Multi-cluster
Shared-data Architecture</em>; all the applications access OSS storage as the
source of truth, as opposed to different data silos. This architecture allows
us to store the data as-is, without having to first structure the data, and run
different types of analy [...]
+
+<h1 id="ii-efficient-near-real-time-analytics-using-hudi">II. Efficient Near
Real-time Analytics Using Hudi</h1>
+
+<p>Our business in smart travel drives the need to process and analyze data in
a near real-time manner. With a traditional data warehouse, we faced the
following challenges:</p>
+
+<ol>
+ <li>High overhead when updating due to long-tail latency</li>
+ <li>High cost of order analysis due to the long window of a business
session</li>
+ <li>Reduced query accuracy due to late or ad-hoc updates</li>
+ <li>Unreliability in data ingestion pipeline</li>
+ <li>Data lost in the distributed data pipeline that cannot be reconciled</li>
+ <li>High latency to access data storage</li>
+</ol>
+
+<p>As a result, we adopted Apache Hudi on top of OSS to address these issues.
The following diagram outlines the architecture:</p>
+
+<p><img src="/assets/images/blog/2020-12-01-t3go-architecture.png"
alt="architecture" /></p>
+
+<h2 id="enable-near-real-time-data-ingestion-and-analysis">Enable Near real
time data ingestion and analysis</h2>
+
+<p>With Hudi, our data lake supports multiple data sources including Kafka,
MySQL binlog, GIS, and other business logs in near real time. As a result, more
than 60% of the company’s data is stored in the data lake and this proportion
continues to increase.</p>
+
+<p>We also cut the data ingestion time down to a few minutes by introducing
Apache Hudi into the data pipeline. Combined with interactive big data query
and analysis frameworks such as Presto and SparkSQL, this makes real-time data
analysis and insights possible.</p>
+
+<h2 id="enable-incremental-processing-pipeline">Enable Incremental processing
pipeline</h2>
+
+<p>With the help of Hudi, it is possible to provide incremental changes to the
downstream derived table when the upstream table updates frequently. Even with
a large number of interdependent tables, we can quickly run partial data
updates. This also effectively avoids updating the full partitions of cold
tables in the traditional Hive data warehouse.</p>
+
+<h2 id="accessing-data-using-hudi-as-a-unified-format">Accessing Data using
Hudi as a unified format</h2>
+
+<p>Traditional data warehouses often deploy Hadoop to store data and provide
batch analysis. Kafka is used separately to distribute Hadoop data to other
data processing frameworks, resulting in duplicated data. Hudi effectively
solves this problem: we always use Spark pipelines to insert new updates into
the Hudi tables, then incrementally read the updates from those tables. In
other words, Hudi tables are used as the unified storage format to access
data.</p>
+
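As a rough illustration of the incremental read described above, the sketch below builds the reader options for a Hudi incremental query in Python. The keys `hoodie.datasource.query.type` and `hoodie.datasource.read.begin.instanttime` are Hudi Spark datasource options; the commit timestamp and the table path in the comment are placeholders, not values from T3Go's pipelines.

```python
def incremental_read_options(begin_instant: str) -> dict:
    """Build Spark reader options for a Hudi incremental query.

    begin_instant is a Hudi commit time such as "20201201000000"; only
    records committed after it are returned by the query.
    """
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


# Hypothetical commit time, for illustration only.
options = incremental_read_options("20201201000000")
print(options)

# With a SparkSession named `spark` and the Hudi bundle on the classpath,
# the options would be passed like this (not executed here; the path is a
# placeholder):
#   spark.read.format("hudi").options(**options).load("alluxio:///path/to/table")
```

A downstream job would typically remember the last commit time it processed and pass it as `begin_instant` on the next run.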
+<h1 id="iii-efficient-data-caching-using-alluxio">III. Efficient Data Caching
Using Alluxio</h1>
+
+<p>In the early version of our data lake without Alluxio, data received from
Kafka in real time was processed by Spark and then written to the OSS data lake
using Hudi DeltaStreamer tasks. With this architecture, Spark often suffered
high network latency when writing to OSS directly. Since all data was in OSS
storage, OLAP queries on Hudi data could also be slow due to the lack of data
locality.</p>
+
+<p>To address the latency issue, we deployed Alluxio as a data orchestration
layer, co-located with computing engines such as Spark and Presto, and used
Alluxio to accelerate read and write on the data lake as shown in the following
diagram:</p>
+
+<p><img src="/assets/images/blog/2020-12-01-t3go-architecture-alluxio.png"
alt="architecture-alluxio" /></p>
+
+<p>Data in formats such as Hudi, Parquet, ORC, and JSON is stored mostly on
OSS, which holds 95% of the data. Computing engines such as Flink, Spark,
Kylin, and Presto are each deployed in their own isolated cluster. When each
engine accesses OSS, Alluxio acts as a virtual distributed storage system to
accelerate data access, being co-located with each of the computing clusters.</p>
+
+<p>Specifically, here are a few applications leveraging Alluxio in the T3Go
data lake.</p>
+
+<h2 id="data-lake-ingestion">Data lake ingestion</h2>
+
+<p>We mount the corresponding OSS path to the Alluxio file system and set
Hudi’s <em>“<strong>target-base-path</strong>”</em> parameter value to use
the alluxio:// scheme in place of oss:// scheme. Spark pipelines with Hudi
continuously ingest data to Alluxio. After data is written to Alluxio, it is
asynchronously persisted from the Alluxio cache to the remote OSS every minute.
These modifications allow Spark to write to a local Alluxio node instead of
writing to remote OSS, significan [...]
+
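The scheme switch described above can be sketched as a small path rewrite. This is a minimal illustration, assuming the OSS bucket root is mounted at a hypothetical Alluxio path `/oss` and the Alluxio master is reachable at the hypothetical address `alluxio-master:19998`; the bucket and table names are placeholders as well.

```python
from urllib.parse import urlparse

# Hypothetical deployment details, not taken from the article.
ALLUXIO_AUTHORITY = "alluxio-master:19998"
MOUNT_POINT = "/oss"


def to_alluxio_path(oss_uri: str) -> str:
    """Map an oss://bucket/key URI to its path under the Alluxio mount."""
    parsed = urlparse(oss_uri)
    if parsed.scheme != "oss":
        raise ValueError(f"expected an oss:// URI, got {oss_uri!r}")
    # The bucket root is assumed to be mounted at MOUNT_POINT, so only the
    # key part of the URI is appended.
    return f"alluxio://{ALLUXIO_AUTHORITY}{MOUNT_POINT}{parsed.path}"


# Value that would be handed to Hudi as the target base path.
target_base_path = to_alluxio_path("oss://example-bucket/hudi/trips")
print(target_base_path)
```

Keeping the mapping in one helper means the ingestion job can fall back to the oss:// path by skipping the rewrite, without touching the rest of the pipeline configuration.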
+<h2 id="data-analysis-on-the-lake">Data analysis on the lake</h2>
+
+<p>We use Presto as an ad-hoc query engine to analyze the Hudi tables in the
lake, co-locating Alluxio workers on each Presto worker node. When Presto and
Alluxio services are co-located and running, Alluxio caches the input data
locally on the Presto worker, which greatly speeds up subsequent reads by
Presto. On a cache hit, Presto can read from the local Alluxio worker
storage at memory speed without any additional data transfer over the
network.</p>
+
+<h2 id="concurrent-accesses-across-multiple-storage-systems">Concurrent
accesses across multiple storage systems</h2>
+
+<p>In order to ensure the accuracy of training samples, our machine learning
team often synchronizes desensitized data in production to an offline machine
learning environment. During synchronization, the data flows across multiple
file systems, from production OSS to an offline HDFS followed by another
offline Machine Learning HDFS.</p>
+
+<p>This data migration process is not only inefficient but also error-prone
for modelers because multiple storage systems with varying configurations
are involved. Alluxio helps in this specific scenario by mounting the
destination storage systems under the same filesystem to be accessed by their
corresponding logical paths in Alluxio namespace. By decoupling the physical
storage, this allows applications with different APIs to access and transfer
data seamlessly. This data access lay [...]
+
+<h2 id="microbenchmark">Microbenchmark</h2>
+
+<p>Overall, we observed the following improvements with Alluxio:</p>
+
+<ol>
+ <li>It supports a hierarchical and transparent caching mechanism</li>
+  <li>It supports cache promote mode when reading</li>
+  <li>It supports an asynchronous writing mode</li>
+  <li>It supports an LRU eviction strategy</li>
+ <li>It has pin and TTL features</li>
+</ol>
+
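To illustrate the LRU strategy named in the list above, here is a minimal, generic LRU cache in Python. It is a conceptual sketch only and does not reflect how Alluxio implements eviction; the class name, block names, and capacity are made up for the example.

```python
from collections import OrderedDict


class LruCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def keys(self):
        """Return keys ordered from least to most recently used."""
        return list(self._items)


cache = LruCache(capacity=2)
cache.put("block-a", b"a")
cache.put("block-b", b"b")
cache.get("block-a")          # block-a becomes the most recently used entry
cache.put("block-c", b"c")    # cache is full, so block-b is evicted
print(cache.keys())
```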
+<p>After comparison and verification, we chose Spark SQL as the query
engine. Our performance test queries a Hudi table, comparing Alluxio + OSS
against OSS directly as well as HDFS.</p>
+
+<p><img src="/assets/images/blog/2020-12-01-t3go-microbenchmark.png"
alt="microbench" /></p>
+
+<p>In the stress test shown above, once the data volume exceeds a certain
magnitude (2400W, i.e. 24 million), the query speed using Alluxio+OSS surpasses
the HDFS query speed of the hybrid deployment. Once the data volume exceeds 1E
(100 million), the query speed starts to double. At 6E (600 million), it is up
to 12 times higher than querying native OSS and 8 times higher than querying
native HDFS. The improvement depends on the machine configuration.</p>
+
+<p>Based on our performance benchmarking, we found that the performance can be
improved by over 10 times with the help of Alluxio. Furthermore, the larger the
data scale, the more prominent the performance improvement.</p>
+
+<h1 id="iv-next-step">IV. Next Step</h1>
+
+<p>As T3Go’s data lake ecosystem expands, we will continue to face the critical
challenge of separated compute and storage. With T3Go’s growing data
processing needs, our team plans to deploy Alluxio on a larger scale to
accelerate our data lake storage.</p>
+
+<p>In addition to the deployment of Alluxio on the data lake computing engine,
which currently is mainly SparkSQL, we plan to add a layer of Alluxio to the
OLAP cluster using Apache Kylin and to an ad-hoc cluster using Presto. The goal
is to have Alluxio cover all computing scenarios, connecting each of them to
improve the read and write efficiency of the data lake and its surrounding
ecosystem.</p>
+
+<h1 id="v-conclusion">V. Conclusion</h1>
+
+<p>As mentioned earlier, Hudi and Alluxio cover all scenarios of Hudi’s near
real-time ingestion, near real-time analysis, incremental processing, and data
distribution on DFS, among many others, and play the role of a powerful
accelerator on data ingestion and data analysis on the lake. With Hudi and
Alluxio together, <strong>our R&D engineers shortened the time for data
ingestion into the lake by up to a factor of 2. Data analysts using Presto,
Hudi, and Alluxio in conjunction t [...]
+
+ </section>
+
+ <a href="#masthead__inner-wrap" class="back-to-top">Back to top
↑</a>
+
+
+
+
+ </div>
+
+ </article>
+
+</div>
+
+ </div>
+
+ <div class="page__footer">
+ <footer>
+
+<div class="row">
+ <div class="col-lg-12 footer">
+ <p>
+ <table class="table-apache-info">
+ <tr>
+ <td>
+ <a class="footer-link-img" href="https://apache.org">
+ <img width="250px" src="/assets/images/asf_logo.svg" alt="The
Apache Software Foundation">
+ </a>
+ </td>
+ <td>
+ <a style="float: right"
href="https://www.apache.org/events/current-event.html">
+ <img
src="https://www.apache.org/events/current-event-234x60.png" />
+ </a>
+ </td>
+ </tr>
+ </table>
+ </p>
+ <p>
+ <a href="https://www.apache.org/licenses/">License</a> | <a
href="https://www.apache.org/security/">Security</a> | <a
href="https://www.apache.org/foundation/thanks.html">Thanks</a> | <a
href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
+ </p>
+ <p>
+ Copyright © <span id="copyright-year">2019</span> <a
href="https://apache.org">The Apache Software Foundation</a>, Licensed under
the <a href="https://www.apache.org/licenses/LICENSE-2.0"> Apache License,
Version 2.0</a>.
+ Hudi, Apache and the Apache feather logo are trademarks of The Apache
Software Foundation. <a href="/docs/privacy">Privacy Policy</a>
+ </p>
+ </div>
+</div>
+ </footer>
+ </div>
+
+
+ </body>
+</html>
\ No newline at end of file
diff --git a/content/cn/activity.html b/content/cn/activity.html
index 222d170..c87c2fe 100644
--- a/content/cn/activity.html
+++ b/content/cn/activity.html
@@ -191,6 +191,30 @@
<h2 class="archive__item-title" itemprop="headline">
+ <a href="/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/"
rel="permalink">Building High-Performance Data Lake Using Apache Hudi and
Alluxio at T3Go
+</a>
+
+ </h2>
+ <!-- Look the author details up from the site config. -->
+
+ <!-- Output author details if some exist. -->
+ <div class="archive__item-meta"><a href="https://www.t3go.cn/">Trevor
Zhang, Vino Yang</a> posted on <time datetime="2020-12-01">December 1,
2020</time></div>
+
+ <p class="archive__item-excerpt" itemprop="description">How T3Go’s
high-performance data lake using Apache Hudi and Alluxio shortened the time for
data ingestion into the lake by up to a factor of 2. Data analysts using
Presto, Hudi, and Alluxio in conjunction to query data on the lake saw queries
speed up by a factor of 10.
+</p>
+ </article>
+</div>
+
+
+
+
+
+
+<div class="list__item">
+ <article class="archive__item" itemscope
itemtype="https://schema.org/CreativeWork">
+
+ <h2 class="archive__item-title" itemprop="headline">
+
<a href="/blog/hudi-meets-aws-emr-and-aws-dms/" rel="permalink">Apply
record level changes from relational databases to Amazon S3 data lake using
Apache Hudi on Amazon EMR and AWS Database Migration Service
</a>
diff --git a/content/docs/powered_by.html b/content/docs/powered_by.html
index 04e64dd..df48680 100644
--- a/content/docs/powered_by.html
+++ b/content/docs/powered_by.html
@@ -486,17 +486,27 @@ December 2019, AWS re:Invent 2019, Las Vegas, NV, USA</p>
<p><a href="https://youtu.be/nA3rwOdmm3A">“PrestoDB and Apache Hudi”</a> -
By Bhavani Sudha Saktheeswaran and Brandon Scheller, Aug 2020, PrestoDB
Community Meetup.</p>
</li>
<li>
+ <p><a href="https://www.youtube.com/watch?v=hNxrsjhI-9w">“DC_THURS :
Apache Hudi w/ Nishith Agarwal & Vinoth Chandar”</a>, Aug 2020, Online
discussion/Q&A with DataCouncil Founder</p>
+ </li>
+ <li>
<p><a href="https://www.youtube.com/watch?v=lsFSM2Z4kPs">“Panel Discussion
on Presto Ecosystem”</a> - By Vinoth Chandar, Sep 2020, PrestoCon <a
href="https://prestocon2020.sched.com/event/dgyw">“panel”</a>.</p>
</li>
<li>
<p><a
href="https://docs.google.com/presentation/d/1y-ryRwCdTbqQHGr_bn3lxM_B8L1L5nsZOIXlJsDl_wU/edit?usp=sharing">“Next
Generation Data lakes using Apache Hudi”</a> - By Balaji Varadarajan and
Sivabalan Narayanan, Sep 2020, <a
href="https://www.apachecon.com/">“ApacheCon”</a></p>
</li>
<li>
+ <p><a
href="https://www.dbta.com/DataSummit/Fall2020/Agenda.aspx">“Building
Large-Scale, Transactional Data Lakes using Apache Hudi”</a> - By Nishith
Agarwal, Data Summit 2020</p>
+ </li>
+ <li>
<p><a
href="https://drive.google.com/file/d/1ULVPkjynaw-07wsutLcZm-4rVXf8E8N8/view?usp=sharing">“Landing
practice of Apache Hudi in T3go”</a> - By VinoYang and XianghuWang, November
2020, Qcon.</p>
- <h2 id="articles">Articles</h2>
+ </li>
+ <li>
+ <p><a href="https://www.meetup.com/UberEvents/events/274924537/">“Meetup
talk by Nishith Agarwal”</a> - Uber Data Platforms Meetup, Dec 2020</p>
</li>
</ol>
+<h2 id="articles">Articles</h2>
+
<p>You can check out <a href="https://hudi.apache.org/blog.html">our blog
pages</a> for content written by our committers/contributors.</p>
<ol>
@@ -512,6 +522,10 @@ December 2019, AWS re:Invent 2019, Las Vegas, NV, USA</p>
<li><a
href="https://towardsdatascience.com/data-lake-change-data-capture-cdc-using-apache-hudi-on-amazon-emr-part-2-process-65e4662d7b4b">“Data
Lake Change Capture using Apache Hudi & Amazon AMS/EMR”</a> - Towards
DataScience article, Oct 20</li>
<li><a
href="https://aws.amazon.com/blogs/apn/how-nclouds-helps-accelerate-data-delivery-with-apache-hudi-on-amazon-emr/">“How
nClouds Helps Accelerate Data Delivery with Apache Hudi on Amazon EMR”</a> -
published by nClouds in partnership with AWS</li>
<li><a
href="https://aws.amazon.com/blogs/big-data/apply-record-level-changes-from-relational-databases-to-amazon-s3-data-lake-using-apache-hudi-on-amazon-emr-and-aws-database-migration-service/">“Apply
record level changes from relational databases to Amazon S3 data lake using
Apache Hudi on Amazon EMR and AWS Database Migration Service”</a> - AWS
blog</li>
+ <li><a
href="https://www.dbta.com/Editorial/News-Flashes/Architecting-Data-Lakes-for-the-Modern-Enterprise-at-Data-Summit-Connect-Fall-2020-143512.aspx">“Architecting
Data Lakes for the Modern Enterprise at Data Summit Connect Fall 2020”</a></li>
+ <li><a
href="https://www.analyticsinsight.net/can-big-data-solutions-be-affordable/">“Can
Big Data Solutions Be Affordable?”</a></li>
+ <li><a
href="https://www.alluxio.io/blog/building-high-performance-data-lake-using-apache-hudi-and-alluxio-at-t3go/">“Building
High-Performance Data Lake Using Apache Hudi and Alluxio at T3Go”</a></li>
+ <li><a
href="https://towardsdatascience.com/data-lake-change-data-capture-cdc-using-apache-hudi-on-amazon-emr-part-2-process-65e4662d7b4b">“Data
Lake Change Capture using Apache Hudi & Amazon AMS/EMR Part 2”</a></li>
</ol>
<h2 id="powered-by">Powered by</h2>
diff --git a/content/sitemap.xml b/content/sitemap.xml
index db44ee0..432cf35 100644
--- a/content/sitemap.xml
+++ b/content/sitemap.xml
@@ -965,6 +965,10 @@
<lastmod>2020-10-19T00:00:00-04:00</lastmod>
</url>
<url>
+<loc>https://hudi.apache.org/blog/high-perf-data-lake-with-hudi-and-alluxio-t3go/</loc>
+<lastmod>2020-12-01T00:00:00-05:00</lastmod>
+</url>
+<url>
<loc>https://hudi.apache.org/cn/activity</loc>
<lastmod>2019-12-30T14:59:57-05:00</lastmod>
</url>