This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new a799502 Travis CI build asf-site
a799502 is described below
commit a799502bb40ce5a2bdbd36646d39ff7a244e1bf2
Author: CI <[email protected]>
AuthorDate: Sat Dec 19 09:29:05 2020 +0000
Travis CI build asf-site
---
content/activity.html | 2 +-
content/assets/js/lunr/lunr-store.js | 4 ++--
content/blog.html | 2 +-
content/blog/hudi-indexing-mechanisms/index.html | 14 +++++++-------
content/cn/activity.html | 2 +-
5 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/content/activity.html b/content/activity.html
index bda641f..fd16252 100644
--- a/content/activity.html
+++ b/content/activity.html
@@ -215,7 +215,7 @@
<h2 class="archive__item-title" itemprop="headline">
- <a href="/blog/hudi-indexing-mechanisms/" rel="permalink">Employing the right indexes for fast updates, deletes
+ <a href="/blog/hudi-indexing-mechanisms/" rel="permalink">Employing the right indexes for fast updates, deletes in Apache Hudi
</a>
</h2>
diff --git a/content/assets/js/lunr/lunr-store.js b/content/assets/js/lunr/lunr-store.js
index 511e4bf..970d840 100644
--- a/content/assets/js/lunr/lunr-store.js
+++ b/content/assets/js/lunr/lunr-store.js
@@ -1204,8 +1204,8 @@ var store = [{
"tags": [],
"url": "https://hudi.apache.org/blog/hudi-meets-aws-emr-and-aws-dms/",
"teaser":"https://hudi.apache.org/assets/images/500x300.png"},{
- "title": "Employing the right indexes for fast updates, deletes",
- "excerpt":"Apache Hudi employs an index to locate the file group, that an update/delete belong to. For Copy-On-Write tables, this enables fast upsert/delete operations, by avoiding the need to join against the entire dataset to determine which files to rewrite. For Merge-On-Read tables, this design allows Hudi to bound the amount...","categories": ["blog"],
+ "title": "Employing the right indexes for fast updates, deletes in Apache Hudi",
+ "excerpt":"Apache Hudi employs an index to locate the file group, that an update/delete belongs to. For Copy-On-Write tables, this enables fast upsert/delete operations, by avoiding the need to join against the entire dataset to determine which files to rewrite. For Merge-On-Read tables, this design allows Hudi to bound the amount...","categories": ["blog"],
"tags": [],
"url": "https://hudi.apache.org/blog/hudi-indexing-mechanisms/",
"teaser":"https://hudi.apache.org/assets/images/500x300.png"},{
diff --git a/content/blog.html b/content/blog.html
index 943c988..97d3a6f 100644
--- a/content/blog.html
+++ b/content/blog.html
@@ -213,7 +213,7 @@
<h2 class="archive__item-title" itemprop="headline">
- <a href="/blog/hudi-indexing-mechanisms/" rel="permalink">Employing the right indexes for fast updates, deletes
+ <a href="/blog/hudi-indexing-mechanisms/" rel="permalink">Employing the right indexes for fast updates, deletes in Apache Hudi
</a>
</h2>
diff --git a/content/blog/hudi-indexing-mechanisms/index.html b/content/blog/hudi-indexing-mechanisms/index.html
index fa47738..309a0ee 100644
--- a/content/blog/hudi-indexing-mechanisms/index.html
+++ b/content/blog/hudi-indexing-mechanisms/index.html
@@ -3,13 +3,13 @@
<head>
<meta charset="utf-8">
-<!-- begin _includes/seo.html --><title>Employing the right indexes for fast updates, deletes - Apache Hudi</title>
+<!-- begin _includes/seo.html --><title>Employing the right indexes for fast updates, deletes in Apache Hudi - Apache Hudi</title>
<meta name="description" content="Detailing different indexing mechanisms in Hudi and when to use each of them">
<meta property="og:type" content="article">
<meta property="og:locale" content="en_US">
<meta property="og:site_name" content="">
-<meta property="og:title" content="Employing the right indexes for fast updates, deletes">
+<meta property="og:title" content="Employing the right indexes for fast updates, deletes in Apache Hudi">
<meta property="og:url" content="https://hudi.apache.org/blog/hudi-indexing-mechanisms/">
@@ -180,7 +180,7 @@
<div class="page__inner-wrap">
<header>
- <h1 id="page-title" class="page__title" itemprop="headline">Employing the right indexes for fast updates, deletes
+ <h1 id="page-title" class="page__title" itemprop="headline">Employing the right indexes for fast updates, deletes in Apache Hudi
</h1>
<!-- Output author details if some exist. -->
@@ -195,11 +195,11 @@
}
</style>
- <p>Apache Hudi employs an index to locate the file group, that an update/delete belong to. For Copy-On-Write tables, this enables
+ <p>Apache Hudi employs an index to locate the file group, that an update/delete belongs to. For Copy-On-Write tables, this enables
fast upsert/delete operations, by avoiding the need to join against the entire dataset to determine which files to rewrite.
For Merge-On-Read tables, this design allows Hudi to bound the amount of records any given base file needs to be merged against.
Specifically, a given base file needs to merged only against updates for records that are part of that base file. In contrast,
-designs without an indexing component like <a href="https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions">Apache Hive ACID</a>,
+designs without an indexing component (e.g: <a href="https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions">Apache Hive ACID</a>),
could end up having to merge all the base files against all incoming updates/delete records.</p>
<p>At a high level, an index maps a record key + an optional partition path to a file group ID on storage (explained
@@ -270,7 +270,7 @@ configured false positive ratio.</p>
point lookups. This would avoid any current limitations around reading bloom
filters/ranges from the base files themselves, to perform the lookup. (see
<a
href="https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+and+Query+Planning+Improvements?src=contextnavpagetreemode">RFC-15</a>
for the general design)</p>
-<h2 id="workload-duplicated-records-in-event-tables">Workload: Duplicated records in event tables</h2>
+<h2 id="workload-de-duplication-in-event-tables">Workload: De-Duplication in event tables</h2>
<p>Event Streaming is everywhere. Events coming from Apache Kafka or similar
message bus are typically 10-100x the size of fact tables and often treat
“time” (event’s arrival time/processing
time) as a first class citizen. For eg, IoT event stream, click stream data,
ad impressions etc. Inserts and updates only span the last few partitions as
these are mostly append only data.
@@ -284,7 +284,7 @@ costs would grow linear with number of events and thus can be prohibitively expe
that time is often a first class citizen and construct a key such as <code
class="highlighter-rouge">event_ts + event_id</code> such that the inserted
records have monotonically increasing keys. This yields great returns
by pruning large amounts of files even within the latest table partitions.</p>
-<h2 id="workload-completely-random-updatesdeletes-to-a-dimension-table">Workload: Completely random updates/deletes to a dimension table</h2>
+<h2 id="workload-random-updatesdeletes-to-a-dimension-table">Workload: Random updates/deletes to a dimension table</h2>
<p>These types of tables usually contain high dimensional data and hold
reference data e.g user profile, merchant information. These are high fidelity
tables where the updates are often small but also spread
across a lot of partitions and data files ranging across the dataset from old
to new. Often times, these tables are also un-partitioned, since there is also
not a good way to partition these tables.</p>
diff --git a/content/cn/activity.html b/content/cn/activity.html
index b46a04b..f1473db 100644
--- a/content/cn/activity.html
+++ b/content/cn/activity.html
@@ -215,7 +215,7 @@
<h2 class="archive__item-title" itemprop="headline">
- <a href="/blog/hudi-indexing-mechanisms/" rel="permalink">Employing the right indexes for fast updates, deletes
+ <a href="/blog/hudi-indexing-mechanisms/" rel="permalink">Employing the right indexes for fast updates, deletes in Apache Hudi
</a>
</h2>