This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new a799502  Travis CI build asf-site
a799502 is described below

commit a799502bb40ce5a2bdbd36646d39ff7a244e1bf2
Author: CI <[email protected]>
AuthorDate: Sat Dec 19 09:29:05 2020 +0000

    Travis CI build asf-site
---
 content/activity.html                            |  2 +-
 content/assets/js/lunr/lunr-store.js             |  4 ++--
 content/blog.html                                |  2 +-
 content/blog/hudi-indexing-mechanisms/index.html | 14 +++++++-------
 content/cn/activity.html                         |  2 +-
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/content/activity.html b/content/activity.html
index bda641f..fd16252 100644
--- a/content/activity.html
+++ b/content/activity.html
@@ -215,7 +215,7 @@
     
     <h2 class="archive__item-title" itemprop="headline">
       
-        <a href="/blog/hudi-indexing-mechanisms/" rel="permalink">Employing the right indexes for fast updates, deletes
+        <a href="/blog/hudi-indexing-mechanisms/" rel="permalink">Employing the right indexes for fast updates, deletes in Apache Hudi
 </a>
       
     </h2>
diff --git a/content/assets/js/lunr/lunr-store.js b/content/assets/js/lunr/lunr-store.js
index 511e4bf..970d840 100644
--- a/content/assets/js/lunr/lunr-store.js
+++ b/content/assets/js/lunr/lunr-store.js
@@ -1204,8 +1204,8 @@ var store = [{
         "tags": [],
         "url": "https://hudi.apache.org/blog/hudi-meets-aws-emr-and-aws-dms/",
         "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{
-        "title": "Employing the right indexes for fast updates, deletes",
-        "excerpt":"Apache Hudi employs an index to locate the file group, that an update/delete belong to. For Copy-On-Write tables, this enables fast upsert/delete operations, by avoiding the need to join against the entire dataset to determine which files to rewrite. For Merge-On-Read tables, this design allows Hudi to bound the amount...","categories": ["blog"],
+        "title": "Employing the right indexes for fast updates, deletes in Apache Hudi",
+        "excerpt":"Apache Hudi employs an index to locate the file group, that an update/delete belongs to. For Copy-On-Write tables, this enables fast upsert/delete operations, by avoiding the need to join against the entire dataset to determine which files to rewrite. For Merge-On-Read tables, this design allows Hudi to bound the amount...","categories": ["blog"],
         "tags": [],
         "url": "https://hudi.apache.org/blog/hudi-indexing-mechanisms/",
         "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{
diff --git a/content/blog.html b/content/blog.html
index 943c988..97d3a6f 100644
--- a/content/blog.html
+++ b/content/blog.html
@@ -213,7 +213,7 @@
     
     <h2 class="archive__item-title" itemprop="headline">
       
-        <a href="/blog/hudi-indexing-mechanisms/" rel="permalink">Employing the right indexes for fast updates, deletes
+        <a href="/blog/hudi-indexing-mechanisms/" rel="permalink">Employing the right indexes for fast updates, deletes in Apache Hudi
 </a>
       
     </h2>
diff --git a/content/blog/hudi-indexing-mechanisms/index.html b/content/blog/hudi-indexing-mechanisms/index.html
index fa47738..309a0ee 100644
--- a/content/blog/hudi-indexing-mechanisms/index.html
+++ b/content/blog/hudi-indexing-mechanisms/index.html
@@ -3,13 +3,13 @@
   <head>
     <meta charset="utf-8">
 
-<!-- begin _includes/seo.html --><title>Employing the right indexes for fast updates, deletes - Apache Hudi</title>
+<!-- begin _includes/seo.html --><title>Employing the right indexes for fast updates, deletes in Apache Hudi - Apache Hudi</title>
 <meta name="description" content="Detailing different indexing mechanisms in Hudi and when to use each of them">
 
 <meta property="og:type" content="article">
 <meta property="og:locale" content="en_US">
 <meta property="og:site_name" content="">
-<meta property="og:title" content="Employing the right indexes for fast updates, deletes">
+<meta property="og:title" content="Employing the right indexes for fast updates, deletes in Apache Hudi">
 <meta property="og:url" content="https://hudi.apache.org/blog/hudi-indexing-mechanisms/">
 
 
@@ -180,7 +180,7 @@
     <div class="page__inner-wrap">
       
         <header>
-          <h1 id="page-title" class="page__title" itemprop="headline">Employing the right indexes for fast updates, deletes
+          <h1 id="page-title" class="page__title" itemprop="headline">Employing the right indexes for fast updates, deletes in Apache Hudi
 </h1>
           <!-- Output author details if some exist. -->
           
@@ -195,11 +195,11 @@
             }
           </style>
         
-        <p>Apache Hudi employs an index to locate the file group, that an update/delete belong to. For Copy-On-Write tables, this enables
+        <p>Apache Hudi employs an index to locate the file group, that an update/delete belongs to. For Copy-On-Write tables, this enables
 fast upsert/delete operations, by avoiding the need to join against the entire dataset to determine which files to rewrite.
 For Merge-On-Read tables, this design allows Hudi to bound the amount of records any given base file needs to be merged against.
 Specifically, a given base file needs to merged only against updates for records that are part of that base file. In contrast,
-designs without an indexing component like <a href="https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions">Apache Hive ACID</a>,
+designs without an indexing component (e.g: <a href="https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions">Apache Hive ACID</a>),
 could end up having to merge all the base files against all incoming updates/delete records.</p>
 
 <p>At a high level, an index maps a record key + an optional partition path to a file group ID on storage (explained
@@ -270,7 +270,7 @@ configured false positive ratio.</p>
 point lookups. This would avoid any current limitations around reading bloom filters/ranges from the base files themselves, to perform the lookup. (see 
 <a href="https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+and+Query+Planning+Improvements?src=contextnavpagetreemode">RFC-15</a> for the general design)</p>
 
-<h2 id="workload-duplicated-records-in-event-tables">Workload: Duplicated records in event tables</h2>
+<h2 id="workload-de-duplication-in-event-tables">Workload: De-Duplication in event tables</h2>
 
 <p>Event Streaming is everywhere. Events coming from Apache Kafka or similar message bus are typically 10-100x the size of fact tables and often treat “time” (event’s arrival time/processing 
 time) as a first class citizen. For eg, IoT event stream, click stream data, ad impressions etc. Inserts and updates only span the last few partitions as these are mostly append only data. 
@@ -284,7 +284,7 @@ costs would grow linear with number of events and thus can be prohibitively expe
 that time is often a first class citizen and construct a key such as <code class="highlighter-rouge">event_ts + event_id</code> such that the inserted records have monotonically increasing keys. This yields great returns
 by pruning large amounts of files even within the latest table partitions.</p>
 
-<h2 id="workload-completely-random-updatesdeletes-to-a-dimension-table">Workload: Completely random updates/deletes to a dimension table</h2>
+<h2 id="workload-random-updatesdeletes-to-a-dimension-table">Workload: Random updates/deletes to a dimension table</h2>
 
 <p>These types of tables usually contain high dimensional data and hold reference data e.g user profile, merchant information. These are high fidelity tables where the updates are often small but also spread 
 across a lot of partitions and data files ranging across the dataset from old to new. Often times, these tables are also un-partitioned, since there is also not a good way to partition these tables.</p>
diff --git a/content/cn/activity.html b/content/cn/activity.html
index b46a04b..f1473db 100644
--- a/content/cn/activity.html
+++ b/content/cn/activity.html
@@ -215,7 +215,7 @@
     
     <h2 class="archive__item-title" itemprop="headline">
       
-        <a href="/blog/hudi-indexing-mechanisms/" rel="permalink">Employing the right indexes for fast updates, deletes
+        <a href="/blog/hudi-indexing-mechanisms/" rel="permalink">Employing the right indexes for fast updates, deletes in Apache Hudi
 </a>
       
     </h2>
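The blog post touched by this diff recommends keys such as event_ts + event_id, so that inserted records have monotonically increasing keys and the index can prune most files by key range. Below is a minimal Python sketch of that range pruning. It assumes each file group tracks the min/max record key it holds. FileGroupRange, make_key and candidate_files are hypothetical names for illustration, not Hudi APIs, and the bloom filter check that follows range pruning is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileGroupRange:
    # Hypothetical per-file-group metadata: min/max record keys stored in it.
    file_id: str
    min_key: str
    max_key: str


def make_key(event_ts: int, event_id: str) -> str:
    # Hypothetical "event_ts + event_id" key layout; zero-padding the
    # timestamp makes lexicographic string order match timestamp order.
    return f"{event_ts:013d}_{event_id}"


def candidate_files(ranges: List[FileGroupRange], record_key: str) -> List[str]:
    # Range pruning: only file groups whose [min_key, max_key] interval
    # covers the key can contain it; every other file is skipped unread.
    return [r.file_id for r in ranges if r.min_key <= record_key <= r.max_key]


ranges = [
    FileGroupRange("fg-1", make_key(1_000, "a"), make_key(1_999, "z")),
    FileGroupRange("fg-2", make_key(2_000, "a"), make_key(2_999, "z")),
]

# A new event with a later timestamp falls outside every existing range,
# so no existing file needs to be checked.
print(candidate_files(ranges, make_key(3_050, "evt-7")))  # []
# A late-arriving event is checked only against the one overlapping file.
print(candidate_files(ranges, make_key(2_500, "evt-9")))  # ['fg-2']
```

With random keys (for example UUIDs), each file's min/max range tends to span most of the key space, so this kind of pruning would rarely eliminate files.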
