This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 06149f3 [HUDI-862] Migrate HUDI site blogs from Confluence to Jekyll.
(#1594)
06149f3 is described below
commit 06149f3115136a9e65699b3d05e16a3b484af055
Author: Prashant Wason <[email protected]>
AuthorDate: Tue May 12 21:04:12 2020 -0700
[HUDI-862] Migrate HUDI site blogs from Confluence to Jekyll. (#1594)
1. Added a data file which saves author information (authors.yml)
2. Theme changes:
- Author names added to blog pages. Author names link to author bio page
on HUDI Confluence Wiki.
- Date added to blog posts
- Blog archive page has excerpts for better layout.
- Added custom CSS for the above theme changes.
3. Added a new asset directory where the blog images are stored
(docs/assets/images/blog)
4. Exported all blog posts from Confluence in markdown format.
5. Adding syntax highlighting to the blog posts
Co-authored-by: Vinoth Chandar <[email protected]>
Co-authored-by: lamber-ken <[email protected]>
---
docs/_data/authors.yml | 22 +++
docs/_data/navigation.yml | 6 +-
docs/_includes/archive-single.html | 7 +-
docs/_layouts/single.html | 4 +
docs/_pages/blog.md | 7 +
docs/_posts/2016-12-30-strata-talk-2017.md | 4 +-
docs/_posts/2019-01-18-asf-incubation.md | 4 +-
docs/_posts/2019-03-07-batch-vs-incremental.md | 8 +
.../2019-05-14-registering-dataset-to-hive.md | 85 +++++++++
.../2019-09-09-ingesting-database-changes.md | 45 +++++
docs/_posts/2020-01-15-delete-support-in-hudi.md | 189 +++++++++++++++++++
docs/_posts/2020-01-20-change-capture-using-aws.md | 202 +++++++++++++++++++++
docs/_posts/2020-03-22-exporting-hudi-datasets.md | 102 +++++++++++
.../2020-04-27-apache-hudi-apache-zepplin.md | 65 +++++++
docs/_sass/hudi_style/_archive.scss | 8 +-
docs/_sass/hudi_style/_page.scss | 6 +
docs/assets/images/blog/batch_vs_incremental.png | Bin 0 -> 22104 bytes
.../images/blog/change-capture-architecture.png | Bin 0 -> 16807 bytes
docs/assets/images/blog/change-logs-mysql.png | Bin 0 -> 114403 bytes
docs/assets/images/blog/dms-demo-files.png | Bin 0 -> 52683 bytes
docs/assets/images/blog/dms-task.png | Bin 0 -> 27532 bytes
docs/assets/images/blog/read_optimized_view.png | Bin 0 -> 134293 bytes
docs/assets/images/blog/real_time_view.png | Bin 0 -> 134366 bytes
docs/assets/images/blog/s3-endoint-list.png | Bin 0 -> 26945 bytes
.../images/blog/s3-endpoint-configuration-1.png | Bin 0 -> 38385 bytes
.../images/blog/s3-endpoint-configuration-2.png | Bin 0 -> 40796 bytes
.../images/blog/s3-endpoint-configuration.png | Bin 0 -> 38582 bytes
docs/assets/images/blog/s3-migration-task-1.png | Bin 0 -> 35653 bytes
docs/assets/images/blog/s3-migration-task-2.png | Bin 0 -> 43549 bytes
docs/assets/images/blog/spark_edit_properties.png | Bin 0 -> 9342 bytes
.../images/blog/spark_read_optimized_view.png | Bin 0 -> 38582 bytes
docs/assets/images/blog/spark_real_time_view.png | Bin 0 -> 38416 bytes
32 files changed, 755 insertions(+), 9 deletions(-)
diff --git a/docs/_data/authors.yml b/docs/_data/authors.yml
new file mode 100644
index 0000000..90fd753
--- /dev/null
+++ b/docs/_data/authors.yml
@@ -0,0 +1,22 @@
+# Author details.
+
+admin:
+ name: Apache Hudi
+ web: https://hudi.apache.org
+
+vinoth:
+ name: Vinoth Chandar
+ web: https://cwiki.apache.org/confluence/display/~vinoth
+
+rxu:
+ name: Raymond Xu
+ web: https://cwiki.apache.org/confluence/display/~rxu
+
+shivnarayan:
+ name: Sivabalan Narayanan
+ web: https://cwiki.apache.org/confluence/display/~shivnarayan
+
+leesf:
+ name: Shaofeng Li
+ web: https://cwiki.apache.org/confluence/display/~leesf
+
diff --git a/docs/_data/navigation.yml b/docs/_data/navigation.yml
index 915af56..36bc591 100644
--- a/docs/_data/navigation.yml
+++ b/docs/_data/navigation.yml
@@ -5,8 +5,8 @@ main:
url: /docs/quick-start-guide.html
- title: "Community"
url: /community.html
- - title: "Activities"
- url: /activity.html
+ - title: "Blog"
+ url: /blog.html
- title: "FAQ"
url: https://cwiki.apache.org/confluence/display/HUDI/FAQ
- title: "Releases"
@@ -297,4 +297,4 @@ cn_docs:
- title: "文档版本"
url: /cn/docs/0.5.2-docs-versions.html
- title: "版权信息"
- url: /cn/docs/0.5.2-privacy.html
\ No newline at end of file
+ url: /cn/docs/0.5.2-privacy.html
diff --git a/docs/_includes/archive-single.html
b/docs/_includes/archive-single.html
index 21ee9fb..5098f6c 100644
--- a/docs/_includes/archive-single.html
+++ b/docs/_includes/archive-single.html
@@ -30,6 +30,11 @@
<a href="{{ post.url | relative_url }}" rel="permalink">{{ title }}</a>
{% endif %}
</h2>
- {% if post.excerpt %}<p class="archive__item-excerpt"
itemprop="description">{{ post.excerpt | markdownify | strip_html | truncate:
160 }}</p>{% endif %}
+ <!-- Look the author details up from the site config. -->
+ {% assign author = site.data.authors[post.author] %}
+ <!-- Output author details if some exist. -->
+ {% if author %}<div class="archive__item-meta"><a href="{{ author.web
}}">{{ author.name }}</a> posted on <time datetime="{{ post.date | date:
"%Y-%m-%d" }}">{{ post.date | date: "%B %-d, %Y" }}</time></div>{% endif %}
+
+ {% if post.excerpt %}<p class="archive__item-excerpt"
itemprop="description">{{ post.excerpt | markdownify | strip_html }}</p>{%
endif %}
</article>
</div>
diff --git a/docs/_layouts/single.html b/docs/_layouts/single.html
index e78a9c9..9d1c2aa 100644
--- a/docs/_layouts/single.html
+++ b/docs/_layouts/single.html
@@ -6,11 +6,15 @@ layout: default
{% include sidebar.html %}
<article class="page" itemscope itemtype="https://schema.org/CreativeWork">
+ <!-- Look the author details up from the site config. -->
+ {% assign author = site.data.authors[page.author] %}
<div class="page__inner-wrap">
{% unless page.header.overlay_color or page.header.overlay_image %}
<header>
{% if page.title %}<h1 id="page-title" class="page__title"
itemprop="headline">{{ page.title | markdownify | remove: "<p>" | remove:
"</p>" }}</h1>{% endif %}
+ <!-- Output author details if some exist. -->
+ {% if author %}<div class="page__author"><a href="{{ author.web
}}">{{ author.name }}</a> posted on <time datetime="{{ page.date | date:
"%Y-%m-%d" }}">{{ page.date | date: "%B %-d, %Y" }}</time></div>{% endif %}
</header>
{% endunless %}
diff --git a/docs/_pages/blog.md b/docs/_pages/blog.md
new file mode 100644
index 0000000..b55f4e9
--- /dev/null
+++ b/docs/_pages/blog.md
@@ -0,0 +1,7 @@
+---
+title: "Blog"
+permalink: /blog
+layout: posts
+author_profile: true
+---
+
diff --git a/docs/_posts/2016-12-30-strata-talk-2017.md
b/docs/_posts/2016-12-30-strata-talk-2017.md
index e03b710..8405d6d 100644
--- a/docs/_posts/2016-12-30-strata-talk-2017.md
+++ b/docs/_posts/2016-12-30-strata-talk-2017.md
@@ -1,8 +1,8 @@
---
title: "Connect with us at Strata San Jose March 2017"
+author: admin
date: 2016-12-30
-permalink: /strata.html
-link: /strata.html
+category: blog
---
We will be presenting Hudi & general concepts around how incremental
processing works at Uber.
diff --git a/docs/_posts/2019-01-18-asf-incubation.md
b/docs/_posts/2019-01-18-asf-incubation.md
index ceca4d8..dde651e 100644
--- a/docs/_posts/2019-01-18-asf-incubation.md
+++ b/docs/_posts/2019-01-18-asf-incubation.md
@@ -1,8 +1,8 @@
---
title: "Hudi entered Apache Incubator"
+author: admin
date: 2019-01-18
-permalink: /asf.html
-link: /asf.html
+category: blog
---
In the coming weeks, we will be moving in our new home on the Apache Incubator.
diff --git a/docs/_posts/2019-03-07-batch-vs-incremental.md
b/docs/_posts/2019-03-07-batch-vs-incremental.md
new file mode 100644
index 0000000..2273227
--- /dev/null
+++ b/docs/_posts/2019-03-07-batch-vs-incremental.md
@@ -0,0 +1,8 @@
+---
+title: "Big Batch vs Incremental Processing"
+author: vinoth
+category: blog
+---
+
+
+
diff --git a/docs/_posts/2019-05-14-registering-dataset-to-hive.md
b/docs/_posts/2019-05-14-registering-dataset-to-hive.md
new file mode 100644
index 0000000..ec4946b
--- /dev/null
+++ b/docs/_posts/2019-05-14-registering-dataset-to-hive.md
@@ -0,0 +1,85 @@
+---
+title: "Registering sample dataset to Hive via beeline"
+excerpt: "How to manually register HUDI dataset into Hive using beeline"
+author: vinoth
+category: blog
+---
+
+The Hudi Hive sync tool typically handles registering the dataset in the Hive
metastore. In case there are issues with the quickstart around this, the
following commands can be used to register the dataset manually via beeline.
+
+Add the
_packaging/hoodie-hive-bundle/target/hoodie-hive-bundle-0.4.6-SNAPSHOT.jar_, so
that Hive can read the Hudi dataset and answer queries.
+
+```sql
+hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
+hive> set hive.stats.autogather=false;
+hive> add jar file:///path/to/hoodie-hive-bundle-0.5.2-SNAPSHOT.jar;
+Added [file:///path/to/hoodie-hive-bundle-0.5.2-SNAPSHOT.jar] to class path
+Added resources: [file:///path/to/hoodie-hive-bundle-0.5.2-SNAPSHOT.jar]
+```
+
+
+Then, create a *ReadOptimized* Hive table as below and register the sample
partitions.
+
+```sql
+DROP TABLE hoodie_test;
+CREATE EXTERNAL TABLE hoodie_test(`_row_key` string,
+ `_hoodie_commit_time` string,
+ `_hoodie_commit_seqno` string,
+ rider string,
+ driver string,
+ begin_lat double,
+ begin_lon double,
+ end_lat double,
+ end_lon double,
+ fare double)
+ PARTITIONED BY (`datestr` string)
+ ROW FORMAT SERDE
+ 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
+ STORED AS INPUTFORMAT
+ 'com.uber.hoodie.hadoop.HoodieInputFormat'
+ OUTPUTFORMAT
+ 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
+ LOCATION
+ 'hdfs:///tmp/hoodie/sample-table';
+
+ALTER TABLE `hoodie_test` ADD IF NOT EXISTS PARTITION (datestr='2016-03-15')
LOCATION 'hdfs:///tmp/hoodie/sample-table/2016/03/15';
+ALTER TABLE `hoodie_test` ADD IF NOT EXISTS PARTITION (datestr='2015-03-16')
LOCATION 'hdfs:///tmp/hoodie/sample-table/2015/03/16';
+ALTER TABLE `hoodie_test` ADD IF NOT EXISTS PARTITION (datestr='2015-03-17')
LOCATION 'hdfs:///tmp/hoodie/sample-table/2015/03/17';
+
+set mapreduce.framework.name=yarn;
+```
+
+And you can add a *Realtime* Hive table, as below
+
+```sql
+DROP TABLE hoodie_rt;
+CREATE EXTERNAL TABLE hoodie_rt(
+ `_hoodie_commit_time` string,
+ `_hoodie_commit_seqno` string,
+ `_hoodie_record_key` string,
+ `_hoodie_partition_path` string,
+ `_hoodie_file_name` string,
+ timestamp double,
+ `_row_key` string,
+ rider string,
+ driver string,
+ begin_lat double,
+ begin_lon double,
+ end_lat double,
+ end_lon double,
+ fare double)
+ PARTITIONED BY (`datestr` string)
+ ROW FORMAT SERDE
+ 'com.uber.hoodie.hadoop.realtime.HoodieParquetSerde'
+ STORED AS INPUTFORMAT
+ 'com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat'
+ OUTPUTFORMAT
+ 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
+ LOCATION
+ 'file:///tmp/hoodie/sample-table';
+
+ALTER TABLE `hoodie_rt` ADD IF NOT EXISTS PARTITION (datestr='2016-03-15')
LOCATION 'file:///tmp/hoodie/sample-table/2016/03/15';
+ALTER TABLE `hoodie_rt` ADD IF NOT EXISTS PARTITION (datestr='2015-03-16')
LOCATION 'file:///tmp/hoodie/sample-table/2015/03/16';
+ALTER TABLE `hoodie_rt` ADD IF NOT EXISTS PARTITION (datestr='2015-03-17')
LOCATION 'file:///tmp/hoodie/sample-table/2015/03/17';
+```
+
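The per-partition `ALTER TABLE` statements above follow a fixed pattern. As an aside, a small sketch (a hypothetical helper, not part of Hudi) shows how they can be generated from a list of partition dates:

```python
# Sketch: generate the Hive partition-registration DDL shown above.
# The table name, base path, and dates are illustrative assumptions.

def partition_ddl(table, base_path, dates):
    """Build one ALTER TABLE ... ADD PARTITION statement per datestr."""
    stmts = []
    for d in dates:  # dates as 'YYYY-MM-DD' strings
        location = f"{base_path}/{d.replace('-', '/')}"
        stmts.append(
            f"ALTER TABLE `{table}` ADD IF NOT EXISTS "
            f"PARTITION (datestr='{d}') LOCATION '{location}'"
        )
    return stmts

for s in partition_ddl("hoodie_test", "hdfs:///tmp/hoodie/sample-table",
                       ["2016-03-15", "2015-03-16", "2015-03-17"]):
    print(s)
```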
diff --git a/docs/_posts/2019-09-09-ingesting-database-changes.md
b/docs/_posts/2019-09-09-ingesting-database-changes.md
new file mode 100644
index 0000000..9c16881
--- /dev/null
+++ b/docs/_posts/2019-09-09-ingesting-database-changes.md
@@ -0,0 +1,45 @@
+---
+title: "Ingesting Database changes via Sqoop/Hudi"
+excerpt: "Learn how to ingest database changes into a HUDI dataset using Sqoop/Hudi"
+author: vinoth
+category: blog
+---
+
+Very simple in just 2 steps.
+
+**Step 1**: Extract new changes to users table in MySQL, as avro data files on
DFS
+
+```bash
+# Command to extract incrementals using sqoop
+bin/sqoop import \
+ -Dmapreduce.job.user.classpath.first=true \
+ --connect jdbc:mysql://localhost/users \
+ --username root \
+ --password ******* \
+ --table users \
+ --as-avrodatafile \
+ --target-dir \
+ s3://tmp/sqoop/import-1/users
+```
+
+**Step 2**: Use your favorite datasource to read the extracted data and
directly "upsert" the users table on DFS/Hive
+
+```scala
+// Spark Datasource
+import org.apache.hudi.DataSourceWriteOptions._
+// Use Spark datasource to read avro
+val inputDataset = spark.read.avro("s3://tmp/sqoop/import-1/users/*");
+
+// save it as a Hudi dataset
+inputDataset.write.format("org.apache.hudi")
+ .option(HoodieWriteConfig.TABLE_NAME, "hoodie.users")
+ .option(RECORDKEY_FIELD_OPT_KEY(), "userID")
+ .option(PARTITIONPATH_FIELD_OPT_KEY(),"country")
+ .option(PRECOMBINE_FIELD_OPT_KEY(), "last_mod")
+ .option(OPERATION_OPT_KEY(), UPSERT_OPERATION_OPT_VAL())
+ .mode(SaveMode.Append)
+ .save("/path/on/dfs");
+```
+
+Alternatively, you can also use the Hudi
[DeltaStreamer](https://hudi.apache.org/writing_data.html#deltastreamer) tool
with the DFSSource.
+
diff --git a/docs/_posts/2020-01-15-delete-support-in-hudi.md
b/docs/_posts/2020-01-15-delete-support-in-hudi.md
new file mode 100644
index 0000000..897ad54
--- /dev/null
+++ b/docs/_posts/2020-01-15-delete-support-in-hudi.md
@@ -0,0 +1,189 @@
+---
+title: "Delete support in Hudi"
+excerpt: "Deletes are supported at a record level in Hudi with 0.5.1 release.
This blog is a “how to” blog on how to delete records in hudi."
+author: shivnarayan
+category: blog
+---
+
+Record-level deletes are supported in Hudi as of the 0.5.1 release. This is a
"how to" blog on deleting records in Hudi. Deletes can be issued in three
flavors: via the Hudi RDD APIs, the Spark datasource, or DeltaStreamer.
+
+### Delete using RDD Level APIs
+
+If you have an embedded _HoodieWriteClient_, then deletion is as simple as
passing a _JavaRDD<HoodieKey>_ to the delete API.
+
+```java
+// Fetch the list of HoodieKeys to be deleted from elsewhere
+// convert to JavaRDD if required. JavaRDD<HoodieKey> toBeDeletedKeys
+List<WriteStatus> statuses = writeClient.delete(toBeDeletedKeys, commitTime);
+```
+
+### Deletion with Datasource
+
+Now we will walk through an example of how to perform deletes on a sample
dataset using the Datasource API. Quick Start has the same example as below.
Feel free to check it out.
+
+**Step 1** : Launch spark shell
+
+```bash
+bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle:0.5.1-incubating \
+ --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+```
+**Step 2**: Import as required and set up the table name, etc. for the sample dataset
+
+```scala
+import org.apache.hudi.QuickstartUtils._
+import scala.collection.JavaConversions._
+import org.apache.spark.sql.SaveMode._
+import org.apache.hudi.DataSourceReadOptions._
+import org.apache.hudi.DataSourceWriteOptions._
+import org.apache.hudi.config.HoodieWriteConfig._
+
+val tableName = "hudi_cow_table"
+val basePath = "file:///tmp/hudi_cow_table"
+val dataGen = new DataGenerator
+```
+
+**Step 3** : Insert data. Generate some new trips, load them into a DataFrame
and write the DataFrame into the Hudi dataset as below.
+
+```scala
+val inserts = convertToStringList(dataGen.generateInserts(10))
+val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
+df.write.format("org.apache.hudi").
+ options(getQuickstartWriteConfigs).
+ option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+ option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+ option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+ option(TABLE_NAME, tableName).
+ mode(Overwrite).
+ save(basePath);
+```
+
+**Step 4** : Query data. Load the data files into a DataFrame.
+
+```scala
+val roViewDF = spark.read.
+ format("org.apache.hudi").
+ load(basePath + "/*/*/*/*")
+roViewDF.createOrReplaceTempView("hudi_ro_table")
+spark.sql("select count(*) from hudi_ro_table").show() // should return 10
(number of records inserted above)
+val riderValue = spark.sql("select distinct rider from hudi_ro_table").show()
+// copy the value displayed to be used in next step
+```
+
+**Step 5**: Fetch the records that need to be deleted, using the above rider
value. This example is just to illustrate deletes; in the real world, use a
Spark SQL select query to fetch the records to be deleted, and invoke deletes
on the result as given below. The example rider value used is "rider-213".
+
+```scala
+val df = spark.sql("select uuid, partitionPath from hudi_ro_table where rider
= 'rider-213'")
+```
+
+Replace the above query with any other query that fetches the records to be
deleted.
+
+**Step 6** : Issue deletes
+
+```scala
+val deletes = dataGen.generateDeletes(df.collectAsList())
+val df = spark.read.json(spark.sparkContext.parallelize(deletes, 2));
+df.write.format("org.apache.hudi").
+ options(getQuickstartWriteConfigs).
+ option(OPERATION_OPT_KEY,"delete").
+ option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+ option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+ option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+ option(TABLE_NAME, tableName).
+ mode(Append).
+ save(basePath);
+```
+
+**Step 7** : Reload the table and verify that the records are deleted
+
+```scala
+val roViewDFAfterDelete = spark.
+ read.
+ format("org.apache.hudi").
+ load(basePath + "/*/*/*/*")
+roViewDFAfterDelete.createOrReplaceTempView("hudi_ro_table")
+spark.sql("select uuid, partitionPath from hudi_ro_table where rider =
'rider-213'").show() // should not return any rows
+```
+
+## Deletion with HoodieDeltaStreamer
+
+Deletion with `HoodieDeltaStreamer` takes the same path as upsert and so it
relies on a specific field called "*_hoodie_is_deleted*" of type boolean in
each record.
+
+- If a record has the field value set to _false_, or the field is not
present, then it is considered a regular upsert.
+- If the value is set to _true_, then it is considered a deleted record.
+
+This essentially means that the source schema has to be changed to add this
field, and all incoming records are expected to have it set. We will be working
to relax this in future releases.
+
+Let's say the original schema is:
+
+```json
+{
+ "type":"record",
+ "name":"example_tbl",
+ "fields":[{
+ "name": "uuid",
+ "type": "string"
+ }, {
+ "name": "ts",
+ "type": "string"
+ }, {
+ "name": "partitionPath",
+ "type": "string"
+ }, {
+ "name": "rank",
+ "type": "long"
+ }
+]}
+```
+
+To leverage deletion capabilities of `DeltaStreamer`, you have to change the
schema as below.
+
+```json
+{
+ "type":"record",
+ "name":"example_tbl",
+ "fields":[{
+ "name": "uuid",
+ "type": "string"
+ }, {
+ "name": "ts",
+ "type": "string"
+ }, {
+ "name": "partitionPath",
+ "type": "string"
+ }, {
+ "name": "rank",
+ "type": "long"
+ }, {
+ "name" : "_hoodie_is_deleted",
+ "type" : "boolean",
+ "default" : false
+ }
+]}
+```
+
+Example incoming record for upsert
+
+```json
+{
+ "ts": 0.0,
+ "uuid":"69cdb048-c93e-4532-adf9-f61ce6afe605",
+ "rank": 1034,
+ "partitionpath":"americas/brazil/sao_paulo",
+ "_hoodie_is_deleted":false
+}
+```
+
+
+Example incoming record that needs to be deleted
+```json
+{
+ "ts": 0.0,
+ "uuid": "19tdb048-c93e-4532-adf9-f61ce6afe10",
+ "rank": 1045,
+ "partitionpath":"americas/brazil/sao_paulo",
+ "_hoodie_is_deleted":true
+}
+```
+
+These are one-time changes. Once they are in, the DeltaStreamer pipeline will
handle both upserts and deletions within every batch. Each batch could contain
a mix of upserts and deletes, and no additional steps or changes are required
after this. Note that this performs hard deletion rather than soft deletion.
+
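The upsert-vs-delete routing rule described above can be sketched outside of Hudi. A minimal illustration in plain Python, using hypothetical records (this is not Hudi code):

```python
# Sketch of the _hoodie_is_deleted routing rule:
# field missing or false => regular upsert; true => delete.

def split_batch(records):
    """Partition a batch of records into (upserts, deletes)."""
    upserts, deletes = [], []
    for rec in records:
        if rec.get("_hoodie_is_deleted", False):
            deletes.append(rec)
        else:
            upserts.append(rec)
    return upserts, deletes

batch = [
    {"uuid": "a", "rank": 1034, "_hoodie_is_deleted": False},
    {"uuid": "b", "rank": 1045, "_hoodie_is_deleted": True},
    {"uuid": "c", "rank": 7},  # field absent => treated as upsert
]
ups, dels = split_batch(batch)
print(len(ups), len(dels))  # → 2 1
```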
diff --git a/docs/_posts/2020-01-20-change-capture-using-aws.md
b/docs/_posts/2020-01-20-change-capture-using-aws.md
new file mode 100644
index 0000000..f95c30c
--- /dev/null
+++ b/docs/_posts/2020-01-20-change-capture-using-aws.md
@@ -0,0 +1,202 @@
+---
+title: "Change Capture Using AWS Database Migration Service and Hudi"
+excerpt: "In this blog, we will build an end-end solution for capturing
changes from a MySQL instance running on AWS RDS to a Hudi table on S3, using
capabilities in the Hudi 0.5.1 release."
+author: vinoth
+category: blog
+---
+
+One of the core use-cases for Apache Hudi is enabling seamless, efficient
database ingestion to your data lake. Even though this model has been talked
about a lot, and users are already adopting it, content on how to go about it
is sparse.
+
+In this blog, we will build an end-to-end solution for capturing changes from
a MySQL instance running on AWS RDS to a Hudi table on S3, using capabilities
in the Hudi **0.5.1 release**.
+
+
+
+We can break up the problem into two pieces.
+
+1. **Extracting change logs from MySQL** : Surprisingly, this is still a
pretty tricky problem to solve and often Hudi users get stuck here. Thankfully,
at-least for AWS users, there is a [Database Migration
service](https://aws.amazon.com/dms/) (DMS for short), that does this change
capture and uploads them as parquet files on S3
+2. **Applying these change logs to your data lake table** : Once there are
change logs in some form, the next step is to apply them incrementally to your
table. This mundane task can be fully automated using the Hudi
[DeltaStreamer](http://hudi.apache.org/docs/writing_data.html#deltastreamer)
tool.
+
+
+
+The actual end-to-end architecture looks something like this.
+
+
+Let's now illustrate how one can accomplish this using a simple _orders_
table stored in MySQL (these instructions should broadly apply to other
database engines like Postgres or Aurora as well, though the SQL/syntax may
change).
+
+```sql
+CREATE DATABASE hudi_dms;
+USE hudi_dms;
+
+CREATE TABLE orders(
+ order_id INTEGER,
+ order_qty INTEGER,
+ customer_name VARCHAR(100),
+ updated_at TIMESTAMP DEFAULT NOW() ON UPDATE NOW(),
+ created_at TIMESTAMP DEFAULT NOW(),
+ CONSTRAINT orders_pk PRIMARY KEY(order_id)
+);
+
+INSERT INTO orders(order_id, order_qty, customer_name) VALUES(1, 10, 'victor');
+INSERT INTO orders(order_id, order_qty, customer_name) VALUES(2, 20, 'peter');
+```
+
+In the table, _order_id_ is the primary key, which will be enforced on the
Hudi table as well. Since a batch of change records can contain changes to the
same primary key, we also include _updated_at_ and _created_at_ fields, which
are kept up to date as writes happen to the table.
+
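Because a batch can carry multiple changes for the same primary key, the consumer has to keep only the latest change per key by _updated_at_. A rough Python sketch of that precombine step, with hypothetical records (not Hudi code):

```python
# Sketch: keep only the newest change per order_id, ordered by updated_at.
# Timestamps in 'YYYY-MM-DD HH:MM:SS' form compare correctly as strings.

def latest_per_key(changes):
    """Reduce a batch of change records to the latest record per key."""
    latest = {}
    for rec in changes:
        key = rec["order_id"]
        if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
            latest[key] = rec
    return latest

changes = [
    {"order_id": 2, "order_qty": 10, "updated_at": "2020-01-20 20:12:22"},
    {"order_id": 2, "order_qty": 20, "updated_at": "2020-01-20 21:11:47"},
    {"order_id": 1, "order_qty": 10, "updated_at": "2020-01-20 20:12:31"},
]
print(latest_per_key(changes)[2]["order_qty"])  # → 20
```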
+### Extracting Change logs from MySQL
+
+Before we can configure DMS, we first need to [prepare the MySQL
instance](https://aws.amazon.com/premiumsupport/knowledge-center/enable-binary-logging-aurora/)
for change capture, by ensuring backups are enabled and binlog is turned on.
+
+
+Now, proceed to create endpoints in DMS that capture MySQL data and [store in
S3, as parquet
files](https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html).
+
+- Source _hudi-source-db_ endpoint, points to the DB server and provides
basic authentication details
+- Target _parquet-s3_ endpoint, points to the bucket and folder on s3 to
store the change logs records as parquet files
+
+
+
+
+Then proceed to create a migration task, as below. Give it a name, connect
the source to the target, and be sure to pick the right _Migration type_ as
shown below, to ensure ongoing changes are continuously replicated to S3. Also
make sure to specify the rules DMS uses to decide which MySQL schemas/tables to
replicate. In this example, we simply whitelist the _orders_ table under the
_hudi_dms_ schema, as specified in the table SQL above.
+
+
+
+
+Starting the DMS task should result in an initial load, like below.
+
+
+
+Simply reading the raw initial load file should give the same values as the
upstream table
+
+```scala
+scala>
spark.read.parquet("s3://hudi-dms-demo/orders/hudi_dms/orders/*").sort("updated_at").show
+
++--------+---------+-------------+-------------------+-------------------+
+|order_id|order_qty|customer_name| updated_at| created_at|
++--------+---------+-------------+-------------------+-------------------+
+| 2| 10| peter|2020-01-20 20:12:22|2020-01-20 20:12:22|
+| 1| 10| victor|2020-01-20 20:12:31|2020-01-20 20:12:31|
++--------+---------+-------------+-------------------+-------------------+
+
+```
+
+## Applying Change Logs using Hudi DeltaStreamer
+
+Now, we are ready to start consuming the change logs. Hudi DeltaStreamer runs
as a Spark job on your favorite workflow scheduler (it also supports a
continuous mode using the _--continuous_ flag, where it runs as a long-running
Spark job). It tails a given path on S3 (or any DFS implementation) for new
files and can issue an _upsert_ to a target hudi dataset. The tool
automatically checkpoints itself, and thus to repeatedly ingest, all one needs
to do is to keep executing the DeltaStreamer pe [...]
+
+With an initial load already on S3, we then run the following command (the
deltastreamer command, from here on) to ingest the full load first and create a
Hudi dataset on S3.
+
+```bash
+spark-submit --class
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
+ --packages org.apache.spark:spark-avro_2.11:2.4.4 \
+ --master yarn --deploy-mode client \
+ hudi-utilities-bundle_2.11-0.5.1-SNAPSHOT.jar \
+ --table-type COPY_ON_WRITE \
+ --source-ordering-field updated_at \
+ --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
+ --target-base-path s3://hudi-dms-demo/hudi_orders --target-table hudi_orders
\
+ --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
+ --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
+ --hoodie-conf
hoodie.datasource.write.recordkey.field=order_id,hoodie.datasource.write.partitionpath.field=customer_name,hoodie.deltastreamer.source.dfs.root=s3://hudi-dms-demo/orders/hudi_dms/orders
+```
+
+A few things are going on here:
+
+- First, we specify the _--table-type_ as COPY_ON_WRITE. Hudi also supports
another _MERGE_ON_READ_ type you can choose instead.
+- To handle cases where the input parquet files contain multiple
updates/deletes or insert/updates to the same record, we use _updated_at_ as
the ordering field. This ensures that the change record which has the latest
timestamp will be reflected in Hudi.
+- We specify a target base path and a table name, both needed for creating
and writing to the Hudi table.
+- We use a special payload class, _AWSDmsAvroPayload_, to handle the
different change operations correctly. The parquet files generated have an _Op_
field that indicates whether a given change record is an insert (I), delete
(D) or update (U), and the payload implementation uses this field to decide how
to handle a given change record.
+- You may also notice a special transformer class, _AWSDmsTransformer_, being
specified. The reason here is tactical, but important. The initial load
file does not contain an _Op_ field, so this transformer adds one, keeping
the Hudi table schema consistent.
+- Finally, we specify the record key for the Hudi table to be the same as in
the upstream table. Then we specify partitioning by _customer_name_ and also
the root of the DMS output.
+
+Once the command is run, the Hudi table should be created and have the same
records as the upstream table (with all the _hoodie fields as well).
+
+```scala
+scala>
spark.read.format("org.apache.hudi").load("s3://hudi-dms-demo/hudi_orders/*/*.parquet").show
++-------------------+--------------------+------------------+----------------------+--------------------+--------+---------+-------------+-------------------+-------------------+---+
+|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|
_hoodie_file_name|order_id|order_qty|customer_name| updated_at|
created_at| Op|
++-------------------+--------------------+------------------+----------------------+--------------------+--------+---------+-------------+-------------------+-------------------+---+
+| 20200120205028| 20200120205028_0_1| 2|
peter|af9a2525-a486-40e...| 2| 10| peter|2020-01-20
20:12:22|2020-01-20 20:12:22| |
+| 20200120205028| 20200120205028_1_1| 1|
victor|8e431ece-d51c-4c7...| 1| 10| victor|2020-01-20
20:12:31|2020-01-20 20:12:31| |
++-------------------+--------------------+------------------+----------------------+--------------------+--------+---------+-------------+-------------------+-------------------+---+
+```
+
+Now, let's do an insert and an update
+
+```sql
+INSERT INTO orders(order_id, order_qty, customer_name) VALUES(3, 30, 'sandy');
+UPDATE orders set order_qty = 20 where order_id = 2;
+```
+
+This will add a new parquet file to the DMS output folder, and when the
deltastreamer command is run again, it will apply these changes to the Hudi
table.
+
+So, querying the Hudi table now would yield 3 rows, and the
_hoodie_commit_time_ accurately reflects when these writes happened. Notice
that the order_qty for order_id=2 is updated from 10 to 20!
+
+```bash
++-------------------+--------------------+------------------+----------------------+--------------------+---+--------+---------+-------------+-------------------+-------------------+
+|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|
_hoodie_file_name| Op|order_id|order_qty|customer_name| updated_at|
created_at|
++-------------------+--------------------+------------------+----------------------+--------------------+---+--------+---------+-------------+-------------------+-------------------+
+| 20200120211526| 20200120211526_0_1| 2|
peter|af9a2525-a486-40e...| U| 2| 20| peter|2020-01-20
21:11:47|2020-01-20 20:12:22|
+| 20200120211526| 20200120211526_1_1| 3|
sandy|566eb34a-e2c5-44b...| I| 3| 30| sandy|2020-01-20
21:11:24|2020-01-20 21:11:24|
+| 20200120205028| 20200120205028_1_1| 1|
victor|8e431ece-d51c-4c7...| | 1| 10| victor|2020-01-20
20:12:31|2020-01-20 20:12:31|
++-------------------+--------------------+------------------+----------------------+--------------------+---+--------+---------+-------------+-------------------+-------------------+
+```
+
+A nice debugging aid would be to read all of the DMS output now and sort it
by _updated_at_, which should give us the sequence of changes that happened on
the upstream table. As we can see, the Hudi table above is a compacted snapshot
of this raw change log.
+
+```bash
++----+--------+---------+-------------+-------------------+-------------------+
+| Op|order_id|order_qty|customer_name| updated_at| created_at|
++----+--------+---------+-------------+-------------------+-------------------+
+|null| 2| 10| peter|2020-01-20 20:12:22|2020-01-20 20:12:22|
+|null| 1| 10| victor|2020-01-20 20:12:31|2020-01-20 20:12:31|
+| I| 3| 30| sandy|2020-01-20 21:11:24|2020-01-20 21:11:24|
+| U| 2| 20| peter|2020-01-20 21:11:47|2020-01-20 20:12:22|
++----+--------+---------+-------------+-------------------+-------------------+
+```
+
+An initial load with no _Op_ field value, followed by an insert and an update.
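The "compacted snapshot" view can be sketched by replaying the changelog in order and applying the _Op_ semantics: rows with no _Op_ (initial load), I and U upsert the row, while D removes it. A hypothetical Python illustration (not Hudi code):

```python
# Sketch: replay an ordered DMS changelog into a snapshot keyed by order_id.

def compact(changelog):
    """Apply Op semantics (None/I/U upsert, D delete) over ordered records."""
    snapshot = {}
    for rec in changelog:
        op = rec.get("Op")  # None for the initial full load
        if op == "D":
            snapshot.pop(rec["order_id"], None)
        else:  # None, "I" or "U" all upsert the row
            snapshot[rec["order_id"]] = rec
    return snapshot

log = [
    {"Op": None, "order_id": 2, "order_qty": 10},
    {"Op": None, "order_id": 1, "order_qty": 10},
    {"Op": "I", "order_id": 3, "order_qty": 30},
    {"Op": "U", "order_id": 2, "order_qty": 20},
    {"Op": "D", "order_id": 2, "order_qty": 20},
]
snap = compact(log)
print(sorted(snap))  # → [1, 3]
```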
+
+Now, let's do some deletes and inserts
+
+```sql
+DELETE FROM orders WHERE order_id = 2;
+INSERT INTO orders(order_id, order_qty, customer_name) VALUES(4, 40, 'barry');
+INSERT INTO orders(order_id, order_qty, customer_name) VALUES(5, 50, 'nathan');
+```
+
+This should result in more files on S3, written by DMS, which the
DeltaStreamer command will continue to process incrementally (i.e. only the
newly written files are read each time)
+
+
+
+Running the deltastreamer command again would result in the following state
for the Hudi table. Notice the two new records, and that _order_id=2_ is now
gone
+
+```bash
++-------------------+--------------------+------------------+----------------------+--------------------+---+--------+---------+-------------+-------------------+-------------------+
+|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| Op|order_id|order_qty|customer_name|         updated_at|         created_at|
++-------------------+--------------------+------------------+----------------------+--------------------+---+--------+---------+-------------+-------------------+-------------------+
+|     20200120212522|  20200120212522_1_1|                 5|                nathan|3da94b20-c70b-457...|  I|       5|       50|       nathan|2020-01-20 21:23:00|2020-01-20 21:23:00|
+|     20200120212522|  20200120212522_2_1|                 4|                 barry|8cc46715-8f0f-48a...|  I|       4|       40|        barry|2020-01-20 21:22:49|2020-01-20 21:22:49|
+|     20200120211526|  20200120211526_1_1|                 3|                 sandy|566eb34a-e2c5-44b...|  I|       3|       30|        sandy|2020-01-20 21:11:24|2020-01-20 21:11:24|
+|     20200120205028|  20200120205028_1_1|                 1|                victor|8e431ece-d51c-4c7...|   |       1|       10|       victor|2020-01-20 20:12:31|2020-01-20 20:12:31|
++-------------------+--------------------+------------------+----------------------+--------------------+---+--------+---------+-------------+-------------------+-------------------+
+```
+
+Our little informal change log query yields the following.
+
+```bash
++----+--------+---------+-------------+-------------------+-------------------+
+| Op|order_id|order_qty|customer_name| updated_at| created_at|
++----+--------+---------+-------------+-------------------+-------------------+
+|null| 2| 10| peter|2020-01-20 20:12:22|2020-01-20 20:12:22|
+|null| 1| 10| victor|2020-01-20 20:12:31|2020-01-20 20:12:31|
+| I| 3| 30| sandy|2020-01-20 21:11:24|2020-01-20 21:11:24|
+| U| 2| 20| peter|2020-01-20 21:11:47|2020-01-20 20:12:22|
+| D| 2| 20| peter|2020-01-20 21:11:47|2020-01-20 20:12:22|
+| I| 4| 40| barry|2020-01-20 21:22:49|2020-01-20 21:22:49|
+| I| 5| 50| nathan|2020-01-20 21:23:00|2020-01-20 21:23:00|
++----+--------+---------+-------------+-------------------+-------------------+
+```
+
+Note that the delete and the update have the same _updated_at_ value, so they may well order differently here. In short, this way of looking at the changelog has its caveats. For a true changelog of the Hudi table itself, you can issue an [incremental query](http://hudi.apache.org/docs/querying_data.html).
+
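To see why rows with the same timestamp are ambiguous, here is a small hypothetical sketch (plain Python, not Hudi code): replaying the change rows above into a latest-state view needs a tie-breaker beyond `updated_at`, such as an assumed operation precedence.

```python
# Hypothetical sketch: replay the change rows above per order_id. The U and D
# rows for order_id=2 share the same updated_at, so sorting by timestamp alone
# cannot order them; OP_RANK is an assumed, deterministic tie-breaker.
OP_RANK = {None: 0, "I": 0, "U": 1, "D": 2}

changes = [
    (None, 2, 10, "2020-01-20 20:12:22"),
    (None, 1, 10, "2020-01-20 20:12:31"),
    ("I",  3, 30, "2020-01-20 21:11:24"),
    ("U",  2, 20, "2020-01-20 21:11:47"),
    ("D",  2, 20, "2020-01-20 21:11:47"),  # same updated_at as the U row
    ("I",  4, 40, "2020-01-20 21:22:49"),
    ("I",  5, 50, "2020-01-20 21:23:00"),
]

state = {}
for op, order_id, qty, updated_at in sorted(
        changes, key=lambda c: (c[3], OP_RANK[c[0]])):
    if op == "D":
        state.pop(order_id, None)  # delete wins the tie
    else:
        state[order_id] = qty

print(sorted(state.items()))  # [(1, 10), (3, 30), (4, 40), (5, 50)]
```

With the delete ranked last, the replay matches the Hudi table shown earlier; rank the delete before the update and _order_id=2_ would wrongly survive, which is exactly the caveat above.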
+And life goes on... Hope this was useful to all the data engineers out there!
+
diff --git a/docs/_posts/2020-03-22-exporting-hudi-datasets.md
b/docs/_posts/2020-03-22-exporting-hudi-datasets.md
new file mode 100644
index 0000000..3b4117d
--- /dev/null
+++ b/docs/_posts/2020-03-22-exporting-hudi-datasets.md
@@ -0,0 +1,102 @@
+---
+title: "Export Hudi datasets as a copy or as different formats"
+excerpt: "Learn how to copy or export a Hudi dataset in various formats."
+author: rxu
+category: blog
+---
+
+### Copy to Hudi dataset
+
+Similar to the existing `HoodieSnapshotCopier`, the Exporter scans the source
dataset and then makes a copy of it to the target output path.
+
+```bash
+spark-submit \
+  --jars "packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.6.0-SNAPSHOT.jar" \
+  --deploy-mode "client" \
+  --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
+  packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.6.0-SNAPSHOT.jar \
+  --source-base-path "/tmp/" \
+  --target-output-path "/tmp/exported/hudi/" \
+  --output-format "hudi"
+```
+
+### Export to json or parquet dataset
+
+The Exporter can also convert the source dataset into other formats. Currently only "json" and "parquet" are supported.
+
+```bash
+spark-submit \
+  --jars "packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.6.0-SNAPSHOT.jar" \
+  --deploy-mode "client" \
+  --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
+  packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.6.0-SNAPSHOT.jar \
+  --source-base-path "/tmp/" \
+  --target-output-path "/tmp/exported/json/" \
+  --output-format "json"  # or "parquet"
+```
+
+### Re-partitioning
+
+When export to a different format, the Exporter takes parameters to do some
custom re-partitioning. By default, if neither of the 2 parameters below is
given, the output dataset will have no partition.
+
+#### `--output-partition-field`
+
+This parameter uses an existing non-metadata field as the output partition field. All `_hoodie_*` metadata fields will be stripped during export.
+
+```bash
+spark-submit \
+  --jars "packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.6.0-SNAPSHOT.jar" \
+  --deploy-mode "client" \
+  --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
+  packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.6.0-SNAPSHOT.jar \
+  --source-base-path "/tmp/" \
+  --target-output-path "/tmp/exported/json/" \
+  --output-format "json" \
+  --output-partition-field "symbol"  # assume the source dataset contains a field `symbol`
+```
+
+The output directory will look like this:
+
+```bash
+_SUCCESS symbol=AMRS symbol=AYX symbol=CDMO symbol=CRC symbol=DRNA ...
+```
+
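What `--output-partition-field` implies can be pictured with a tiny, hypothetical sketch (plain Python, not the Exporter's actual code): drop every `_hoodie_`-prefixed column, then bucket the rows by the chosen partition field.

```python
# Hypothetical sketch of field-based export partitioning: strip _hoodie_*
# metadata columns, then group rows by the partition field's value.
def export_rows(rows, partition_field):
    partitions = {}
    for row in rows:
        clean = {k: v for k, v in row.items() if not k.startswith("_hoodie_")}
        partitions.setdefault(clean[partition_field], []).append(clean)
    return partitions

rows = [
    {"_hoodie_commit_time": "20200322", "symbol": "AYX", "price": 1.0},
    {"_hoodie_commit_time": "20200322", "symbol": "CRC", "price": 2.0},
]
out = export_rows(rows, "symbol")
print(sorted(out))    # ['AYX', 'CRC']
print(out["AYX"][0])  # {'symbol': 'AYX', 'price': 1.0}
```

Each key of `out` corresponds to one `symbol=...` directory in the listing above, and no row retains its metadata columns.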
+#### `--output-partitioner`
+
+This parameter takes the fully-qualified name of a class that implements `HoodieSnapshotExporter.Partitioner`. It takes precedence over `--output-partition-field`, which will be ignored when this is provided.
+
+An example implementation is shown below:
+
+**MyPartitioner.java**
+
+```java
+package com.foo.bar;
+
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.utilities.HoodieSnapshotExporter;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.DataFrameWriter;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+
+public class MyPartitioner implements HoodieSnapshotExporter.Partitioner {
+
+  private static final String PARTITION_NAME = "date";
+
+  @Override
+  public DataFrameWriter<Row> partition(Dataset<Row> source) {
+    // use the current hoodie partition path as the output partition
+    return source
+        .withColumnRenamed(HoodieRecord.PARTITION_PATH_METADATA_FIELD, PARTITION_NAME)
+        .repartition(new Column(PARTITION_NAME))
+        .write()
+        .partitionBy(PARTITION_NAME);
+  }
+}
+```
+
+After putting this class in `my-custom.jar`, which is then placed on the job
classpath, the submit command will look like this:
+
+```bash
+spark-submit \
+  --jars "packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.6.0-SNAPSHOT.jar,my-custom.jar" \
+  --deploy-mode "client" \
+  --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
+  packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.6.0-SNAPSHOT.jar \
+  --source-base-path "/tmp/" \
+  --target-output-path "/tmp/exported/json/" \
+  --output-format "json" \
+  --output-partitioner "com.foo.bar.MyPartitioner"
+```
+
diff --git a/docs/_posts/2020-04-27-apache-hudi-apache-zepplin.md
b/docs/_posts/2020-04-27-apache-hudi-apache-zepplin.md
new file mode 100644
index 0000000..32eabc9
--- /dev/null
+++ b/docs/_posts/2020-04-27-apache-hudi-apache-zepplin.md
@@ -0,0 +1,65 @@
+---
+title: "Apache Hudi (Incubating) Support on Apache Zeppelin"
+excerpt: "Integrating HUDI's real-time and read-optimized query capabilities
into Apache Zeppelin’s notebook"
+author: leesf
+category: blog
+---
+
+
+## 1. Introduction
+Apache Zeppelin is a web-based notebook that enables interactive data analysis. It makes it easy to build data-driven, interactive, and collaborative documents, and it supports multiple languages, including Scala (with Apache Spark), Python (with Apache Spark), SparkSQL, Hive, Markdown, Shell, and more. Hive and SparkSQL currently support querying Hudi's read-optimized view and real-time view, so in theory Zeppelin's notebook should offer the same query capabilities.
+
+## 2. The achieved effect
+### 2.1 Hive
+
+#### 2.1.1 Read optimized view
+
+
+#### 2.1.2 Real-time view
+
+
+### 2.2 Spark SQL
+
+#### 2.2.1 Read optimized view
+
+
+#### 2.2.2 Real-time view
+
+
+## 3. Common problems
+
+### 3.1 Hudi package adaptation
+Zeppelin loads the packages under `lib` by default when starting. For external dependencies such as Hudi, it is best to place the bundle jar directly under `zeppelin/lib`, so that Hive or Spark SQL can find the corresponding Hudi dependency on the cluster.
+
+### 3.2 Parquet jar package adaptation
+The parquet version of the Hudi package is 1.10, while the current parquet version of the CDH cluster is 1.9, so many jar conflict errors are reported when executing a Hudi table query.
+
+**Solution**: upgrade the parquet package to 1.10 in the `spark/jars` directory of the node where Zeppelin is located.
+**Side effects**: Spark jobs other than Zeppelin's that are assigned to the cluster nodes running parquet 1.10 may fail.
+**Suggestions**: Clients other than Zeppelin will also have jar conflicts. Therefore, it is recommended to fully upgrade the cluster's spark jars, parquet jars, and related dependent jars to better adapt to Hudi's capabilities.
+
+### 3.3 Spark Interpreter adaptation
+
+The same SQL run as a Spark SQL query on Zeppelin returns more records than the Hive query.
+
+**Cause of the problem**: When reading and writing Parquet tables registered in the Hive metastore, Spark SQL uses its own Parquet SerDe (short for Serializer/Deserializer) instead of Hive's SerDe, because Spark SQL's own SerDe has better performance.
+
+This causes Spark SQL to query only Hudi's pipeline records, not the final merged result.
+
+**Solution**: set `spark.sql.hive.convertMetastoreParquet=false`
+
+ 1. **Method 1**: Edit the properties directly on the interpreter page
+
+ 2. **Method 2**: Edit `zeppelin/conf/interpreter.json` and add
+
+```json
+"spark.sql.hive.convertMetastoreParquet": {
+ "name": "spark.sql.hive.convertMetastoreParquet",
+ "value": false,
+ "type": "checkbox"
+}
+```
+
+## 4. Hudi incremental view
+
+The Hudi incremental view currently only supports pulling by writing Spark code. Considering that Zeppelin can execute code and shell commands directly in the notebook, we will later consider packaging such notebooks so that Hudi incremental views can be queried with SQL.
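As a rough illustration of what such an incremental pull does (a hypothetical Python sketch, not Hudi's actual API): given a begin commit time as the consumer's checkpoint, only records committed after it are returned.

```python
# Hypothetical sketch of an incremental pull: keep only records whose commit
# time is strictly greater than the last-seen checkpoint.
def incremental_pull(records, begin_commit_time):
    return [r for r in records if r["_hoodie_commit_time"] > begin_commit_time]

records = [
    {"_hoodie_commit_time": "20200120205028", "order_id": 1},
    {"_hoodie_commit_time": "20200120211526", "order_id": 3},
    {"_hoodie_commit_time": "20200120212522", "order_id": 4},
]

new_records = incremental_pull(records, "20200120205028")
print([r["order_id"] for r in new_records])  # [3, 4]
```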
+
diff --git a/docs/_sass/hudi_style/_archive.scss
b/docs/_sass/hudi_style/_archive.scss
index 49c24c8..155956a 100644
--- a/docs/_sass/hudi_style/_archive.scss
+++ b/docs/_sass/hudi_style/_archive.scss
@@ -64,6 +64,12 @@
}
}
+.archive__item-meta {
+ font-size: 0.8em;
+ margin-top:0;
+ margin-bottom: 1em;
+}
+
/* remove border*/
.page__content {
.archive__item-title {
@@ -444,4 +450,4 @@
padding-right: 0;
}
}
-}
\ No newline at end of file
+}
diff --git a/docs/_sass/hudi_style/_page.scss b/docs/_sass/hudi_style/_page.scss
index 43edff3..a18d73c 100644
--- a/docs/_sass/hudi_style/_page.scss
+++ b/docs/_sass/hudi_style/_page.scss
@@ -78,6 +78,12 @@ body {
}
}
+.page__author {
+ font-size: 0.8em;
+ margin-top:0;
+ margin-bottom: 1em;
+}
+
.page__lead {
font-family: $global-font-family;
font-size: $type-size-4;
diff --git a/docs/assets/images/blog/batch_vs_incremental.png
b/docs/assets/images/blog/batch_vs_incremental.png
new file mode 100644
index 0000000..355adf8
Binary files /dev/null and b/docs/assets/images/blog/batch_vs_incremental.png
differ
diff --git a/docs/assets/images/blog/change-capture-architecture.png
b/docs/assets/images/blog/change-capture-architecture.png
new file mode 100644
index 0000000..e54a423
Binary files /dev/null and
b/docs/assets/images/blog/change-capture-architecture.png differ
diff --git a/docs/assets/images/blog/change-logs-mysql.png
b/docs/assets/images/blog/change-logs-mysql.png
new file mode 100644
index 0000000..142351c
Binary files /dev/null and b/docs/assets/images/blog/change-logs-mysql.png
differ
diff --git a/docs/assets/images/blog/dms-demo-files.png
b/docs/assets/images/blog/dms-demo-files.png
new file mode 100644
index 0000000..8453aaa
Binary files /dev/null and b/docs/assets/images/blog/dms-demo-files.png differ
diff --git a/docs/assets/images/blog/dms-task.png
b/docs/assets/images/blog/dms-task.png
new file mode 100644
index 0000000..9b6f5df
Binary files /dev/null and b/docs/assets/images/blog/dms-task.png differ
diff --git a/docs/assets/images/blog/read_optimized_view.png
b/docs/assets/images/blog/read_optimized_view.png
new file mode 100644
index 0000000..8923bbf
Binary files /dev/null and b/docs/assets/images/blog/read_optimized_view.png
differ
diff --git a/docs/assets/images/blog/real_time_view.png
b/docs/assets/images/blog/real_time_view.png
new file mode 100644
index 0000000..90a3e12
Binary files /dev/null and b/docs/assets/images/blog/real_time_view.png differ
diff --git a/docs/assets/images/blog/s3-endoint-list.png
b/docs/assets/images/blog/s3-endoint-list.png
new file mode 100644
index 0000000..3f68514
Binary files /dev/null and b/docs/assets/images/blog/s3-endoint-list.png differ
diff --git a/docs/assets/images/blog/s3-endpoint-configuration-1.png
b/docs/assets/images/blog/s3-endpoint-configuration-1.png
new file mode 100644
index 0000000..1d5997e
Binary files /dev/null and
b/docs/assets/images/blog/s3-endpoint-configuration-1.png differ
diff --git a/docs/assets/images/blog/s3-endpoint-configuration-2.png
b/docs/assets/images/blog/s3-endpoint-configuration-2.png
new file mode 100644
index 0000000..5c2acf1
Binary files /dev/null and
b/docs/assets/images/blog/s3-endpoint-configuration-2.png differ
diff --git a/docs/assets/images/blog/s3-endpoint-configuration.png
b/docs/assets/images/blog/s3-endpoint-configuration.png
new file mode 100644
index 0000000..2b583d1
Binary files /dev/null and
b/docs/assets/images/blog/s3-endpoint-configuration.png differ
diff --git a/docs/assets/images/blog/s3-migration-task-1.png
b/docs/assets/images/blog/s3-migration-task-1.png
new file mode 100644
index 0000000..a14f2da
Binary files /dev/null and b/docs/assets/images/blog/s3-migration-task-1.png
differ
diff --git a/docs/assets/images/blog/s3-migration-task-2.png
b/docs/assets/images/blog/s3-migration-task-2.png
new file mode 100644
index 0000000..66f4d4d
Binary files /dev/null and b/docs/assets/images/blog/s3-migration-task-2.png
differ
diff --git a/docs/assets/images/blog/spark_edit_properties.png
b/docs/assets/images/blog/spark_edit_properties.png
new file mode 100644
index 0000000..9c732e1
Binary files /dev/null and b/docs/assets/images/blog/spark_edit_properties.png
differ
diff --git a/docs/assets/images/blog/spark_read_optimized_view.png
b/docs/assets/images/blog/spark_read_optimized_view.png
new file mode 100644
index 0000000..2b583d1
Binary files /dev/null and
b/docs/assets/images/blog/spark_read_optimized_view.png differ
diff --git a/docs/assets/images/blog/spark_real_time_view.png
b/docs/assets/images/blog/spark_real_time_view.png
new file mode 100644
index 0000000..68aced5
Binary files /dev/null and b/docs/assets/images/blog/spark_real_time_view.png
differ