This is an automated email from the ASF dual-hosted git repository.
leesf pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new b4f9ddc [HUDI-804] Add Azure support to doc (#1668)
b4f9ddc is described below
commit b4f9ddc3dbe1404f2f58dc9aac2279f3fb86e5d1
Author: Gary Li <[email protected]>
AuthorDate: Wed May 27 07:56:06 2020 -0700
[HUDI-804] Add Azure support to doc (#1668)
Co-authored-by: lamber-ken <[email protected]>
---
docs/_docs/0_5_oss_filesystem.cn.md | 2 +-
docs/_docs/0_5_oss_filesystem.md | 64 +++++++++++++++++------------------
docs/_docs/0_6_azure_filesystem.cn.md | 52 ++++++++++++++++++++++++++++
docs/_docs/0_6_azure_filesystem.md | 51 ++++++++++++++++++++++++++++
docs/_docs/2_4_configurations.cn.md | 4 +++
docs/_docs/2_4_configurations.md | 4 +++
6 files changed, 144 insertions(+), 33 deletions(-)
diff --git a/docs/_docs/0_5_oss_filesystem.cn.md b/docs/_docs/0_5_oss_filesystem.cn.md
index a598f34..baaa984 100644
--- a/docs/_docs/0_5_oss_filesystem.cn.md
+++ b/docs/_docs/0_5_oss_filesystem.cn.md
@@ -1,7 +1,7 @@
---
title: OSS Filesystem
keywords: hudi, hive, aliyun, oss, spark, presto
-permalink: /docs/oss_hoodie.html
+permalink: /cn/docs/oss_hoodie.html
summary: In this page, we go over how to configure Hudi with OSS filesystem.
last_modified_at: 2020-04-21T12:50:50-10:00
language: cn
diff --git a/docs/_docs/0_5_oss_filesystem.md b/docs/_docs/0_5_oss_filesystem.md
index afd114c..766cffa 100644
--- a/docs/_docs/0_5_oss_filesystem.md
+++ b/docs/_docs/0_5_oss_filesystem.md
@@ -19,33 +19,33 @@ There are two configurations required for Hudi-OSS compatibility:
Add the required configs in your core-site.xml from where Hudi can fetch them.
Replace the `fs.defaultFS` with your OSS bucket name, replace `fs.oss.endpoint`
with your OSS endpoint, replace `fs.oss.accessKeyId` with your OSS key, replace
`fs.oss.accessKeySecret` with your OSS secret. Hudi should be able to
read/write from the bucket.
```xml
- <property>
- <name>fs.defaultFS</name>
- <value>oss://bucketname/</value>
- </property>
+<property>
+ <name>fs.defaultFS</name>
+ <value>oss://bucketname/</value>
+</property>
- <property>
- <name>fs.oss.endpoint</name>
- <value>oss-endpoint-address</value>
- <description>Aliyun OSS endpoint to connect to.</description>
- </property>
+<property>
+ <name>fs.oss.endpoint</name>
+ <value>oss-endpoint-address</value>
+ <description>Aliyun OSS endpoint to connect to.</description>
+</property>
- <property>
- <name>fs.oss.accessKeyId</name>
- <value>oss_key</value>
- <description>Aliyun access key ID</description>
- </property>
+<property>
+ <name>fs.oss.accessKeyId</name>
+ <value>oss_key</value>
+ <description>Aliyun access key ID</description>
+</property>
- <property>
- <name>fs.oss.accessKeySecret</name>
- <value>oss-secret</value>
- <description>Aliyun access key secret</description>
- </property>
+<property>
+ <name>fs.oss.accessKeySecret</name>
+ <value>oss-secret</value>
+ <description>Aliyun access key secret</description>
+</property>
- <property>
- <name>fs.oss.impl</name>
- <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
- </property>
+<property>
+ <name>fs.oss.impl</name>
+ <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
+</property>
```
### Aliyun OSS Libs
@@ -54,18 +54,18 @@ Aliyun hadoop libraries jars to add to our pom.xml. Since hadoop-aliyun depends
```xml
<dependency>
- <groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-aliyun</artifactId>
- <version>3.2.1</version>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-aliyun</artifactId>
+ <version>3.2.1</version>
</dependency>
<dependency>
- <groupId>com.aliyun.oss</groupId>
- <artifactId>aliyun-sdk-oss</artifactId>
- <version>3.8.1</version>
+ <groupId>com.aliyun.oss</groupId>
+ <artifactId>aliyun-sdk-oss</artifactId>
+ <version>3.8.1</version>
</dependency>
<dependency>
- <groupId>org.jdom</groupId>
- <artifactId>jdom</artifactId>
- <version>1.1</version>
+ <groupId>org.jdom</groupId>
+ <artifactId>jdom</artifactId>
+ <version>1.1</version>
</dependency>
```
diff --git a/docs/_docs/0_6_azure_filesystem.cn.md b/docs/_docs/0_6_azure_filesystem.cn.md
new file mode 100644
index 0000000..af1e290
--- /dev/null
+++ b/docs/_docs/0_6_azure_filesystem.cn.md
@@ -0,0 +1,52 @@
+---
+title: Azure Filesystem
+keywords: hudi, hive, azure, spark, presto
+permalink: /cn/docs/azure_hoodie.html
+summary: In this page, we go over how to configure Hudi with Azure filesystem.
+last_modified_at: 2020-05-25T19:00:57-04:00
+language: cn
+---
+In this page, we explain how to use Hudi on Microsoft Azure.
+
+## Disclaimer
+
+This page is maintained by the Hudi community.
+If the information is inaccurate or you have additional information to add,
+please feel free to create a JIRA ticket. Contributions are highly appreciated.
+
+## Supported Storage System
+
+Two Azure storage systems support Hudi:
+
+- Azure Blob Storage
+- Azure Data Lake Storage Gen 2
+
+## Verified Combinations of Spark and Storage System
+
+#### HDInsight Spark 2.4 on Azure Data Lake Storage Gen 2
+This combination works out of the box. No extra configuration is needed.
+
+#### Databricks Spark 2.4 on Azure Data Lake Storage Gen 2
+- Import the Hudi jar into the Databricks workspace.
+
+- Mount the file system using dbutils.
+ ```scala
+ dbutils.fs.mount(
+ source = "abfss://[email protected]",
+ mountPoint = "/mountpoint",
+ extraConfigs = configs)
+ ```
+- When writing a Hudi dataset, use the abfss URL.
+ ```scala
+ inputDF.write
+ .format("org.apache.hudi")
+ .options(opts)
+ .mode(SaveMode.Append)
+ .save("abfss://<<storage-account>>.dfs.core.windows.net/hudi-tables/customer")
+ ```
+- When reading a Hudi dataset, use the mount point.
+ ```scala
+ spark.read
+ .format("org.apache.hudi")
+ .load("/mountpoint/hudi-tables/customer")
+ ```
diff --git a/docs/_docs/0_6_azure_filesystem.md b/docs/_docs/0_6_azure_filesystem.md
new file mode 100644
index 0000000..7421496
--- /dev/null
+++ b/docs/_docs/0_6_azure_filesystem.md
@@ -0,0 +1,51 @@
+---
+title: Azure Filesystem
+keywords: hudi, hive, azure, spark, presto
+permalink: /docs/azure_hoodie.html
+summary: In this page, we go over how to configure Hudi with Azure filesystem.
+last_modified_at: 2020-05-25T19:00:57-04:00
+---
+In this page, we explain how to use Hudi on Microsoft Azure.
+
+## Disclaimer
+
+This page is maintained by the Hudi community.
+If the information is inaccurate or you have additional information to add,
+please feel free to create a JIRA ticket. Contributions are highly appreciated.
+
+## Supported Storage System
+
+Two Azure storage systems support Hudi:
+
+- Azure Blob Storage
+- Azure Data Lake Storage Gen 2
+
+## Verified Combinations of Spark and Storage System
+
+#### HDInsight Spark 2.4 on Azure Data Lake Storage Gen 2
+This combination works out of the box. No extra configuration is needed.
+
+#### Databricks Spark 2.4 on Azure Data Lake Storage Gen 2
+- Import the Hudi jar into the Databricks workspace.
+
+- Mount the file system using dbutils.
+ ```scala
+ dbutils.fs.mount(
+ source = "abfss://[email protected]",
+ mountPoint = "/mountpoint",
+ extraConfigs = configs)
+ ```
+- When writing a Hudi dataset, use the abfss URL.
+ ```scala
+ inputDF.write
+ .format("org.apache.hudi")
+ .options(opts)
+ .mode(SaveMode.Append)
+ .save("abfss://<<storage-account>>.dfs.core.windows.net/hudi-tables/customer")
+ ```
+- When reading a Hudi dataset, use the mount point.
+ ```scala
+ spark.read
+ .format("org.apache.hudi")
+ .load("/mountpoint/hudi-tables/customer")
+ ```
diff --git a/docs/_docs/2_4_configurations.cn.md b/docs/_docs/2_4_configurations.cn.md
index 1b1af05..a61ba9a 100644
--- a/docs/_docs/2_4_configurations.cn.md
+++ b/docs/_docs/2_4_configurations.cn.md
@@ -29,6 +29,10 @@ language: cn
S3和Hudi协同工作所需的配置。
* [Google Cloud Storage](/cn/docs/gcs_hoodie.html) <br/>
GCS和Hudi协同工作所需的配置。
+ * [Alibaba Cloud OSS](/cn/docs/oss_hoodie.html) <br/>
+ 阿里云和Hudi协同工作所需的配置。
+ * [Microsoft Azure](/cn/docs/azure_hoodie.html) <br/>
+ Azure和Hudi协同工作所需的配置。
## Spark数据源配置 {#spark-datasource}
diff --git a/docs/_docs/2_4_configurations.md b/docs/_docs/2_4_configurations.md
index abb102b..b7ff7d1 100644
--- a/docs/_docs/2_4_configurations.md
+++ b/docs/_docs/2_4_configurations.md
@@ -26,6 +26,10 @@ to cloud stores.
Configurations required for S3 and Hudi co-operability.
* [Google Cloud Storage](/docs/gcs_hoodie) <br/>
Configurations required for GCS and Hudi co-operability.
+ * [Alibaba Cloud OSS](/docs/oss_hoodie.html) <br/>
+ Configurations required for OSS and Hudi co-operability.
+ * [Microsoft Azure](/docs/azure_hoodie.html) <br/>
+ Configurations required for Azure and Hudi co-operability.
## Spark Datasource Configs {#spark-datasource}
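
Note on the Databricks mount step in docs/_docs/0_6_azure_filesystem.md: the `dbutils.fs.mount` call passes `extraConfigs = configs`, but the page does not define `configs`. The sketch below shows one possible shape for it, assuming OAuth authentication with an Azure AD service principal. The three placeholder values and their variable names are hypothetical and are not part of this commit; the map keys are the standard Hadoop ABFS driver settings for client-credential OAuth.

```scala
// Hypothetical placeholders: substitute the values of your own Azure AD
// service principal. These names are illustrative and not from the commit.
val clientId     = "<application-id>"
val clientSecret = "<client-secret>" // prefer reading this from a secret store
val tenantId     = "<directory-id>"

// OAuth (client credentials) settings read by the Hadoop ABFS driver.
// Passed to dbutils.fs.mount as the extraConfigs argument.
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" ->
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> clientId,
  "fs.azure.account.oauth2.client.secret" -> clientSecret,
  "fs.azure.account.oauth2.client.endpoint" ->
    s"https://login.microsoftonline.com/$tenantId/oauth2/token"
)
```

Other authentication modes, such as a storage account key, would need a different set of keys; this is only one option and has not been verified against the combinations listed on the page.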