samredai commented on a change in pull request #3686: URL: https://github.com/apache/iceberg/pull/3686#discussion_r769888398
########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. +Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). 
+It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following Review comment: I don't think you need to list the engines and it'll be easy to forget to come update this in the future ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. +Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. 
To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: Review comment: I would remove "For example" here since there's no previous sentence. Or maybe replace it with "The following is an example of..." ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. 
+ - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. +Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. 
+ +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish Review comment: This part is a bit hard to read. Maybe this Engine Access section can be condensed to something like: > All engines in aliyun services can access iceberg tables. Although the Iceberg and DLF integration has not been released yet, here are some examples of how you can manage Iceberg tables in a Hive catalog. ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. 
+Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: + +```bash +# Add Iceberg dependency +ICEBERG_VERSION=0.13.0 Review comment: Should this be `0.13.0`? If this is known to work with `0.12.1` maybe we should use that here and update it to `0.13.0` once it's released. ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. 
See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. +Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. 
Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: + +```bash +# Add Iceberg dependency +ICEBERG_VERSION=0.13.0 +ALIYUN_ACCESS_KEY_ID=****** # Your Aliyun access key id. +ALIYUN_ACCESS_KEY_SECRET=****** # Your Aliyun access key secret. +ALIYUN_OSS_ENDPOINT=****** # Your Aliyun OSS endpoint. + +DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:$ICEBERG_VERSION" +DEPENDENCIES+=",org.apache.iceberg:iceberg-aliyun-runtime:$ICEBERG_VERSION" + +# Start Spark SQL client shell +spark-sql --packages $DEPENDENCIES \ + --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ + --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \ + --conf spark.sql.catalog.my_catalog.uri=thrift://<host>:<port> \ + --conf spark.sql.catalog.my_catalog.warehouse=oss://my-bucket/my/key/prefix \ + --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aliyun.oss.OSSFileIO \ + --conf spark.sql.catalog.my_catalog.oss.endpoint=$ALIYUN_OSS_ENDPOINT \ + --conf spark.sql.catalog.my_catalog.client.access-key-id=$ALIYUN_ACCESS_KEY_ID \ + --conf spark.sql.catalog.my_catalog.client.access-key-secret=$ALIYUN_ACCESS_KEY_SECRET +``` + +Let's create iceberg tables and insert few records into it. 
Review comment: nit: since the example is just creating one table this should be "Let's create an iceberg table and insert a few records into it using Spark-SQL." ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. +Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. 
DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: + +```bash +# Add Iceberg dependency +ICEBERG_VERSION=0.13.0 +ALIYUN_ACCESS_KEY_ID=****** # Your Aliyun access key id. +ALIYUN_ACCESS_KEY_SECRET=****** # Your Aliyun access key secret. +ALIYUN_OSS_ENDPOINT=****** # Your Aliyun OSS endpoint. 
+ +DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:$ICEBERG_VERSION" +DEPENDENCIES+=",org.apache.iceberg:iceberg-aliyun-runtime:$ICEBERG_VERSION" + +# Start Spark SQL client shell +spark-sql --packages $DEPENDENCIES \ + --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ + --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \ + --conf spark.sql.catalog.my_catalog.uri=thrift://<host>:<port> \ + --conf spark.sql.catalog.my_catalog.warehouse=oss://my-bucket/my/key/prefix \ + --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aliyun.oss.OSSFileIO \ + --conf spark.sql.catalog.my_catalog.oss.endpoint=$ALIYUN_OSS_ENDPOINT \ + --conf spark.sql.catalog.my_catalog.client.access-key-id=$ALIYUN_ACCESS_KEY_ID \ + --conf spark.sql.catalog.my_catalog.client.access-key-secret=$ALIYUN_ACCESS_KEY_SECRET +``` + +Let's create iceberg tables and insert few records into it. + +```sql +CREATE TABLE my_catalog.default.sample ( + id BIGINT, + data STRING +) +USING iceberg +TBLPROPERTIES ( + 'engine.hive.enabled' = 'true' +); + +INSERT INTO `hive`.`default`.`sample` VALUES (1, 'AAA'); +``` + +### Flink + +Take the sample that accessing apache iceberg tables stored in aliyun object storage service with Apache Flink 1.13.2: Review comment: I would reword this as: > The following is an example of working with apache iceberg tables stored in Aliyun OSS with Apache Flink. ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. 
You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. +Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. 
+ +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: + +```bash +# Add Iceberg dependency +ICEBERG_VERSION=0.13.0 +ALIYUN_ACCESS_KEY_ID=****** # Your Aliyun access key id. +ALIYUN_ACCESS_KEY_SECRET=****** # Your Aliyun access key secret. +ALIYUN_OSS_ENDPOINT=****** # Your Aliyun OSS endpoint. + +DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:$ICEBERG_VERSION" +DEPENDENCIES+=",org.apache.iceberg:iceberg-aliyun-runtime:$ICEBERG_VERSION" + +# Start Spark SQL client shell +spark-sql --packages $DEPENDENCIES \ + --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ + --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \ + --conf spark.sql.catalog.my_catalog.uri=thrift://<host>:<port> \ + --conf spark.sql.catalog.my_catalog.warehouse=oss://my-bucket/my/key/prefix \ + --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aliyun.oss.OSSFileIO \ + --conf spark.sql.catalog.my_catalog.oss.endpoint=$ALIYUN_OSS_ENDPOINT \ + --conf spark.sql.catalog.my_catalog.client.access-key-id=$ALIYUN_ACCESS_KEY_ID \ + --conf spark.sql.catalog.my_catalog.client.access-key-secret=$ALIYUN_ACCESS_KEY_SECRET +``` + +Let's create iceberg tables and insert few records into it. 
+ +```sql +CREATE TABLE my_catalog.default.sample ( + id BIGINT, + data STRING +) +USING iceberg +TBLPROPERTIES ( + 'engine.hive.enabled' = 'true' +); + +INSERT INTO `hive`.`default`.`sample` VALUES (1, 'AAA'); +``` + +### Flink + +Take the sample that accessing apache iceberg tables stored in aliyun object storage service with Apache Flink 1.13.2: + +```bash +ICEBERG_VERSION=0.13.0 +wget $ICEBERG_MAVEN_URL/iceberg-flink-runtime/$ICEBERG_VERSION/iceberg-flink-runtime-$ICEBERG_VERSION.jar Review comment: Looks like `ICEBERG_MAVEN_URL` needs to be set at the top of this code block. ``` MAVEN_URL=https://repo1.maven.org/maven2 ICEBERG_MAVEN_URL=$MAVEN_URL/org/apache/iceberg ``` ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. 
+Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: + +```bash +# Add Iceberg dependency +ICEBERG_VERSION=0.13.0 +ALIYUN_ACCESS_KEY_ID=****** # Your Aliyun access key id. +ALIYUN_ACCESS_KEY_SECRET=****** # Your Aliyun access key secret. +ALIYUN_OSS_ENDPOINT=****** # Your Aliyun OSS endpoint. 
+ +DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:$ICEBERG_VERSION" +DEPENDENCIES+=",org.apache.iceberg:iceberg-aliyun-runtime:$ICEBERG_VERSION" + +# Start Spark SQL client shell +spark-sql --packages $DEPENDENCIES \ + --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ + --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \ + --conf spark.sql.catalog.my_catalog.uri=thrift://<host>:<port> \ + --conf spark.sql.catalog.my_catalog.warehouse=oss://my-bucket/my/key/prefix \ Review comment: Should `HOST`, `PORT`, and `WAREHOUSE_PATH` be set as variables similar to `ALIYUN_ACCESS_KEY_ID`? That may make it more obvious that these need to be replaced here. ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. 
+Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: + +```bash +# Add Iceberg dependency +ICEBERG_VERSION=0.13.0 +ALIYUN_ACCESS_KEY_ID=****** # Your Aliyun access key id. +ALIYUN_ACCESS_KEY_SECRET=****** # Your Aliyun access key secret. +ALIYUN_OSS_ENDPOINT=****** # Your Aliyun OSS endpoint. 
+ +DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:$ICEBERG_VERSION" +DEPENDENCIES+=",org.apache.iceberg:iceberg-aliyun-runtime:$ICEBERG_VERSION" + +# Start Spark SQL client shell +spark-sql --packages $DEPENDENCIES \ + --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ + --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \ + --conf spark.sql.catalog.my_catalog.uri=thrift://<host>:<port> \ + --conf spark.sql.catalog.my_catalog.warehouse=oss://my-bucket/my/key/prefix \ + --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aliyun.oss.OSSFileIO \ + --conf spark.sql.catalog.my_catalog.oss.endpoint=$ALIYUN_OSS_ENDPOINT \ + --conf spark.sql.catalog.my_catalog.client.access-key-id=$ALIYUN_ACCESS_KEY_ID \ + --conf spark.sql.catalog.my_catalog.client.access-key-secret=$ALIYUN_ACCESS_KEY_SECRET +``` + +Let's create iceberg tables and insert few records into it. + +```sql +CREATE TABLE my_catalog.default.sample ( + id BIGINT, + data STRING +) +USING iceberg +TBLPROPERTIES ( + 'engine.hive.enabled' = 'true' +); + +INSERT INTO `hive`.`default`.`sample` VALUES (1, 'AAA'); +``` + +### Flink + +Take the sample that accessing apache iceberg tables stored in aliyun object storage service with Apache Flink 1.13.2: + +```bash +ICEBERG_VERSION=0.13.0 +wget $ICEBERG_MAVEN_URL/iceberg-flink-runtime/$ICEBERG_VERSION/iceberg-flink-runtime-$ICEBERG_VERSION.jar + +./bin/sql-client.sh embedded \ + -j /path/to/flink-sql-connector-hive-2.3.6_2.12-1.13.2.jar \ + -j /path/to/iceberg-aliyun-runtime-$ICEBERG_VERSION.jar \ + -j /path/to/iceberg-flink-1.13-runtime-$ICEBERG_VERSION.jar \ + shell +``` + +Let's create iceberg tables and insert few records into it. Review comment: Same as above, I would make this: "Let's create an iceberg table and insert a few records into it using Flink-SQL." 
##########  File path: site/docs/aliyun.md  ##########
+```sql
+CREATE CATALOG hive WITH (
+    'type' = 'iceberg',
+    'uri' = 'thrift://localhost:9083',
+    'warehouse' = 'oss://my-bucket/my-object',

Review comment:
The way placeholders are displayed here is inconsistent. How about using the following?
```
<OSS_PATH>
<THRIFT-URI>
<OSS_ENDPOINT>
<ALIYUN_ACCESS_KEY_ID>
<ALIYUN_ACCESS_KEY_SECRET>
```

##########  File path: site/docs/aliyun.md  ##########
+```sql
+CREATE CATALOG hive WITH (
+    'type' = 'iceberg',
+    'uri' = 'thrift://localhost:9083',
+    'warehouse' = 'oss://my-bucket/my-object',
+    'io-impl' = 'org.apache.iceberg.aliyun.oss.OSSFileIO',
+    'oss.endpoint' = '<your-oss-endpoint-address>',
+    'client.access-key-id' = '<your-aliyun-access-key>',
+    'client.access-key-secret' = '<your-aliyun-access-secret>'
+);
+
+CREATE TABLE `hive`.`default`.`sample` (
+    id   BIGINT,
+    data STRING
+) WITH (
+    'engine.hive.enabled' = 'true'
+);
+
+INSERT INTO `hive`.`default`.`sample` VALUES (1, 'AAA');
+```
+
+## Aliyun Integration Tests
+
+To verify all the `iceberg-aliyun` features work fine with the integration tests, we can follow the below scripts to test:

Review comment:
I think this sentence would be clearer if shortened to:

> The following script can be used to run the `iceberg-aliyun` integration tests.
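To make the placeholder comment earlier in this thread concrete, here is one way the Flink `CREATE CATALOG` statement from the diff might read with the reviewer's uniform angle-bracket placeholders substituted in. This is only an illustrative sketch of the suggestion, not the wording the PR adopted; every `<...>` token is a value the user supplies:

```sql
CREATE CATALOG hive WITH (
    'type' = 'iceberg',
    'uri' = '<THRIFT-URI>',
    'warehouse' = '<OSS_PATH>',
    'io-impl' = 'org.apache.iceberg.aliyun.oss.OSSFileIO',
    'oss.endpoint' = '<OSS_ENDPOINT>',
    'client.access-key-id' = '<ALIYUN_ACCESS_KEY_ID>',
    'client.access-key-secret' = '<ALIYUN_ACCESS_KEY_SECRET>'
);
```

Using one placeholder style throughout lets readers scan a snippet and immediately see which values must be replaced before running it.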
