samredai commented on a change in pull request #3686: URL: https://github.com/apache/iceberg/pull/3686#discussion_r769888398
########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. +Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). 
+It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following Review comment: I don't think you need to list the engines and it'll be easy to forget to come update this in the future ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. +Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. 
To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: Review comment: I would remove "For example" here since there's no previous sentence. Or maybe replace it with "The following is an example of..." ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. 
+ - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. +Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. 
+ +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish Review comment: This part is a bit hard to read. Maybe this Engine Access section can be condensed to something like: > All engines in aliyun services can access iceberg tables. Although the Iceberg and DLF integration has not been released yet, here are some examples of how you can manage Iceberg tables in a Hive catalog. ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. 
+Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: + +```bash +# Add Iceberg dependency +ICEBERG_VERSION=0.13.0 Review comment: Should this be `0.13.0`? If this is known to work with `0.12.1` maybe we should use that here and update it to `0.13.0` once it's released. ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. 
See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. +Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. 
Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: + +```bash +# Add Iceberg dependency +ICEBERG_VERSION=0.13.0 +ALIYUN_ACCESS_KEY_ID=****** # Your Aliyun access key id. +ALIYUN_ACCESS_KEY_SECRET=****** # Your Aliyun access key secret. +ALIYUN_OSS_ENDPOINT=****** # Your Aliyun OSS endpoint. + +DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:$ICEBERG_VERSION" +DEPENDENCIES+=",org.apache.iceberg:iceberg-aliyun-runtime:$ICEBERG_VERSION" + +# Start Spark SQL client shell +spark-sql --packages $DEPENDENCIES \ + --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ + --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \ + --conf spark.sql.catalog.my_catalog.uri=thrift://<host>:<port> \ + --conf spark.sql.catalog.my_catalog.warehouse=oss://my-bucket/my/key/prefix \ + --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aliyun.oss.OSSFileIO \ + --conf spark.sql.catalog.my_catalog.oss.endpoint=$ALIYUN_OSS_ENDPOINT \ + --conf spark.sql.catalog.my_catalog.client.access-key-id=$ALIYUN_ACCESS_KEY_ID \ + --conf spark.sql.catalog.my_catalog.client.access-key-secret=$ALIYUN_ACCESS_KEY_SECRET +``` + +Let's create iceberg tables and insert few records into it. 
Review comment: nit: since the example is just creating one table this should be "Let's create an iceberg table and insert a few records into it using Spark-SQL." ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. +Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. 
DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: + +```bash +# Add Iceberg dependency +ICEBERG_VERSION=0.13.0 +ALIYUN_ACCESS_KEY_ID=****** # Your Aliyun access key id. +ALIYUN_ACCESS_KEY_SECRET=****** # Your Aliyun access key secret. +ALIYUN_OSS_ENDPOINT=****** # Your Aliyun OSS endpoint. 
+ +DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:$ICEBERG_VERSION" +DEPENDENCIES+=",org.apache.iceberg:iceberg-aliyun-runtime:$ICEBERG_VERSION" + +# Start Spark SQL client shell +spark-sql --packages $DEPENDENCIES \ + --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ + --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \ + --conf spark.sql.catalog.my_catalog.uri=thrift://<host>:<port> \ + --conf spark.sql.catalog.my_catalog.warehouse=oss://my-bucket/my/key/prefix \ + --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aliyun.oss.OSSFileIO \ + --conf spark.sql.catalog.my_catalog.oss.endpoint=$ALIYUN_OSS_ENDPOINT \ + --conf spark.sql.catalog.my_catalog.client.access-key-id=$ALIYUN_ACCESS_KEY_ID \ + --conf spark.sql.catalog.my_catalog.client.access-key-secret=$ALIYUN_ACCESS_KEY_SECRET +``` + +Let's create iceberg tables and insert few records into it. + +```sql +CREATE TABLE my_catalog.default.sample ( + id BIGINT, + data STRING +) +USING iceberg +TBLPROPERTIES ( + 'engine.hive.enabled' = 'true' +); + +INSERT INTO `hive`.`default`.`sample` VALUES (1, 'AAA'); +``` + +### Flink + +Take the sample that accessing apache iceberg tables stored in aliyun object storage service with Apache Flink 1.13.2: Review comment: I would reword this as: > The following is an example of working with apache iceberg tables stored in Aliyun OSS with Apache Flink. ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. 
You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. +Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. 
+ +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: + +```bash +# Add Iceberg dependency +ICEBERG_VERSION=0.13.0 +ALIYUN_ACCESS_KEY_ID=****** # Your Aliyun access key id. +ALIYUN_ACCESS_KEY_SECRET=****** # Your Aliyun access key secret. +ALIYUN_OSS_ENDPOINT=****** # Your Aliyun OSS endpoint. + +DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:$ICEBERG_VERSION" +DEPENDENCIES+=",org.apache.iceberg:iceberg-aliyun-runtime:$ICEBERG_VERSION" + +# Start Spark SQL client shell +spark-sql --packages $DEPENDENCIES \ + --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ + --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \ + --conf spark.sql.catalog.my_catalog.uri=thrift://<host>:<port> \ + --conf spark.sql.catalog.my_catalog.warehouse=oss://my-bucket/my/key/prefix \ + --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aliyun.oss.OSSFileIO \ + --conf spark.sql.catalog.my_catalog.oss.endpoint=$ALIYUN_OSS_ENDPOINT \ + --conf spark.sql.catalog.my_catalog.client.access-key-id=$ALIYUN_ACCESS_KEY_ID \ + --conf spark.sql.catalog.my_catalog.client.access-key-secret=$ALIYUN_ACCESS_KEY_SECRET +``` + +Let's create iceberg tables and insert few records into it. 
+ +```sql +CREATE TABLE my_catalog.default.sample ( + id BIGINT, + data STRING +) +USING iceberg +TBLPROPERTIES ( + 'engine.hive.enabled' = 'true' +); + +INSERT INTO `hive`.`default`.`sample` VALUES (1, 'AAA'); +``` + +### Flink + +Take the sample that accessing apache iceberg tables stored in aliyun object storage service with Apache Flink 1.13.2: + +```bash +ICEBERG_VERSION=0.13.0 +wget $ICEBERG_MAVEN_URL/iceberg-flink-runtime/$ICEBERG_VERSION/iceberg-flink-runtime-$ICEBERG_VERSION.jar Review comment: Looks like `ICEBERG_MAVEN_URL` needs to be set at the top of this code block. ``` MAVEN_URL=https://repo1.maven.org/maven2 ICEBERG_MAVEN_URL=$MAVEN_URL/org/apache/iceberg ``` ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. 
+Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: + +```bash +# Add Iceberg dependency +ICEBERG_VERSION=0.13.0 +ALIYUN_ACCESS_KEY_ID=****** # Your Aliyun access key id. +ALIYUN_ACCESS_KEY_SECRET=****** # Your Aliyun access key secret. +ALIYUN_OSS_ENDPOINT=****** # Your Aliyun OSS endpoint. 
+ +DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:$ICEBERG_VERSION" +DEPENDENCIES+=",org.apache.iceberg:iceberg-aliyun-runtime:$ICEBERG_VERSION" + +# Start Spark SQL client shell +spark-sql --packages $DEPENDENCIES \ + --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ + --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \ + --conf spark.sql.catalog.my_catalog.uri=thrift://<host>:<port> \ + --conf spark.sql.catalog.my_catalog.warehouse=oss://my-bucket/my/key/prefix \ Review comment: Should `HOST`, `PORT`, and `WAREHOUSE_PATH` be set as variables similar to `ALIYUN_ACCESS_KEY_ID`? That may make it more obvious that these need to be replaced here. ########## File path: site/docs/aliyun.md ########## @@ -0,0 +1,145 @@ +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +# Iceberg Aliyun Integrations + +Iceberg provides integration with different Aliyun services through the `iceberg-aliyun` module. +This section describes how to use Iceberg with [Aliyun](https://www.alibabacloud.com/). + +## Enabling Aliyun Integration + +The `iceberg-aliyun` module integrates alibaba cloud services (Aliyun OSS, Aliyun DLF etc) with apache iceberg. 
+Currently, it provides the bundled `iceberg-aliyun-runtime` module for users to access the iceberg table backed in +alibaba cloud services. To enable the aliyun integration, people only need to ensure that the `iceberg-aliyun-runtime` +jar is loaded into classpath correctly by the engines such as Spark, Flink, Hive, Presto etc. + +## Catalogs + +[Aliyun DLF](https://www.aliyun.com/product/bigdata/dlf) is a core service from Alibaba Cloud that satisfies users' +needs for data asset management while creating data lake tables. DLF provides unified metadata views and permission +management for data available in OSS ([Aliyun Object Storage Service](https://www.alibabacloud.com/product/object-storage-service)). +It also provides real-time lake migration and cleaning templates for data and production-level metadata services for +upper-layer data analysis engines. Aliyun DLF is a good choice to manage the apache iceberg tables, the Aliyun DLF +catalog integration will come in the next releases. + +### Engines Access. + +All the engines (Spark, Hive, Flink, Presto) can access the iceberg table backed in aliyun services. There are following +examples to show how to access it. + +The ideal way to show the example is using Aliyun DLF Catalog to manage those iceberg tables, but we still don't finish +the iceberg + DLF integration work in apache iceberg repository. Here we are showing the examples to manage iceberg +tables in Hive Catalog. + +### Spark + +For example, to access apache iceberg tables stored in alibaba object storage service with Apache Spark 3.2.x: + +```bash +# Add Iceberg dependency +ICEBERG_VERSION=0.13.0 +ALIYUN_ACCESS_KEY_ID=****** # Your Aliyun access key id. +ALIYUN_ACCESS_KEY_SECRET=****** # Your Aliyun access key secret. +ALIYUN_OSS_ENDPOINT=****** # Your Aliyun OSS endpoint. 
+ +DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:$ICEBERG_VERSION" +DEPENDENCIES+=",org.apache.iceberg:iceberg-aliyun-runtime:$ICEBERG_VERSION" + +# Start Spark SQL client shell +spark-sql --packages $DEPENDENCIES \ + --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ + --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \ + --conf spark.sql.catalog.my_catalog.uri=thrift://<host>:<port> \ + --conf spark.sql.catalog.my_catalog.warehouse=oss://my-bucket/my/key/prefix \ + --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aliyun.oss.OSSFileIO \ + --conf spark.sql.catalog.my_catalog.oss.endpoint=$ALIYUN_OSS_ENDPOINT \ + --conf spark.sql.catalog.my_catalog.client.access-key-id=$ALIYUN_ACCESS_KEY_ID \ + --conf spark.sql.catalog.my_catalog.client.access-key-secret=$ALIYUN_ACCESS_KEY_SECRET +``` + +Let's create iceberg tables and insert few records into it. + +```sql +CREATE TABLE my_catalog.default.sample ( + id BIGINT, + data STRING +) +USING iceberg +TBLPROPERTIES ( + 'engine.hive.enabled' = 'true' +); + +INSERT INTO `hive`.`default`.`sample` VALUES (1, 'AAA'); +``` + +### Flink + +Take the sample that accessing apache iceberg tables stored in aliyun object storage service with Apache Flink 1.13.2: + +```bash +ICEBERG_VERSION=0.13.0 +wget $ICEBERG_MAVEN_URL/iceberg-flink-runtime/$ICEBERG_VERSION/iceberg-flink-runtime-$ICEBERG_VERSION.jar + +./bin/sql-client.sh embedded \ + -j /path/to/flink-sql-connector-hive-2.3.6_2.12-1.13.2.jar \ + -j /path/to/iceberg-aliyun-runtime-$ICEBERG_VERSION.jar \ + -j /path/to/iceberg-flink-1.13-runtime-$ICEBERG_VERSION.jar \ + shell +``` + +Let's create iceberg tables and insert few records into it. Review comment: Same as above, I would make this: "Let's create an iceberg table and insert a few records into it using Flink-SQL." 
##########  File path: site/docs/aliyun.md  ##########
+```sql
+CREATE CATALOG hive WITH (
+    'type' = 'iceberg',
+    'uri' = 'thrift://localhost:9083',
+    'warehouse' = 'oss://my-bucket/my-object',

Review comment:
The way placeholders are displayed here is inconsistent. How about using the following?
```
<OSS_PATH>
<THRIFT-URI>
<OSS_ENDPOINT>
<ALIYUN_ACCESS_KEY_ID>
<ALIYUN_ACCESS_KEY_SECRET>
```

##########  File path: site/docs/aliyun.md  ##########
+```sql
+CREATE CATALOG hive WITH (
+    'type' = 'iceberg',
+    'uri' = 'thrift://localhost:9083',
+    'warehouse' = 'oss://my-bucket/my-object',
+    'io-impl' = 'org.apache.iceberg.aliyun.oss.OSSFileIO',
+    'oss.endpoint' = '<your-oss-endpoint-address>',
+    'client.access-key-id' = '<your-aliyun-access-key>',
+    'client.access-key-secret' = '<your-aliyun-access-secret>'
+);
+
+CREATE TABLE `hive`.`default`.`sample` (
+    id   BIGINT,
+    data STRING
+) WITH (
+    'engine.hive.enabled' = 'true'
+);
+
+INSERT INTO `hive`.`default`.`sample` VALUES (1, 'AAA');
+```
+
+## Aliyun Integration Tests
+
+To verify all the `iceberg-aliyun` features work fine with the integration tests, we can follow the below scripts to test:

Review comment:
I think this sentence would be clearer if shortened to:

> The following script can be used to run the `iceberg-aliyun` integration tests.
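To make the placeholder comment earlier in this thread concrete, here is one way the Flink `CREATE CATALOG` statement from the diff might read with the reviewer's uniform angle-bracket placeholders substituted in. This is only an illustrative sketch of the suggestion, not the wording the PR adopted; every `<...>` token is a value the user supplies:

```sql
CREATE CATALOG hive WITH (
    'type' = 'iceberg',
    'uri' = '<THRIFT-URI>',
    'warehouse' = '<OSS_PATH>',
    'io-impl' = 'org.apache.iceberg.aliyun.oss.OSSFileIO',
    'oss.endpoint' = '<OSS_ENDPOINT>',
    'client.access-key-id' = '<ALIYUN_ACCESS_KEY_ID>',
    'client.access-key-secret' = '<ALIYUN_ACCESS_KEY_SECRET>'
);
```

Using one placeholder style throughout lets readers scan a snippet and immediately see which values must be replaced before running it.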
