[GitHub] [iceberg] jackye1995 commented on a change in pull request #1891: AWS: documentation page for AWS module

GitBox Mon, 21 Dec 2020 20:45:22 -0800


jackye1995 commented on a change in pull request #1891:
URL: https://github.com/apache/iceberg/pull/1891#discussion_r547065129




##########
File path: site/docs/aws.md
##########
@@ -0,0 +1,212 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+ 
+# Iceberg AWS Integrations
+
+Iceberg provides integration with different AWS services through the 
`iceberg-aws` module. 
+This section describes how to use Iceberg with AWS.
+
+## Runtime Packages
+
+The first thing to note is that the `iceberg-aws` module is not bundled with 
any engine runtime.
+To use any features described in later sections, you need to include the 
following packages by yourself:
+
+* the [iceberg AWS 
package](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws)
+* the [AWS SDK 
bundle](https://mvnrepository.com/artifact/software.amazon.awssdk/bundle), or 
individual AWS client packages if you would like to have a minimum dependency 
footprint. (please note that we use the new AWS v2 SDK instead of v1)
+
+For example, in Spark 3, you can start the SQL shell with:
+
+```sh
+spark-sql --packages 
org.apache.iceberg:iceberg-spark3-runtime:0.11.0,org.apache.iceberg:iceberg-aws-runtime:0.11.0,software.amazon.awssdk:bundle:2.15.40
 \
+    --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
+    --conf 
spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
 \
+    --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my-key-prefix 
\
+    --conf spark.sql.catalog.my_catalog.gluecatalog.lock.table=myGlueLockTable
+```
+
+## Glue Catalog
+
+Iceberg enables the use of [AWS Glue](https://aws.amazon.com/glue) as the 
`Catalog` implementation.
+When used, an Iceberg `Namespace` is stored as a [Glue 
Database](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-databases.html),
 
+an Iceberg `Table` is stored as a [Glue 
Table](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-tables.html),
+an Iceberg `Snapshot` is stored as a [Glue 
TableVersion](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-tables.html#aws-glue-api-catalog-tables-TableVersion).
 
+You can start using Glue catalog by specifying the `catalog-impl` as 
`org.apache.iceberg.aws.glue.GlueCatalog`. 
+More details about loading the catalog can be found in individual engine 
pages, such as [Spark](../spark/#loading-a-custom-catalog) and 
[Flink](../flink/#creating-catalogs-and-using-catalogs).
+
+### Glue Catalog ID
+
+It is very common for an organization to store all the tables in a single Glue 
catalog in a single AWS account and run data computation in many different 
accounts. 
+In this case, you need to specify a Glue catalog ID when initializing 
`GlueCatalog`.
+The Glue catalog ID you should use is the AWS account ID.
+This is because in each AWS account, there is a single Glue catalog in each 
AWS region,
+but the region is pre-determined by the Glue web client that is making the 
call.
+If you would like to access a Glue catalog in a different region, you should 
configure you AWS client, see more details in [AWS client 
configuration](#aws-client-configurations).

Review comment:
       I see, please read if the new version is easier to understand.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] jackye1995 commented on a change in pull request #1891: AWS: documentation page for AWS module

Reply via email to