FANNG1 commented on code in PR #5006:
URL: https://github.com/apache/gravitino/pull/5006#discussion_r1776298935
##########
docs/hive-catalog-with-s3.md:
##########
@@ -0,0 +1,206 @@
+---
+title: "Hive catalog with s3"
+slug: /hive-catalog
+date: 2024-9-24
+keyword: Hive catalog cloud storage S3
+license: "This software is licensed under the Apache License version 2."
+---
+
+
+## Introduction
+Since Hive 2.x, Hive has supported S3 as a storage backend, enabling users to store and manage data in Amazon S3 directly through Hive. Gravitino enhances this capability by supporting the Hive catalog with S3, allowing users to efficiently manage the storage locations of files located in S3. This integration simplifies data operations and enables seamless access to S3 data from Hive queries.
+
+The following sections will guide you through the necessary steps to configure the Hive catalog to utilize S3 as a storage backend, including configuration details and examples for creating databases and tables.
+
+## Hive cluster configuration
+To use the Hive catalog with S3, you must configure your Hive cluster to recognize S3 as a storage backend. The following example illustrates the required changes in the `hive-site.xml` configuration file:
+
+### Example Configuration Changes
+Below are the essential properties to add or modify in the `hive-site.xml` file to support S3:
+
+```xml
+<property>
+ <name>fs.s3a.access.key</name>
+ <value>S3_ACCESS_KEY_ID</value>
+</property>
+
+<property>
+ <name>fs.s3a.secret.key</name>
+ <value>S3_SECRET_KEY_ID</value>
+</property>
+
+<property>
+ <name>fs.s3a.endpoint</name>
+ <value>S3_ENDPOINT_ID</value>
+</property>
+
+<property>
+ <name>fs.s3a.aws.credentials.provider</name>
+ <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,com.amazonaws.auth.EnvironmentVariableCredentialsProvider,org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider</value>
+</property>
+```
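Because the provider chain above includes `com.amazonaws.auth.EnvironmentVariableCredentialsProvider`, credentials can alternatively be supplied through the process environment instead of being written into `hive-site.xml`. A minimal sketch (the key values are placeholders, matching the placeholders used above):

```shell
# Placeholder values; EnvironmentVariableCredentialsProvider reads these
# standard AWS variable names from the environment of the Hive processes.
export AWS_ACCESS_KEY_ID="S3_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="S3_SECRET_KEY_ID"
```

Keeping secrets out of configuration files in this way can simplify credential rotation, but note the variables must be visible to the HiveServer2 and metastore processes, not just your login shell.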
+
+### Adding Required JARs
+
+After updating the `hive-site.xml`, you need to ensure that the necessary S3-related JARs are included in the Hive classpath. You can do this by executing the following command:
+```shell
+cp ${HADOOP_HOME}/share/hadoop/tools/lib/*aws* ${HIVE_HOME}/lib
+```
+Alternatively, you can download the required JARs from the Maven repository and place them in the Hive classpath. It is crucial to verify that the JARs are compatible with the version of Hadoop you are using to avoid any compatibility issues.
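For example, the `hadoop-aws` JAR can be fetched from Maven Central. A sketch, assuming a hypothetical Hadoop version of 3.3.6 — substitute the version your cluster actually runs:

```shell
# Hypothetical version; it must match your Hadoop distribution exactly,
# since hadoop-aws is tightly coupled to the Hadoop release it ships with.
HADOOP_VERSION=3.3.6
JAR_URL="https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/${HADOOP_VERSION}/hadoop-aws-${HADOOP_VERSION}.jar"
echo "$JAR_URL"
# Download into the Hive classpath (uncomment when ready):
# curl -fLO "$JAR_URL" && mv "hadoop-aws-${HADOOP_VERSION}.jar" "${HIVE_HOME}/lib/"
```

The matching `aws-java-sdk-bundle` JAR is usually needed as well; its required version is listed as a dependency of the `hadoop-aws` artifact you download.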
+
+### Restart Hive Cluster
+Once all configurations have been correctly set, restart the Hive cluster to apply the changes. This step is essential to ensure that the new configurations take effect and that the Hive services can communicate with S3.
+
+## Creating Tables or Databases with S3 Storage using Gravitino
+
+Assuming you have already set up a Hive catalog with Gravitino, you can proceed to create tables or databases using S3 storage. For more information on catalog operations, refer to [Catalog operations](./manage-fileset-metadata-using-gravitino.md#catalog-operations).
+
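As a rough sketch of what such a call can look like against Gravitino's REST API — the server address, metalake name, catalog name, and bucket below are all hypothetical placeholders, not values from this document:

```shell
# All names here are hypothetical; adjust them to your deployment.
GRAVITINO_URL="http://localhost:8090"
# The "location" property points the new schema at an S3 path via the s3a scheme.
PAYLOAD='{"name":"s3_schema","comment":"schema stored on S3","properties":{"location":"s3a://my-bucket/warehouse/s3_schema"}}'
echo "$PAYLOAD"
# Uncomment when a Gravitino server is reachable:
# curl -X POST -H "Content-Type: application/json" -d "$PAYLOAD" \
#   "${GRAVITINO_URL}/api/metalakes/metalake/catalogs/catalog/schemas"
```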
+### Example: Creating a Database with S3 Storage
Review Comment:
add blank line here
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]