FANNG1 commented on code in PR #5006: URL: https://github.com/apache/gravitino/pull/5006#discussion_r1777888636
##########
docs/hive-catalog-with-s3.md:
##########
@@ -0,0 +1,210 @@
+---
+title: "Hive catalog with s3"
+slug: /hive-catalog
+date: 2024-9-24
+keyword: Hive catalog cloud storage S3
+license: "This software is licensed under the Apache License version 2."
+---
+
+
+## Introduction
+Since Hive 2.x, Hive has supported S3 as a storage backend, enabling users to store and manage data in Amazon S3 directly through Hive. Gravitino enhances this capability by supporting the Hive catalog with S3, allowing users to efficiently manage the storage locations of files located in S3. This integration simplifies data operations and enables seamless access to S3 data from Hive queries.
+
+The following sections will guide you through the necessary steps to configure the Hive catalog to utilize S3 as a storage backend, including configuration details and examples for creating databases and tables.
+
+## Hive metastore configuration
+To use the Hive catalog with S3, you must configure your Hive metastore to recognize S3 as a storage backend. The following example illustrates the required changes in the `hive-site.xml` configuration file:
+
+### Example Configuration Changes
+Below are the essential properties to add or modify in the `hive-site.xml` file to support S3:
+
+```xml

Review Comment:
   Add `hive.metastore.warehouse.dir` to the example configuration? Once it is set to an S3 location, users don't need to specify the S3 location explicitly when creating a schema or table.
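
For illustration, the suggested property might look like the following in `hive-site.xml`. This is only a minimal sketch: the bucket name and path are hypothetical placeholders, and the `s3a://` scheme assumes the Hadoop S3A connector is the one in use.

```xml
<!-- Sketch only: the bucket and path below are hypothetical placeholders. -->
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>s3a://my-bucket/hive/warehouse</value>
</property>
```

With a default warehouse location like this, schemas and tables created without an explicit location would be expected to land under that S3 path.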
