yuqi1129 commented on code in PR #8777:
URL: https://github.com/apache/gravitino/pull/8777#discussion_r2532435517
########## docs/fileset-catalog.md: ##########
@@ -21,6 +21,120 @@ Note that Gravitino uses Hadoop 3 dependencies to build Fileset catalog. Theoret
compatible with both Hadoop 2.x and 3.x, since Gravitino doesn't leverage any new features in Hadoop 3. If there's any compatibility issue, please create an [issue](https://github.com/apache/gravitino/issues).
+In general, the locations of all schemas and filesets under a fileset
+catalog belong to a single Hadoop cluster if they are HDFS location.
+
+The example for creating a fileset is as follows:
+```text
+# create fileset catalog
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+-H "Content-Type: application/json" -d '{
+  "name": "fileset_catalog",
+  "type": "FILESET",
+  "comment": "This is a fileset catalog",
+  "provider": "fileset",
+  "properties": {
+    "location": "hdfs://172.17.0.2:9000/fileset_catalog"
+  }
+}' http://localhost:8090/api/metalakes/test/catalogs
+
+# create a fileset schema under the catalog with inherited properties
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+-H "Content-Type: application/json" -d '{
+  "name": "test_schema",
+  "comment": "This is a schema",
+  "properties": {
+  }
+}' http://localhost:8090/api/metalakes/test/catalogs/fileset_catalog/schemas
+
+# create a fileset under the schema with inherited properties
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json"
+-H "Content-Type: application/json" -d '{
+  "name": "fs1",
+  "comment": "This is an example fileset",
+  "type": "MANAGED",
+  "properties": {
+  }
+}' http://localhost:8090/api/metalakes/test/catalogs/fileset_catalog/schemas/test_schema/filesets
+```
+
+Within a fileset catalog, schemas and filesets can automatically inherit configuration properties
+from their parent catalog. For example, the location property can be inherited — a schema can inherit
+the catalog’s location as its base path, and a fileset can in turn inherit the schema’s location as its base path.
+
+The property inheritance priority is as follows: catalog < schema < fileset.
+
+If a fileset needs to use a different storage path, it can specify its own location configuration to
+override the inherited one.
+
+The fileset catalog also supports multiple clusters. Each schema and fileset under a catalog can independently
+specify their own cluster locations and connection configurations.
+
+For example, a complex catalog structure might look like this:
+
+```text
+catalog1 -> hdfs://cluster1/catalog1
+  schema1 -> hdfs://cluster1/catalog1/schema1
+    fileset1 -> hdfs://cluster1/catalog1/schema1/fileset1
+    fileset2 -> hdfs://cluster1/catalog1/schema1/fileset2
+  schema2 -> hdfs://cluster2/tmp/schema2
+    fileset3 -> hdfs://cluster2/tmp/schema2/fsd
+    fileset4 -> hdfs://cluster3/customers
+```
+
+In this example, the default location of catalog1 is hdfs://cluster1/catalog1.
+schema1 and its filesets are stored in the same cluster as defined by the catalog (cluster1).
+However, schema2 and its filesets (fileset3, fileset4) are located in different clusters (cluster2 and cluster3, respectively).
+
+
+The example for creating Filesets with different clusteris as follows:

Review Comment:
typo: clusteris
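The inheritance rule described in the diff has two parts. Properties merge with priority catalog < schema < fileset, and a location not set explicitly is derived from the parent's location. The Python sketch below illustrates that rule. `effective_properties` and `effective_location` are hypothetical helpers, not Gravitino code. The "parent location + child name" convention is an assumption read off the catalog1 example tree.

```python
from typing import Dict, Optional


# Hypothetical helper illustrating the documented priority (catalog < schema < fileset);
# this is not Gravitino's actual implementation.
def effective_properties(catalog: Dict[str, str],
                         schema: Dict[str, str],
                         fileset: Dict[str, str]) -> Dict[str, str]:
    """Merge properties level by level; a more specific level overrides a less specific one."""
    merged: Dict[str, str] = {}
    for level in (catalog, schema, fileset):  # lowest to highest priority
        merged.update(level)
    return merged


# Hypothetical helper; assumes an inherited path is "<parent location>/<child name>",
# as in the catalog1 example tree, and that an explicit location always wins.
def effective_location(catalog_location: str,
                       schema_name: str,
                       fileset_name: str,
                       schema_location: Optional[str] = None,
                       fileset_location: Optional[str] = None) -> str:
    """Resolve a fileset's storage path from explicit values or inherited base paths."""
    # A schema without its own location uses the catalog location as its base path.
    schema_base = schema_location or f"{catalog_location.rstrip('/')}/{schema_name}"
    # A fileset without its own location uses the schema location as its base path.
    return fileset_location or f"{schema_base.rstrip('/')}/{fileset_name}"


# fs1 from the curl example sets no location, so it inherits from its schema and catalog.
print(effective_location("hdfs://172.17.0.2:9000/fileset_catalog", "test_schema", "fs1"))
```

In this sketch, fs1 resolves to `hdfs://172.17.0.2:9000/fileset_catalog/test_schema/fs1`. Passing `fileset_location="hdfs://cluster3/customers"` (fileset4 in the tree) returns that path unchanged, which models the override case.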

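In the multi-cluster example, the cluster that stores each schema or fileset is identified by the authority part of its `hdfs://` URI (`cluster1`, `cluster2`, `cluster3`). The Python sketch below groups the entities from the catalog1 tree by that authority. `cluster_of` and `group_by_cluster` are hypothetical helpers for illustration. Treating the URI authority as the cluster identifier is an assumption; with HDFS HA it would be a nameservice ID rather than a single host.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlparse

# Locations copied from the catalog1 example tree in the diff.
LOCATIONS: Dict[str, str] = {
    "catalog1": "hdfs://cluster1/catalog1",
    "schema1": "hdfs://cluster1/catalog1/schema1",
    "fileset1": "hdfs://cluster1/catalog1/schema1/fileset1",
    "fileset2": "hdfs://cluster1/catalog1/schema1/fileset2",
    "schema2": "hdfs://cluster2/tmp/schema2",
    "fileset3": "hdfs://cluster2/tmp/schema2/fsd",
    "fileset4": "hdfs://cluster3/customers",
}


def cluster_of(location: str) -> str:
    """Return the authority (host[:port] or nameservice) of an hdfs:// URI."""
    parsed = urlparse(location)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an HDFS location: {location}")
    return parsed.netloc


def group_by_cluster(locations: Dict[str, str]) -> Dict[str, List[str]]:
    """Group entity names by the cluster their location points to, keeping input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, location in locations.items():
        groups[cluster_of(location)].append(name)
    return dict(groups)


for cluster, names in group_by_cluster(LOCATIONS).items():
    print(cluster, names)
```

In this sketch, `cluster1` holds catalog1, schema1, fileset1 and fileset2. `cluster2` holds schema2 and fileset3, and `cluster3` holds only fileset4. That matches the explanation in the diff.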