mchades commented on code in PR #8777: URL: https://github.com/apache/gravitino/pull/8777#discussion_r2538536833
########## docs/fileset-catalog.md: ##########
Review Comment:
Should you also update the docs `fileset-catalog-with-s3.md`, `fileset-catalog-with-oss.md`, `fileset-catalog-with-gcs.md` and `fileset-catalog-with-azure.md`?

########## docs/fileset-catalog.md: ##########
@@ -21,6 +21,120 @@ Note that Gravitino uses Hadoop 3 dependencies to build Fileset catalog. Theoret
 compatible with both Hadoop 2.x and 3.x, since Gravitino doesn't leverage any new features in Hadoop 3. If there's any compatibility issue, please create an [issue](https://github.com/apache/gravitino/issues).
+In general, the locations of all schemas and filesets under a fileset
+catalog belong to a single Hadoop cluster if they are HDFS locations.
+
+The example for creating a fileset is as follows:
+```text
+# create fileset catalog
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+-H "Content-Type: application/json" -d '{
+  "name": "fileset_catalog",
+  "type": "FILESET",
+  "comment": "This is a fileset catalog",
+  "provider": "fileset",
+  "properties": {
+    "location": "hdfs://172.17.0.2:9000/fileset_catalog"
+  }
+}' http://localhost:8090/api/metalakes/test/catalogs
+
+# create a fileset schema under the catalog with inherited properties
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+-H "Content-Type: application/json" -d '{
+  "name": "test_schema",
+  "comment": "This is a schema",
+  "properties": {
+  }
+}' http://localhost:8090/api/metalakes/test/catalogs/fileset_catalog/schemas
+
+# create a fileset under the schema with inherited properties
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+-H "Content-Type: application/json" -d '{
+  "name": "fs1",
+  "comment": "This is an example fileset",
+  "type": "MANAGED",
+  "properties": {
+  }
+}' http://localhost:8090/api/metalakes/test/catalogs/fileset_catalog/schemas/test_schema/filesets
+```

Review Comment:
Is this part a duplicate of the following section?
https://github.com/apache/gravitino/blob/9526257c3d4c104c81e3ce31fc1ef485e75ae5e7/docs/manage-fileset-metadata-using-gravitino.md?plain=1#L247-L324
Maybe you should move this part to `manage-fileset-metadata-using-gravitino.md`, since it describes how to use filesets.

########## docs/how-to-use-gvfs.md: ##########
@@ -48,31 +48,32 @@ the path mapping and convert automatically.
 ### Configuration
-| Configuration item | Description | Default value | Required | Since version |
-|---|---|---|---|---|
-| `fs.AbstractFileSystem.gvfs.impl` | The Gravitino Virtual File System abstract class, set it to `org.apache.gravitino.filesystem.hadoop.Gvfs`. | (none) | Yes | 0.5.0 |
-| `fs.gvfs.impl` | The Gravitino Virtual File System implementation class, set it to `org.apache.gravitino.filesystem.hadoop.GravitinoVirtualFileSystem`. | (none) | Yes | 0.5.0 |
-| `fs.gvfs.impl.disable.cache` | Disable the Gravitino Virtual File System cache in the Hadoop environment. If you need to proxy multi-user operations, please set this value to `true` and create a separate File System for each user. | `false` | No | 0.5.0 |
-| `fs.gravitino.server.uri` | The Gravitino server URI which GVFS needs to load the fileset metadata. | (none) | Yes | 0.5.0 |
-| `fs.gravitino.client.metalake` | The metalake to which the fileset belongs. | (none) | Yes | 0.5.0 |
-| `fs.gravitino.client.authType` | The auth type to initialize the Gravitino client to use with the Gravitino Virtual File System. Currently only supports `simple`, `oauth2` and `kerberos` auth types. | `simple` | No | 0.5.0 |
-| `fs.gravitino.client.oauth2.serverUri` | The auth server URI for the Gravitino client when using `oauth2` auth type with the Gravitino Virtual File System. | (none) | Yes if you use `oauth2` auth type | 0.5.0 |
-| `fs.gravitino.client.oauth2.credential` | The auth credential for the Gravitino client when using `oauth2` auth type in the Gravitino Virtual File System. | (none) | Yes if you use `oauth2` auth type | 0.5.0 |
-| `fs.gravitino.client.oauth2.path` | The auth server path for the Gravitino client when using `oauth2` auth type with the Gravitino Virtual File System. Please remove the first slash `/` from the path, for example `oauth/token`. | (none) | Yes if you use `oauth2` auth type | 0.5.0 |
-| `fs.gravitino.client.oauth2.scope` | The auth scope for the Gravitino client when using `oauth2` auth type with the Gravitino Virtual File System. | (none) | Yes if you use `oauth2` auth type | 0.5.0 |
-| `fs.gravitino.client.kerberos.principal` | The auth principal for the Gravitino client when using `kerberos` auth type with the Gravitino Virtual File System. | (none) | Yes if you use `kerberos` auth type | 0.5.1 |
-| `fs.gravitino.client.kerberos.keytabFilePath` | The auth keytab file path for the Gravitino client when using `kerberos` auth type in the Gravitino Virtual File System. | (none) | No | 0.5.1 |
-| `fs.gravitino.fileset.cache.maxCapacity` | The cache capacity of the Gravitino Virtual File System. | `20` | No | 0.5.0 |
-| `fs.gravitino.fileset.cache.evictionMillsAfterAccess` | The value of time that the cache expires after accessing in the Gravitino Virtual File System. The value is in `milliseconds`. | `3600000` | No | 0.5.0 |
-| `fs.gravitino.current.location.name` | The configuration used to select the location of the fileset. If this configuration is not set, the value of environment variable configured by `fs.gravitino.current.location.env.var` will be checked. If neither is set, the value of fileset property `default-location-name` will be used as the location name. | the value of fileset property `default-location-name` | No | 0.9.0-incubating |
-| `fs.gravitino.current.location.name.env.var` | The environment variable name to get the current location name. | `CURRENT_LOCATION_NAME` | No | 0.9.0-incubating |
-| `fs.gravitino.operations.class` | The operations class to provide the FS operations for the Gravitino Virtual File System. Users can extends `BaseGVFSOperations` to implement their own operations and configure the class name in this conf to use custom FS operations. | `org.apache.gravitino.filesystem.hadoop.DefaultGVFSOperations` | No | 0.9.0-incubating |
-| `fs.gravitino.hook.class` | The hook class to inject into the <br/>Gravitino Virtual File System. Users can implement their own `GravitinoVirtualFileSystemHook` and configure the class name in this conf to inject custom code. | `org.apache.gravitino.filesystem.hadoop.NoOpHook` | No | 0.9.0-incubating |
-| `fs.gravitino.client.request.header.` | The configuration key prefix for the Gravitino client request header. You can set the request header for the Gravitino client. | (none) | No | 0.9.0-incubating |
-| `fs.gravitino.enableCredentialVending` | Whether to enable credential vending for the Gravitino Virtual File System. | `false` | No | 0.9.0-incubating |
-| `fs.gravitino.client.` | The configuration key prefix for the Gravitino client config. | (none) | No | 1.0.0 |
-| `fs.gravitino.filesetMetadataCache.enable` | Whether to cache the fileset or fileset catalog metadata in the Gravitino Virtual File System. Note that this cache causes a side effect: if you modify the fileset or fileset catalog metadata, the client can not see the latest changes. | `false` | No | 1.0.0 |
-| `fs.gravitino.autoCreateLocation` | The configuration key for whether to enable auto-creation of fileset location when the server-side filesystem ops are disabled and the location does not exist. | `true` | No | 1.1.0 |
+| Configuration item | Description | Default value | Required | Since version |
+|---|---|---|---|---|
+| `fs.AbstractFileSystem.gvfs.impl` | The Gravitino Virtual File System abstract class, set it to `org.apache.gravitino.filesystem.hadoop.Gvfs`. | (none) | Yes | 0.5.0 |
+| `fs.gvfs.impl` | The Gravitino Virtual File System implementation class, set it to `org.apache.gravitino.filesystem.hadoop.GravitinoVirtualFileSystem`. | (none) | Yes | 0.5.0 |
+| `fs.gvfs.impl.disable.cache` | Disable the Gravitino Virtual File System cache in the Hadoop environment. If you need to proxy multi-user operations, please set this value to `true` and create a separate File System for each user. | `false` | No | 0.5.0 |
+| `fs.gravitino.server.uri` | The Gravitino server URI which GVFS needs to load the fileset metadata. | (none) | Yes | 0.5.0 |
+| `fs.gravitino.client.metalake` | The metalake to which the fileset belongs. | (none) | Yes | 0.5.0 |
+| `fs.gravitino.client.authType` | The auth type to initialize the Gravitino client to use with the Gravitino Virtual File System. Currently only supports `simple`, `oauth2` and `kerberos` auth types. | `simple` | No | 0.5.0 |
+| `fs.gravitino.client.oauth2.serverUri` | The auth server URI for the Gravitino client when using `oauth2` auth type with the Gravitino Virtual File System. | (none) | Yes if you use `oauth2` auth type | 0.5.0 |
+| `fs.gravitino.client.oauth2.credential` | The auth credential for the Gravitino client when using `oauth2` auth type in the Gravitino Virtual File System. | (none) | Yes if you use `oauth2` auth type | 0.5.0 |
+| `fs.gravitino.client.oauth2.path` | The auth server path for the Gravitino client when using `oauth2` auth type with the Gravitino Virtual File System. Please remove the first slash `/` from the path, for example `oauth/token`. | (none) | Yes if you use `oauth2` auth type | 0.5.0 |
+| `fs.gravitino.client.oauth2.scope` | The auth scope for the Gravitino client when using `oauth2` auth type with the Gravitino Virtual File System. | (none) | Yes if you use `oauth2` auth type | 0.5.0 |
+| `fs.gravitino.client.kerberos.principal` | The auth principal for the Gravitino client when using `kerberos` auth type with the Gravitino Virtual File System. | (none) | Yes if you use `kerberos` auth type | 0.5.1 |
+| `fs.gravitino.client.kerberos.keytabFilePath` | The auth keytab file path for the Gravitino client when using `kerberos` auth type in the Gravitino Virtual File System. | (none) | No | 0.5.1 |
+| `fs.gravitino.fileset.cache.maxCapacity` | The cache capacity of the Gravitino Virtual File System. | `20` | No | 0.5.0 |
+| `fs.gravitino.fileset.cache.evictionMillsAfterAccess` | The value of time that the cache expires after accessing in the Gravitino Virtual File System. The value is in `milliseconds`. | `3600000` | No | 0.5.0 |
+| `fs.gravitino.current.location.name` | The configuration used to select the location of the fileset. If this configuration is not set, the value of environment variable configured by `fs.gravitino.current.location.env.var` will be checked. If neither is set, the value of fileset property `default-location-name` will be used as the location name. | the value of fileset property `default-location-name` | No | 0.9.0-incubating |
+| `fs.gravitino.current.location.name.env.var` | The environment variable name to get the current location name. | `CURRENT_LOCATION_NAME` | No | 0.9.0-incubating |
+| `fs.gravitino.operations.class` | The operations class to provide the FS operations for the Gravitino Virtual File System. Users can extends `BaseGVFSOperations` to implement their own operations and configure the class name in this conf to use custom FS operations. | `org.apache.gravitino.filesystem.hadoop.DefaultGVFSOperations` | No | 0.9.0-incubating |
+| `fs.gravitino.hook.class` | The hook class to inject into the <br/>Gravitino Virtual File System. Users can implement their own `GravitinoVirtualFileSystemHook` and configure the class name in this conf to inject custom code. | `org.apache.gravitino.filesystem.hadoop.NoOpHook` | No | 0.9.0-incubating |
+| `fs.gravitino.client.request.header.` | The configuration key prefix for the Gravitino client request header. You can set the request header for the Gravitino client. | (none) | No | 0.9.0-incubating |
+| `fs.gravitino.enableCredentialVending` | Whether to enable credential vending for the Gravitino Virtual File System. | `false` | No | 0.9.0-incubating |
+| `fs.gravitino.client.` | The configuration key prefix for the Gravitino client config. | (none) | No | 1.0.0 |
+| `fs.gravitino.filesetMetadataCache.enable` | Whether to cache the fileset, fileset schema or fileset catalog metadata in the Gravitino Virtual File System. Note that this cache causes a side effect: if you modify the fileset or fileset catalog metadata, the client can not see the latest changes. | `false` | No | 1.0.0 |
+| `fs.gravitino.autoCreateLocation` | The configuration key for whether to enable auto-creation of fileset location when the server-side filesystem ops are disabled and the location does not exist. | `true` | No | 1.1.0 |
+| `fs.gravitino.fileset.properties.<identifier>.<property_key>` | The custom properties defined in the fileset, schema or catalog. Users can set these properties to configure the behavior of GVFS when accessing the fileset, schema and catalog. `<identifier>` may be `<catalog>`, `<catalog>.<schema>` or `<catalog>.<schema>.<fileset>`. `<property_key>` is the key name of the entity property. | (none) | No | 1.1.0 |

Review Comment:
Why do we need to configure fileset properties in the GVFS configuration? Fileset properties should be determined when the fileset is created.

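The lookup order described for `fs.gravitino.current.location.name` can be sketched as follows. `CurrentLocationName` is a hypothetical class, not the GVFS implementation, and the sketch treats an empty value as unset. It also assumes the environment-variable key is `fs.gravitino.current.location.name.env.var`, as listed in the key column. The description text instead refers to `fs.gravitino.current.location.env.var`.

```java
import java.util.Map;

// Hypothetical sketch, not the GVFS implementation: resolves the current
// location name in the order described for fs.gravitino.current.location.name.
final class CurrentLocationName {
  static final String NAME_KEY = "fs.gravitino.current.location.name";
  // Assumed key name, taken from the key column of the configuration table.
  static final String ENV_VAR_KEY = "fs.gravitino.current.location.name.env.var";
  static final String DEFAULT_ENV_VAR = "CURRENT_LOCATION_NAME";
  static final String DEFAULT_LOCATION_PROPERTY = "default-location-name";

  private CurrentLocationName() {}

  static String resolve(
      Map<String, String> conf, Map<String, String> env, Map<String, String> filesetProperties) {
    // 1. An explicitly configured location name wins.
    String name = conf.get(NAME_KEY);
    if (name != null && !name.isEmpty()) {
      return name;
    }
    // 2. Otherwise read the environment variable whose name is itself configurable.
    String envVarName = conf.getOrDefault(ENV_VAR_KEY, DEFAULT_ENV_VAR);
    name = env.get(envVarName);
    if (name != null && !name.isEmpty()) {
      return name;
    }
    // 3. Fall back to the fileset property `default-location-name` (may be null).
    return filesetProperties.get(DEFAULT_LOCATION_PROPERTY);
  }
}
```

In a real client, `env` would typically be `System.getenv()`.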