This is an automated email from the ASF dual-hosted git repository.
yuqi4733 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/main by this push:
new 11854a910 [MINOR] docs: polish Fileset related document (#5639)
11854a910 is described below
commit 11854a910c480865f7ca751c57cbd5edccb0e946
Author: FANNG <[email protected]>
AuthorDate: Thu Nov 21 11:09:19 2024 +0800
[MINOR] docs: polish Fileset related document (#5639)
### What changes were proposed in this pull request?
polish Fileset related document
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
just document
---
docs/gravitino-server-config.md | 1 +
docs/hadoop-catalog.md | 56 ++++++++++++-------------
docs/manage-fileset-metadata-using-gravitino.md | 14 +++----
3 files changed, 36 insertions(+), 35 deletions(-)
diff --git a/docs/gravitino-server-config.md b/docs/gravitino-server-config.md
index b367c0ef2..35838393a 100644
--- a/docs/gravitino-server-config.md
+++ b/docs/gravitino-server-config.md
@@ -231,6 +231,7 @@ The following table lists the catalog specific properties and their default path
| `jdbc-doris` | [Doris catalog properties](jdbc-doris-catalog.md#catalog-properties) | `catalogs/jdbc-doris/conf/jdbc-doris.conf` |
| `jdbc-oceanbase` | [OceanBase catalog properties](jdbc-oceanbase-catalog.md#catalog-properties) | `catalogs/jdbc-oceanbase/conf/jdbc-oceanbase.conf` |
| `kafka` | [Kafka catalog properties](kafka-catalog.md#catalog-properties) | `catalogs/kafka/conf/kafka.conf` |
+| `hadoop` | [Hadoop catalog properties](hadoop-catalog.md#catalog-properties) | `catalogs/hadoop/conf/hadoop.conf` |
:::info
The Gravitino server automatically adds the catalog properties configuration directory to classpath.
diff --git a/docs/hadoop-catalog.md b/docs/hadoop-catalog.md
index f0fb9bb17..15f42883e 100644
--- a/docs/hadoop-catalog.md
+++ b/docs/hadoop-catalog.md
@@ -23,7 +23,7 @@ Hadoop 3. If there's any compatibility issue, please create an [issue](https://g
### Catalog properties
-Besides the [common catalog properties](./gravitino-server-config.md#gravitino-catalog-properties-configuration), the Hadoop catalog has the following properties:
+Besides the [common catalog properties](./gravitino-server-config.md#apache-gravitino-catalog-properties-configuration), the Hadoop catalog has the following properties:
| Property Name | Description | Default Value | Required | Since Version |
|---------------|-------------------------------------------------|---------------|----------|---------------|
@@ -33,46 +33,46 @@ Apart from the above properties, to access fileset like HDFS, S3, GCS, OSS or cu
#### HDFS fileset
-| Property Name | Description | Default Value | Required | Since Version |
-|----------------------------------------------------|------------------------------------------------------------------------------------------------|---------------|------------------------------------------------------------|----------------|
-| `authentication.impersonation-enable` | Whether to enable impersonation for the Hadoop catalog. | `false` | No | 0.5.1 |
-| `authentication.type` | The type of authentication for Hadoop catalog, currently we only support `kerberos`, `simple`. | `simple` | No | 0.5.1 |
-| `authentication.kerberos.principal` | The principal of the Kerberos authentication | (none) | required if the value of `authentication.type` is Kerberos.| 0.5.1 |
-| `authentication.kerberos.keytab-uri` | The URI of The keytab for the Kerberos authentication. | (none) | required if the value of `authentication.type` is Kerberos.| 0.5.1 |
-| `authentication.kerberos.check-interval-sec` | The check interval of Kerberos credential for Hadoop catalog. | 60 | No | 0.5.1 |
-| `authentication.kerberos.keytab-fetch-timeout-sec` | The fetch timeout of retrieving Kerberos keytab from `authentication.kerberos.keytab-uri`. | 60 | No | 0.5.1 |
+| Property Name | Description | Default Value | Required | Since Version |
+|----------------------------------------------------|------------------------------------------------------------------------------------------------|---------------|-------------------------------------------------------------|---------------|
+| `authentication.impersonation-enable` | Whether to enable impersonation for the Hadoop catalog. | `false` | No | 0.5.1 |
+| `authentication.type` | The type of authentication for Hadoop catalog, currently we only support `kerberos`, `simple`. | `simple` | No | 0.5.1 |
+| `authentication.kerberos.principal` | The principal of the Kerberos authentication | (none) | required if the value of `authentication.type` is Kerberos. | 0.5.1 |
+| `authentication.kerberos.keytab-uri` | The URI of The keytab for the Kerberos authentication. | (none) | required if the value of `authentication.type` is Kerberos. | 0.5.1 |
+| `authentication.kerberos.check-interval-sec` | The check interval of Kerberos credential for Hadoop catalog. | 60 | No | 0.5.1 |
+| `authentication.kerberos.keytab-fetch-timeout-sec` | The fetch timeout of retrieving Kerberos keytab from `authentication.kerberos.keytab-uri`. | 60 | No | 0.5.1 |
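To make the Kerberos rules in the table above concrete, here is a small sketch in Python. The property keys come from the table; `validate_auth_properties` and the sample values are purely illustrative and are not part of any Gravitino API.

```python
# Sketch of the documented rule: when authentication.type is "kerberos",
# the principal and keytab URI properties become required.

def validate_auth_properties(props: dict) -> list:
    """Return the properties still missing under the documented rules."""
    missing = []
    if props.get("authentication.type", "simple") == "kerberos":
        # Both are "required if the value of authentication.type is Kerberos".
        for key in ("authentication.kerberos.principal",
                    "authentication.kerberos.keytab-uri"):
            if key not in props:
                missing.append(key)
    return missing

# Sample values below are placeholders, not real credentials.
kerberos_props = {
    "authentication.type": "kerberos",
    "authentication.kerberos.principal": "gravitino@EXAMPLE.COM",
    "authentication.kerberos.keytab-uri": "file:///etc/gravitino/gravitino.keytab",
    "authentication.kerberos.check-interval-sec": "60",
}
print(validate_auth_properties(kerberos_props))                  # → []
print(validate_auth_properties({"authentication.type": "kerberos"}))
```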
#### S3 fileset
-| Configuration item | Description | Default value | Required | Since version |
-|--------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|---------------------------|------------------|
-| `filesystem-providers` | The file system providers to add. Set it to `s3` if it's a S3 fileset, or a comma separated string that contains `s3` like `gs,s3` to support multiple kinds of fileset including `s3`. | (none) | Yes | 0.7.0-incubating |
-| `default-filesystem-provider` | The name default filesystem providers of this Hadoop catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for S3, if we set this value, we can omit the prefix 's3a://' in the location.| `builtin-local` | No | 0.7.0-incubating |
-| `s3-endpoint` | The endpoint of the AWS S3. | (none) | Yes if it's a S3 fileset. | 0.7.0-incubating |
-| `s3-access-key-id` | The access key of the AWS S3. | (none) | Yes if it's a S3 fileset. | 0.7.0-incubating |
-| `s3-secret-access-key` | The secret key of the AWS S3. | (none) | Yes if it's a S3 fileset. | 0.7.0-incubating |
+| Configuration item | Description | Default value | Required | Since version |
+|-------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|---------------------------|------------------|
+| `filesystem-providers` | The file system providers to add. Set it to `s3` if it's a S3 fileset, or a comma separated string that contains `s3` like `gs,s3` to support multiple kinds of fileset including `s3`. | (none) | Yes | 0.7.0-incubating |
+| `default-filesystem-provider` | The name default filesystem providers of this Hadoop catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for S3, if we set this value, we can omit the prefix 's3a://' in the location. | `builtin-local` | No | 0.7.0-incubating |
+| `s3-endpoint` | The endpoint of the AWS S3. | (none) | Yes if it's a S3 fileset. | 0.7.0-incubating |
+| `s3-access-key-id` | The access key of the AWS S3. | (none) | Yes if it's a S3 fileset. | 0.7.0-incubating |
+| `s3-secret-access-key` | The secret key of the AWS S3. | (none) | Yes if it's a S3 fileset. | 0.7.0-incubating |
At the same time, you need to place the corresponding bundle jar [`gravitino-aws-bundle-${version}.jar`](https://repo1.maven.org/maven2/org/apache/gravitino/aws-bundle/) in the directory `${GRAVITINO_HOME}/catalogs/hadoop/libs`.
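The S3 requirements above (the GCS and OSS tables below follow the same shape) can be summarized as a quick pre-flight check. This is a sketch only: the key names come from the table, while the helper and the sample values are illustrative and not a Gravitino API.

```python
# Documented required keys for an S3 fileset catalog.
S3_REQUIRED = ("s3-endpoint", "s3-access-key-id", "s3-secret-access-key")

def missing_s3_properties(props: dict) -> list:
    """List the documented S3 properties absent from a catalog property map."""
    missing = []
    # `filesystem-providers` must contain `s3`, possibly in a
    # comma separated list such as "gs,s3".
    providers = [p.strip() for p in props.get("filesystem-providers", "").split(",")]
    if "s3" not in providers:
        missing.append("filesystem-providers")
    missing += [k for k in S3_REQUIRED if k not in props]
    return missing

# Placeholder endpoint and credentials, for illustration only.
catalog_props = {
    "filesystem-providers": "gs,s3",
    "s3-endpoint": "https://s3.ap-northeast-1.amazonaws.com",
    "s3-access-key-id": "placeholder-access-key",
    "s3-secret-access-key": "placeholder-secret-key",
}
print(missing_s3_properties(catalog_props))  # → []
```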
#### GCS fileset
-| Configuration item | Description | Default value | Required | Since version |
-|-------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|---------------------------|------------------|
-| `filesystem-providers` | The file system providers to add. Set it to `gs` if it's a GCS fileset, a comma separated string that contains `gs` like `gs,s3` to support multiple kinds of fileset including `gs`. | (none) | Yes | 0.7.0-incubating |
-| `default-filesystem-provider` | The name default filesystem providers of this Hadoop catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for GCS, if we set this value, we can omit the prefix 'gs://' in the location.| `builtin-local` | No | 0.7.0-incubating |
-| `gcs-service-account-file` | The path of GCS service account JSON file. | (none) | Yes if it's a GCS fileset.| 0.7.0-incubating |
+| Configuration item | Description | Default value | Required | Since version |
+|-------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|----------------------------|------------------|
+| `filesystem-providers` | The file system providers to add. Set it to `gs` if it's a GCS fileset, a comma separated string that contains `gs` like `gs,s3` to support multiple kinds of fileset including `gs`. | (none) | Yes | 0.7.0-incubating |
+| `default-filesystem-provider` | The name default filesystem providers of this Hadoop catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for GCS, if we set this value, we can omit the prefix 'gs://' in the location. | `builtin-local` | No | 0.7.0-incubating |
+| `gcs-service-account-file` | The path of GCS service account JSON file. | (none) | Yes if it's a GCS fileset. | 0.7.0-incubating |
In the meantime, you need to place the corresponding bundle jar [`gravitino-gcp-bundle-${version}.jar`](https://repo1.maven.org/maven2/org/apache/gravitino/gcp-bundle/) in the directory `${GRAVITINO_HOME}/catalogs/hadoop/libs`.
#### OSS fileset
-| Configuration item | Description | Default value | Required | Since version |
-|-------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|---------------------------|------------------|
-| `filesystem-providers` | The file system providers to add. Set it to `oss` if it's a OSS fileset, or a comma separated string that contains `oss` like `oss,gs,s3` to support multiple kinds of fileset including `oss`. | (none) | Yes | 0.7.0-incubating |
-| `default-filesystem-provider` | The name default filesystem providers of this Hadoop catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for OSS, if we set this value, we can omit the prefix 'oss://' in the location.| `builtin-local` | No | 0.7.0-incubating |
-| `oss-endpoint` | The endpoint of the Aliyun OSS. | (none) | Yes if it's a OSS fileset.| 0.7.0-incubating |
-| `oss-access-key-id` | The access key of the Aliyun OSS. | (none) | Yes if it's a OSS fileset.| 0.7.0-incubating |
-| `oss-secret-access-key` | The secret key of the Aliyun OSS. | (none) | Yes if it's a OSS fileset.| 0.7.0-incubating |
+| Configuration item | Description | Default value | Required | Since version |
+|-------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|----------------------------|------------------|
+| `filesystem-providers` | The file system providers to add. Set it to `oss` if it's a OSS fileset, or a comma separated string that contains `oss` like `oss,gs,s3` to support multiple kinds of fileset including `oss`. | (none) | Yes | 0.7.0-incubating |
+| `default-filesystem-provider` | The name default filesystem providers of this Hadoop catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for OSS, if we set this value, we can omit the prefix 'oss://' in the location. | `builtin-local` | No | 0.7.0-incubating |
+| `oss-endpoint` | The endpoint of the Aliyun OSS. | (none) | Yes if it's a OSS fileset. | 0.7.0-incubating |
+| `oss-access-key-id` | The access key of the Aliyun OSS. | (none) | Yes if it's a OSS fileset. | 0.7.0-incubating |
+| `oss-secret-access-key` | The secret key of the Aliyun OSS. | (none) | Yes if it's a OSS fileset. | 0.7.0-incubating |
In the meantime, you need to place the corresponding bundle jar [`gravitino-aliyun-bundle-${version}.jar`](https://repo1.maven.org/maven2/org/apache/gravitino/aliyun-bundle/) in the directory `${GRAVITINO_HOME}/catalogs/hadoop/libs`.
diff --git a/docs/manage-fileset-metadata-using-gravitino.md b/docs/manage-fileset-metadata-using-gravitino.md
index 7115e705d..9d96287b5 100644
--- a/docs/manage-fileset-metadata-using-gravitino.md
+++ b/docs/manage-fileset-metadata-using-gravitino.md
@@ -49,7 +49,7 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
"comment": "comment",
"provider": "hadoop",
"properties": {
- "location": "file:/tmp/root"
+ "location": "file:///tmp/root"
}
}' http://localhost:8090/api/metalakes/metalake/catalogs
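For readers scripting this call rather than using curl, the request body can be built programmatically. This is a minimal sketch: the `name` and `type` fields are illustrative placeholders (the hunk above only shows `comment`, `provider`, and `properties`), and the commented-out send assumes a Gravitino server at the documented address.

```python
import json

# Request body mirroring the curl example above; "name" and "type" are
# placeholders for fields not shown in the quoted hunk.
payload = {
    "name": "catalog",
    "type": "FILESET",
    "comment": "comment",
    "provider": "hadoop",
    "properties": {
        # Note the three-slash form "file:///tmp/root", the fix this
        # commit applies, rather than "file:/tmp/root".
        "location": "file:///tmp/root",
    },
}
body = json.dumps(payload)

# Sending it would look roughly like this (requires a running server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8090/api/metalakes/metalake/catalogs",
#     data=body.encode("utf-8"),
#     method="POST",
#     headers={"Accept": "application/vnd.gravitino.v1+json",
#              "Content-Type": "application/json"},
# )
# urllib.request.urlopen(req)
print(body)
```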
@@ -83,7 +83,7 @@ GravitinoClient gravitinoClient = GravitinoClient
.build();
Map<String, String> properties = ImmutableMap.<String, String>builder()
- .put("location", "file:/tmp/root")
+ .put("location", "file:///tmp/root")
// Property "location" is optional. If specified, a managed fileset without
// a storage location will be stored under this location.
.build();
@@ -205,7 +205,7 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
"name": "schema",
"comment": "comment",
"properties": {
- "location": "file:/tmp/root/schema"
+ "location": "file:///tmp/root/schema"
}
}' http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas
```
@@ -227,7 +227,7 @@ SupportsSchemas supportsSchemas = catalog.asSchemas();
Map<String, String> schemaProperties = ImmutableMap.<String, String>builder()
// Property "location" is optional, if specified all the managed fileset without
// specifying storage location will be stored under this location.
- .put("location", "file:/tmp/root/schema")
+ .put("location", "file:///tmp/root/schema")
.build();
Schema schema = supportsSchemas.createSchema("schema",
"This is a schema",
@@ -308,7 +308,7 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
"name": "example_fileset",
"comment": "This is an example fileset",
"type": "MANAGED",
- "storageLocation": "file:/tmp/root/schema/example_fileset",
+ "storageLocation": "file:///tmp/root/schema/example_fileset",
"properties": {
"k1": "v1"
}
@@ -335,7 +335,7 @@ filesetCatalog.createFileset(
NameIdentifier.of("schema", "example_fileset"),
"This is an example fileset",
Fileset.Type.MANAGED,
- "file:/tmp/root/schema/example_fileset",
+ "file:///tmp/root/schema/example_fileset",
propertiesMap,
);
```
@@ -373,7 +373,7 @@ when creating a fileset, or follow the rules of the catalog/schema location if n
The value of `storageLocation` depends on the configuration settings of the catalog:
- If this is a S3 fileset catalog, the `storageLocation` should be in the format of `s3a://bucket-name/path/to/fileset`.
- If this is an OSS fileset catalog, the `storageLocation` should be in the format of `oss://bucket-name/path/to/fileset`.
-- If this is a local fileset catalog, the `storageLocation` should be in the format of `file:/path/to/fileset`.
+- If this is a local fileset catalog, the `storageLocation` should be in the format of `file:///path/to/fileset`.
- If this is a HDFS fileset catalog, the `storageLocation` should be in the format of `hdfs://namenode:port/path/to/fileset`.
- If this is a GCS fileset catalog, the `storageLocation` should be in the format of `gs://bucket-name/path/to/fileset`.
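The bullet rules above can be expressed as a small scheme check. A sketch only: the mapping mirrors the bullets, and the function is illustrative rather than part of any Gravitino client.

```python
# Expected URI prefix per fileset catalog kind, taken from the bullets above.
SCHEME_BY_CATALOG = {
    "s3": "s3a://",
    "oss": "oss://",
    "local": "file:///",   # three slashes, per this commit's fix
    "hdfs": "hdfs://",
    "gcs": "gs://",
}

def storage_location_ok(catalog_kind: str, location: str) -> bool:
    """Check a storageLocation against the documented scheme for its catalog."""
    return location.startswith(SCHEME_BY_CATALOG[catalog_kind])

print(storage_location_ok("local", "file:///tmp/root/schema/example_fileset"))  # → True
print(storage_location_ok("local", "file:/tmp/root"))                           # → False
print(storage_location_ok("s3", "s3a://bucket-name/path/to/fileset"))           # → True
```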