This is an automated email from the ASF dual-hosted git repository.
jshao pushed a commit to branch branch-0.9
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/branch-0.9 by this push:
new 52e9f6e002 [#6936] improvement(doc): update doc for fileset multiple locations (#7036)
52e9f6e002 is described below
commit 52e9f6e002c0199eba073c3ff3a9ca7832984d6c
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
AuthorDate: Tue Apr 22 16:35:13 2025 +0800
[#6936] improvement(doc): update doc for fileset multiple locations (#7036)
### What changes were proposed in this pull request?
update doc for fileset multiple locations
### Why are the changes needed?
Fix: #6936
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
no need
Co-authored-by: mchades <[email protected]>
Co-authored-by: Qiming Teng <[email protected]>
---
docs/hadoop-catalog.md | 44 ++--
docs/how-to-use-gvfs.md | 53 ++++-
docs/manage-fileset-metadata-using-gravitino.md | 257 +++++++++++++++++++++---
3 files changed, 301 insertions(+), 53 deletions(-)
diff --git a/docs/hadoop-catalog.md b/docs/hadoop-catalog.md
index cea25e6490..cf57367950 100644
--- a/docs/hadoop-catalog.md
+++ b/docs/hadoop-catalog.md
@@ -25,7 +25,8 @@ Besides the [common catalog properties](./gravitino-server-config.md#apache-grav
| Property Name                        | Description | Default Value   | Required | Since Version    |
|--------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|----------|------------------|
-| `location`                           | The storage location managed by Hadoop catalog. | (none) | No | 0.5.0 |
+| `location`                           | The storage location managed by Hadoop catalog. Its location name is `unknown`. | (none) | No | 0.5.0 |
+| `location-`                          | The property prefix. Users can use `location-{name}={path}` to set multiple locations with different names for the catalog. | (none) | No | 0.9.0-incubating |
| `default-filesystem-provider`        | The default filesystem provider of this Hadoop catalog if users do not specify the scheme in the URI. Candidate values are 'builtin-local', 'builtin-hdfs', 's3', 'gcs', 'abs' and 'oss'. Default value is `builtin-local`. For S3, if we set this value to 's3', we can omit the prefix 's3a://' in the location. | `builtin-local` | No | 0.7.0-incubating |
| `filesystem-providers`               | The file system providers to add. Users need to set this configuration to support cloud storage or custom HCFS. For instance, set it to `s3` or a comma separated string that contains `s3` like `gs,s3` to support multiple kinds of fileset including `s3`. | (none) | Yes | 0.7.0-incubating |
| `credential-providers`               | The credential provider types, separated by comma. | (none) | No | 0.8.0-incubating |
@@ -104,19 +105,27 @@ The Hadoop catalog supports creating, updating, deleting, and listing schema.
### Schema properties
-| Property name                         | Description | Default value | Required | Since Version |
-|---------------------------------------|----------------------------------------------------------------------------------------------------------------|---------------------------|----------|------------------|
-| `location`                            | The storage location managed by Hadoop schema. | (none) | No | 0.5.0 |
-| `authentication.impersonation-enable` | Whether to enable impersonation for this schema of the Hadoop catalog. | The parent(catalog) value | No | 0.6.0-incubating |
-| `authentication.type`                 | The type of authentication for this schema of Hadoop catalog , currently we only support `kerberos`, `simple`. | The parent(catalog) value | No | 0.6.0-incubating |
-| `authentication.kerberos.principal`   | The principal of the Kerberos authentication for this schema. | The parent(catalog) value | No | 0.6.0-incubating |
-| `authentication.kerberos.keytab-uri`  | The URI of The keytab for the Kerberos authentication for this schema. | The parent(catalog) value | No | 0.6.0-incubating |
-| `credential-providers`                | The credential provider types, separated by comma. | (none) | No | 0.8.0-incubating |
+| Property name                         | Description | Default value | Required | Since Version |
+|---------------------------------------|---------------------------------------------------------------------------------------------------------------------------|---------------------------|----------|------------------|
+| `location`                            | The storage location managed by Hadoop schema. Its location name is `unknown`. | (none) | No | 0.5.0 |
+| `location-`                           | The property prefix. Users can use `location-{name}={path}` to set multiple locations with different names for the schema. | (none) | No | 0.9.0-incubating |
+| `authentication.impersonation-enable` | Whether to enable impersonation for this schema of the Hadoop catalog. | The parent(catalog) value | No | 0.6.0-incubating |
+| `authentication.type`                 | The type of authentication for this schema of the Hadoop catalog; currently only `kerberos` and `simple` are supported. | The parent(catalog) value | No | 0.6.0-incubating |
+| `authentication.kerberos.principal`   | The principal of the Kerberos authentication for this schema. | The parent(catalog) value | No | 0.6.0-incubating |
+| `authentication.kerberos.keytab-uri`  | The URI of the keytab for the Kerberos authentication for this schema. | The parent(catalog) value | No | 0.6.0-incubating |
+| `credential-providers`                | The credential provider types, separated by comma. | (none) | No | 0.8.0-incubating |
### Schema operations
Refer to [Schema operation](./manage-fileset-metadata-using-gravitino.md#schema-operations) for more details.
+:::note
+During schema creation or deletion, Gravitino automatically creates or removes the corresponding filesystem directories for the schema locations.
+This behavior is skipped in either of these cases:
+1. When the catalog property `disable-filesystem-ops` is set to `true`
+2. When the location contains [placeholders](./manage-fileset-metadata-using-gravitino.md#placeholder)
+:::
+
## Fileset
### Fileset capabilities
@@ -125,14 +134,15 @@ Refer to [Schema operation](./manage-fileset-metadata-using-gravitino.md#schema-
### Fileset properties
-| Property name                         | Description | Default value | Required | Immutable | Since Version |
-|---------------------------------------|--------------------------------------------------------------------------------------------------------|--------------------------|----------|-----------|------------------|
-| `authentication.impersonation-enable` | Whether to enable impersonation for the Hadoop catalog fileset. | The parent(schema) value | No | Yes | 0.6.0-incubating |
-| `authentication.type`                 | The type of authentication for Hadoop catalog fileset, currently we only support `kerberos`, `simple`. | The parent(schema) value | No | No | 0.6.0-incubating |
-| `authentication.kerberos.principal`   | The principal of the Kerberos authentication for the fileset. | The parent(schema) value | No | No | 0.6.0-incubating |
-| `authentication.kerberos.keytab-uri`  | The URI of The keytab for the Kerberos authentication for the fileset. | The parent(schema) value | No | No | 0.6.0-incubating |
-| `credential-providers`                | The credential provider types, separated by comma. | (none) | No | No | 0.8.0-incubating |
-| `placeholder-`                        | Properties that start with `placeholder-` are used to replace placeholders in the location. | (none) | No | Yes | 0.9.0-incubating |
+| Property name                         | Description | Default value | Required | Immutable | Since Version |
+|---------------------------------------|----------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|--------------------------------------------|-----------|------------------|
+| `authentication.impersonation-enable` | Whether to enable impersonation for the Hadoop catalog fileset. | The parent(schema) value | No | Yes | 0.6.0-incubating |
+| `authentication.type`                 | The type of authentication for the Hadoop catalog fileset; currently only `kerberos` and `simple` are supported. | The parent(schema) value | No | No | 0.6.0-incubating |
+| `authentication.kerberos.principal`   | The principal of the Kerberos authentication for the fileset. | The parent(schema) value | No | No | 0.6.0-incubating |
+| `authentication.kerberos.keytab-uri`  | The URI of the keytab for the Kerberos authentication for the fileset. | The parent(schema) value | No | No | 0.6.0-incubating |
+| `credential-providers`                | The credential provider types, separated by comma. | (none) | No | No | 0.8.0-incubating |
+| `placeholder-`                        | Properties that start with `placeholder-` are used to replace placeholders in the location. | (none) | No | Yes | 0.9.0-incubating |
+| `default-location-name`               | The name of the default location of the fileset, mainly used for GVFS operations without specifying a location name. | When the fileset has only one location, its location name will be automatically selected as the default value. | Yes, if the fileset has multiple locations | Yes | 0.9.0-incubating |
Some properties are reserved and cannot be set by users:
diff --git a/docs/how-to-use-gvfs.md b/docs/how-to-use-gvfs.md
index 6b3e4a0a04..996041462b 100644
--- a/docs/how-to-use-gvfs.md
+++ b/docs/how-to-use-gvfs.md
@@ -138,6 +138,13 @@ two ways:
./gradlew :clients:filesystem-hadoop3-runtime:build -x test
```
+:::note
+For filesets with multiple locations, you can specify which location to access using one of these methods (in priority order):
+1. Set the `fs.gravitino.current.location.name` configuration property
+2. Export the environment variable `CURRENT_LOCATION_NAME`
+3. If neither is specified, the system will use the value of `default-location-name` from the fileset properties
+:::
+
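The three-step priority above can be condensed into a short sketch. This is illustrative only, not the actual GravitinoVirtualFileSystem lookup code; the `conf` and `fileset_properties` dictionaries are hypothetical stand-ins for the Hadoop configuration and the fileset's properties.

```python
import os

def resolve_location_name(conf, fileset_properties, env=None):
    """Sketch of the documented priority order for picking a location name."""
    env = os.environ if env is None else env
    # 1. An explicit configuration property wins.
    if conf.get("fs.gravitino.current.location.name"):
        return conf["fs.gravitino.current.location.name"]
    # 2. Otherwise fall back to the CURRENT_LOCATION_NAME environment variable.
    if env.get("CURRENT_LOCATION_NAME"):
        return env["CURRENT_LOCATION_NAME"]
    # 3. Finally use the fileset's default-location-name property.
    return fileset_properties.get("default-location-name")

print(resolve_location_name({}, {"default-location-name": "l1"}, env={}))  # l1
```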
#### Via Hadoop shell command
You can use the Hadoop shell command to perform operations on the fileset storage. For example:
@@ -145,6 +152,9 @@ You can use the Hadoop shell command to perform operations on the fileset storag
```shell
# 1. Configure the hadoop `core-site.xml` configuration
# You should put the required properties into this file
+
+# set the location name if you want to access a specific location
+# export CURRENT_LOCATION_NAME=${the_fileset_location_name}
vi ${HADOOP_HOME}/etc/hadoop/core-site.xml
# 2. Place the GVFS runtime jar into your Hadoop environment
@@ -172,6 +182,8 @@ conf.set("fs.AbstractFileSystem.gvfs.impl","org.apache.gravitino.filesystem.hado
conf.set("fs.gvfs.impl","org.apache.gravitino.filesystem.hadoop.GravitinoVirtualFileSystem");
conf.set("fs.gravitino.server.uri","http://localhost:8090");
conf.set("fs.gravitino.client.metalake","test_metalake");
+// set the location name if you want to access a specific location
+// conf.set("fs.gravitino.current.location.name","test_location_name");
Path filesetPath = new Path("gvfs://fileset/test_catalog/test_schema/test_fileset_1");
FileSystem fs = filesetPath.getFileSystem(conf);
fs.getFileStatus(filesetPath);
@@ -199,6 +211,8 @@ fs.getFileStatus(filesetPath);
--conf spark.hadoop.fs.gvfs.impl=org.apache.gravitino.filesystem.hadoop.GravitinoVirtualFileSystem
--conf spark.hadoop.fs.gravitino.server.uri=${your_gravitino_server_uri}
--conf spark.hadoop.fs.gravitino.client.metalake=${your_gravitino_metalake}
+ # set the location name if you want to access a specific location
+ # --conf spark.hadoop.fs.gravitino.current.location.name=${the_fileset_location_name}
```
3. Perform operations on the fileset storage in your code.
@@ -236,6 +250,8 @@ For Tensorflow to support GVFS, you need to recompile the [tensorflow-io](https:
```shell
export HADOOP_HOME=${your_hadoop_home}
export HADOOP_CONF_DIR=${your_hadoop_conf_home}
+ # set the location name if you want to access a specific location
+ # export CURRENT_LOCATION_NAME=${the_fileset_location_name}
export PATH=$PATH:$HADOOP_HOME/libexec/hadoop-config.sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/server
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
@@ -378,6 +394,13 @@ Gravitino python client does not support [customized file systems](hadoop-catalo
### Usage examples
+:::note
+For filesets with multiple locations, you can specify which location to access using one of these methods (in priority order):
+1. Set the `current_location_name` configuration property
+2. Export the environment variable `CURRENT_LOCATION_NAME`
+3. If neither is specified, the system will use the value of `default-location-name` from the fileset properties
+:::
+
1. Make sure to obtain the Gravitino library.
You can get it by [pip](https://pip.pypa.io/en/stable/installation/):
@@ -407,6 +430,13 @@ Gravitino python client does not support [customized file systems](hadoop-catalo
<name>hadoop.client.keytab.file</name>
<value>/tmp/xxx.keytab</value>
</property>
+
+ <!-- Optional, if you want to access a specific location -->
+ <property>
+ <name>fs.gravitino.current.location.name</name>
+ <value>location-name</value>
+ </property>
+
# Configure Hadoop env in Linux
export HADOOP_HOME=${YOUR_HADOOP_PATH}
export HADOOP_CONF_DIR=${YOUR_HADOOP_PATH}/etc/hadoop
@@ -423,7 +453,11 @@ For example:
from gravitino import gvfs
# init the gvfs
-fs = gvfs.GravitinoVirtualFileSystem(server_uri="http://localhost:8090", metalake_name="test_metalake")
+fs = gvfs.GravitinoVirtualFileSystem(
+ server_uri="http://localhost:8090",
+ metalake_name="test_metalake",
+ # set the location name if you want to access a specific location
+ options={"current_location_name": "the_location_name"})
# list file infos under the fileset
fs.ls(path="gvfs://fileset/fileset_catalog/tmp/tmp_fileset/sub_dir")
@@ -514,7 +548,10 @@ import pyarrow.dataset as dt
import pyarrow.parquet as pq
fs = gvfs.GravitinoVirtualFileSystem(
- server_uri="http://localhost:8090", metalake_name="test_metalake"
+ server_uri="http://localhost:8090",
+ metalake_name="test_metalake",
+ # set the location name if you want to access a specific location
+ options={"current_location_name": "the_location_name"}
)
# read a parquet file as arrow dataset
@@ -531,7 +568,10 @@ from gravitino import gvfs
import ray
fs = gvfs.GravitinoVirtualFileSystem(
- server_uri="http://localhost:8090", metalake_name="test_metalake"
+ server_uri="http://localhost:8090",
+ metalake_name="test_metalake",
+ # set the location name if you want to access a specific location
+ options={"current_location_name": "the_location_name"},
)
# read a parquet file as ray dataset
@@ -544,7 +584,12 @@ ds = ray.data.read_parquet("gvfs://fileset/fileset_catalog/tmp/tmp_fileset/test.
from gravitino import gvfs
from llama_index.core import SimpleDirectoryReader
-fs = gvfs.GravitinoVirtualFileSystem(server_uri=server_uri, metalake_name=metalake_name)
+fs = gvfs.GravitinoVirtualFileSystem(
+ server_uri=server_uri,
+ metalake_name=metalake_name,
+ # set the location name if you want to access a specific location
+ options={"current_location_name": "the_location_name"},
+)
# read all document files like csv files under the fileset sub dir
reader = SimpleDirectoryReader(
diff --git a/docs/manage-fileset-metadata-using-gravitino.md b/docs/manage-fileset-metadata-using-gravitino.md
index d4fde25a1d..12bb9a64c3 100644
--- a/docs/manage-fileset-metadata-using-gravitino.md
+++ b/docs/manage-fileset-metadata-using-gravitino.md
@@ -247,7 +247,7 @@ same.
You can create a fileset by sending a `POST` request to the `/api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/filesets` endpoint or just use the Gravitino Java
-client. The following is an example of creating a fileset:
+client. The following is an example of creating a fileset with a single storage location:
<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">
@@ -315,16 +315,45 @@ Currently, Gravitino supports two **types** of filesets:
specified as `EXTERNAL`, the files of the fileset will **not** be deleted when the fileset is dropped.
-**storageLocation**
+:::note
+During fileset creation or deletion, Gravitino automatically creates or removes the corresponding filesystem directories for the fileset locations.
+This behavior is skipped in either of these cases:
+1. When the catalog property `disable-filesystem-ops` is set to `true`
+2. When the location contains [placeholders](./manage-fileset-metadata-using-gravitino.md#placeholder)
+:::
+
+#### storageLocation
The `storageLocation` is the physical location of the fileset. Users can specify this location when creating a fileset, or follow the rules of the catalog/schema location if not specified.
+For a `MANAGED` fileset, the storage location is determined in the following priority order:
+
+1. If the user specifies `storageLocation` during fileset creation:
+   - This location is used, with any [placeholders](#placeholder) replaced by the corresponding fileset property values.
+
+2. If the user doesn't specify `storageLocation`:
+   - If schema property `location` is specified:
+     - Use `<schema location>/<fileset name>` if schema location has no placeholders
+     - Use `<schema location>` with placeholders replaced by fileset property values
+
+   - Otherwise, if catalog property `location` is specified:
+     - Use `<catalog location>/<schema name>/<fileset name>` if catalog location has no placeholders
+     - Use `<catalog location>` with placeholders replaced by fileset property values
+
+   - If neither schema nor catalog location is specified:
+     - The user must provide `storageLocation` during fileset creation
+
+For an `EXTERNAL` fileset, the user must always specify `storageLocation` during fileset creation.
+If the provided location contains placeholders, they will be replaced by the corresponding fileset property values.
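The resolution rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not a Gravitino API: placeholder substitution is elided (a location containing `{{...}}` is returned as-is), and `resolve_managed_location` is a hypothetical helper name.

```python
def resolve_managed_location(user_location, schema_location, catalog_location,
                             schema_name, fileset_name):
    """Sketch of the documented location resolution for a MANAGED fileset.

    Placeholder replacement is omitted: a location containing `{{...}}`
    placeholders is returned unchanged here."""
    if user_location:
        return user_location                # user-specified location wins
    if schema_location:                     # then the schema location
        if "{{" in schema_location:
            return schema_location
        return f"{schema_location}/{fileset_name}"
    if catalog_location:                    # then the catalog location
        if "{{" in catalog_location:
            return catalog_location
        return f"{catalog_location}/{schema_name}/{fileset_name}"
    raise ValueError("a MANAGED fileset needs an explicit storageLocation")

print(resolve_managed_location(None, "hdfs://nn:9000/s1", None, "s1", "f1"))
# hdfs://nn:9000/s1/f1
```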
+
+#### placeholder
+
The `storageLocation` in each level can contain **placeholders**, formatted as `{{name}}`, which will
be replaced by the corresponding fileset property value when the fileset object is created. The
placeholder property in the fileset object is formed as "placeholder-{{name}}". For example, if
-the `storageLocation` is `file:///tmp/{{schema}}-{{fileset}}-{{verion}}`, and the fileset object
-named "catalog1.schema1.fileset1" contains the properties `placeholder-version=v1`,
+the `storageLocation` is `file:///tmp/{{schema}}-{{fileset}}-{{version}}`, and the fileset object
+named "catalog1.schema1.fileset1" contains the properties `placeholder-version=v1`,
the actual `storageLocation` will be `file:///tmp/schema1-fileset1-v1`.
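The substitution described above can be sketched as follows. `substitute_placeholders` is a hypothetical helper for illustration, not Gravitino's implementation; only the `{{schema}}`/`{{fileset}}` identifier placeholders plus `placeholder-*` properties are modeled.

```python
import re

def substitute_placeholders(location, schema, fileset, properties):
    # {{schema}} and {{fileset}} come from the fileset identifier; any other
    # {{name}} is looked up from a `placeholder-{name}` fileset property.
    values = {"schema": schema, "fileset": fileset}
    for key, value in properties.items():
        if key.startswith("placeholder-"):
            values[key[len("placeholder-"):]] = value
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values[m.group(1)], location)

print(substitute_placeholders(
    "file:///tmp/{{schema}}-{{fileset}}-{{version}}",
    "schema1", "fileset1", {"placeholder-version": "v1"}))
# file:///tmp/schema1-fileset1-v1
```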
The following is an example of creating a fileset with placeholders in the `storageLocation`:
@@ -429,34 +458,198 @@ catalog.as_fileset_catalog().create_fileset(ident=NameIdentifier.of("test_schema
</TabItem>
</Tabs>
-The value of `storageLocation` depends on the configuration settings of the catalog:
-- If this is a local fileset catalog, the `storageLocation` should be in the format of `file:///path/to/fileset`.
-- If this is a HDFS fileset catalog, the `storageLocation` should be in the format of `hdfs://namenode:port/path/to/fileset`.
-
-For a `MANAGED` fileset, the storage location is:
-
-1. The one specified by the user during the fileset creation, and the placeholder will be replaced by the corresponding fileset property value.
-2. When the catalog property `location` is specified but the schema property `location` isn't specified, the storage location is:
-   1. `catalog location/schema name/fileset name` if `catalog location` does not contain any placeholder.
-   2. `catalog location` - placeholders in the catalog location will be replaced by the corresponding fileset property value.
-
-3. When the catalog property `location` isn't specified but the schema property `location` is specified, the storage location is:
-   1. `schema location/fileset name` if `schema location` does not contain any placeholder.
-   2. `schema location` - placeholders in the schema location will be replaced by the corresponding fileset property value.
-
-4. When both the catalog property `location` and the schema property `location` are specified, the storage location is:
-   1. `schema location/fileset name` if `schema location` does not contain any placeholder.
-   2. `schema location` - placeholders in the schema location will be replaced by the corresponding fileset property value.
-
-5. When both the catalog property `location` and schema property `location` isn't specified, the user should specify the `storageLocation` in the fileset creation.
-
-For `EXTERNAL` fileset, users should specify `storageLocation` during the fileset creation, otherwise, Gravitino will throw an exception. If the `storageLocation` contains placeholders, the placeholder will be replaced by the corresponding fileset property value.
+#### storageLocations
+You can also create a fileset with multiple storage locations. The `storageLocations` is a map from location name to storage location.
+The generation rules of each location follow those of a single location.
+The following is an example of creating a fileset with multiple storage locations:
+
+<Tabs groupId="language" queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+# create a catalog first
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+-H "Content-Type: application/json" -d '{
+ "name": "test_catalog",
+ "type": "FILESET",
+ "comment": "comment",
+ "provider": "hadoop",
+ "properties": {
+ "filesystem-providers": "builtin-local,builtin-hdfs,s3,gcs",
+    "location-l1": "file:///{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}",
+    "location-l2": "hdfs:///{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}"
+ }
+}' http://localhost:8090/api/metalakes/metalake/catalogs
+
+# create a schema under the catalog
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+-H "Content-Type: application/json" -d '{
+ "name": "test_schema",
+ "comment": "comment",
+ "properties": {
+    "location-l3": "s3a://myBucket/{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}"
+ }
+}' http://localhost:8090/api/metalakes/metalake/catalogs/test_catalog/schemas
+
+# create a fileset using placeholders
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+-H "Content-Type: application/json" -d '{
+ "name": "example_fileset",
+ "comment": "This is an example fileset",
+ "type": "MANAGED",
+ "storageLocations": {
+ "l4": "gs://myBucket/{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}"
+ },
+ "properties": {
+ "placeholder-project": "test_project",
+ "placeholder-user": "test_user",
+ "default-location-name": "l1"
+ }
+}' http://localhost:8090/api/metalakes/metalake/catalogs/test_catalog/schemas/test_schema/filesets
+
+# the fileset will be created with 4 storage locations:
+{
+ "name": "example_fileset",
+ "comment": "This is an example fileset",
+ "type": "MANAGED",
+ "storageLocation": null,
+ "storageLocations": {
+ "l1": "file:///test_catalog/test_schema/workspace_test_project/test_user",
+ "l2": "hdfs:///test_catalog/test_schema/workspace_test_project/test_user",
+    "l3": "s3a://myBucket/test_catalog/test_schema/workspace_test_project/test_user",
+    "l4": "gs://myBucket/test_catalog/test_schema/workspace_test_project/test_user"
+ },
+ "properties": {
+ "placeholder-project": "test_project",
+ "placeholder-user": "test_user",
+ "default-location-name": "l1"
+ }
+}
+```
+
+</TabItem>
+<TabItem value="java" label="Java">
+
+```java
+GravitinoClient gravitinoClient = GravitinoClient
+ .builder("http://localhost:8090")
+ .withMetalake("metalake")
+ .build();
+// create a catalog first
+Catalog catalog = gravitinoClient.createCatalog(
+ "test_catalog",
+ Type.FILESET,
+ "hadoop", // provider
+ "comment",
+ ImmutableMap.of(
+ "filesystem-providers", "builtin-local,builtin-hdfs,s3,gcs",
+        "location-l1", "file:///{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}",
+        "location-l2", "hdfs:///{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}"));
+FilesetCatalog filesetCatalog = catalog.asFilesetCatalog();
+
+// create a schema under the catalog
+filesetCatalog.createSchema(
+ "test_schema",
+ "comment",
+    ImmutableMap.of("location-l3", "s3a://myBucket/{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}"));
+
+// create a fileset using placeholders
+filesetCatalog.createMultipleLocationFileset(
+ NameIdentifier.of("test_schema", "example_fileset"),
+ "This is an example fileset",
+ Fileset.Type.MANAGED,
+    ImmutableMap.of("l4", "gs://myBucket/{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}"),
+ ImmutableMap.of(
+ "placeholder-project", "test_project",
+ "placeholder-user", "test_user",
+ "default-location-name", "l1")
+);
+
+// the fileset will be created with 4 storage locations:
+{
+ "name": "example_fileset",
+ "comment": "This is an example fileset",
+ "type": "MANAGED",
+ "storageLocation": null,
+ "storageLocations": {
+ "l1": "file:///test_catalog/test_schema/workspace_test_project/test_user",
+ "l2": "hdfs:///test_catalog/test_schema/workspace_test_project/test_user",
+    "l3": "s3a://myBucket/test_catalog/test_schema/workspace_test_project/test_user",
+    "l4": "gs://myBucket/test_catalog/test_schema/workspace_test_project/test_user"
+ },
+ "properties": {
+ "placeholder-project": "test_project",
+ "placeholder-user": "test_user",
+ "default-location-name": "l1"
+ }
+}
+```
+
+</TabItem>
+<TabItem value="python" label="Python">
+
+```python
+gravitino_client: GravitinoClient = GravitinoClient(uri="http://localhost:8090", metalake_name="metalake")
+
+# create a catalog first
+catalog: Catalog = gravitino_client.create_catalog(
+ name="test_catalog",
+ catalog_type=Catalog.Type.FILESET,
+ provider="hadoop",
+ comment="comment",
+ properties={
+ "filesystem-providers": "builtin-local,builtin-hdfs,s3,gcs",
+        "location-l1": "file:///{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}",
+        "location-l2": "hdfs:///{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}",
+ }
+)
+
+# create a schema under the catalog
+catalog.as_schemas().create_schema(
+ name="test_schema",
+ comment="comment",
+ properties={
+        "location-l3": "s3a://myBucket/{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}",
+ }
+)
+
+# create a fileset using placeholders
+catalog.as_fileset_catalog().create_multiple_location_fileset(
+ ident=NameIdentifier.of("test_schema", "example_fileset"),
+ type=Fileset.Type.MANAGED,
+ comment="This is an example fileset",
+ storage_locations={
+        "l4": "gs://myBucket/{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}",
+ },
+    properties={
+ "placeholder-project": "test_project",
+ "placeholder-user": "test_user",
+ "default-location-name": "l1",
+ }
+)
+
+# the fileset will be created with 4 storage locations:
+{
+ "name": "example_fileset",
+ "comment": "This is an example fileset",
+ "type": "MANAGED",
+ "storageLocation": null,
+ "storageLocations": {
+ "l1": "file:///test_catalog/test_schema/workspace_test_project/test_user",
+ "l2": "hdfs:///test_catalog/test_schema/workspace_test_project/test_user",
+    "l3": "s3a://myBucket/test_catalog/test_schema/workspace_test_project/test_user",
+    "l4": "gs://myBucket/test_catalog/test_schema/workspace_test_project/test_user"
+ },
+ "properties": {
+ "placeholder-project": "test_project",
+ "placeholder-user": "test_user",
+ "default-location-name": "l1"
+ }
+}
+```
+
+</TabItem>
+</Tabs>
### Alter a fileset