mchades commented on code in PR #9569:
URL: https://github.com/apache/gravitino/pull/9569#discussion_r2655105252


##########
catalogs/catalog-fileset/src/main/java/org/apache/gravitino/catalog/fileset/FilesetCatalogImpl.java:
##########
@@ -46,7 +46,7 @@ public String shortName() {
 
   @Override
   protected CatalogOperations newOps(Map<String, String> config) {
-    return new SecureFilesetCatalogOperations();
+    return new FilesetCatalogOperations();

Review Comment:
   could you plz help to review the auth parts? @jerqi @yuqi1129 



##########
catalogs/catalog-common/src/main/java/org/apache/gravitino/utils/FilesetUtil.java:
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.gravitino.utils;
+
+import com.google.common.base.Preconditions;
+import java.net.URI;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.commons.lang3.StringUtils;
+
+/** Utility class for fileset-related operations. */
+public class FilesetUtil {
+
+  private FilesetUtil() {}
+
+  /**
+   * Get user-defined configurations for a given path based on the path's base 
location
+   * (scheme://authority).
+   *
+   * <p>The logic:
+   *
+   * <ol>
+   *   <li>Extract baseLocation (scheme://authority) from the given path
+   *   <li>Find config entries like "fs.path.config.&lt;name&gt; = 
&lt;base_location&gt;" where the
+   *       base_location matches the extracted baseLocation
+   *   <li>Extract the name from the matching entry
+   *   <li>Then find all config entries with prefix 
"fs.path.config.&lt;name&gt;." and extract
+   *       properties
+   * </ol>
+   *
+   * <p>Example:
+   *
+   * <pre>
+   *   fs.path.config.cluster1 = s3://bucket1
+   *   fs.path.config.cluster1.aws-access-key = XXX1
+   *   fs.path.config.cluster1.aws-secret-key = XXX2
+   *   If path is "s3://bucket1/path/fileset1", then baseLocation is 
"s3://bucket1",
+   *   cluster1 matches and we extract:
+   *   - aws-access-key = XXX1
+   *   - aws-secret-key = XXX2
+   * </pre>
+   *
+   * @param path the path to extract configurations for
+   * @param conf the configuration map containing path-based configurations
+   * @param configPrefix the prefix for path-based configurations (e.g., 
"fs.path.config.")
+   * @return a map of user-defined configurations for the given path
+   */
+  public static Map<String, String> getUserDefinedFileSystemConfigs(
+      URI path, Map<String, String> conf, String configPrefix) {
+    Preconditions.checkArgument(path != null, "Path should not be null");
+    Preconditions.checkArgument(conf != null, "Configuration map should not be 
null");
+    Preconditions.checkArgument(
+        StringUtils.isNotBlank(configPrefix), "Config prefix should not be 
blank");
+
+    Map<String, String> properties = new HashMap<>();
+    String baseLocation = getBaseLocation(path);
+    String locationName = null;
+
+    // First pass: find the location name by matching baseLocation
+    // Look for entries like "fs.path.config.<name> = <base_location>"
+    // The key format should be exactly "fs.path.config.<name>" (no dot after 
name)
+    for (Map.Entry<String, String> entry : conf.entrySet()) {
+      String key = entry.getKey();
+      String value = entry.getValue();
+
+      // Check if this is a location definition: "fs.path.config.<name> = 
<base_location>"
+      // The key format should be exactly "fs.path.config.<name>" (no dot 
after name)
+      if (key.startsWith(configPrefix)) {
+        String suffix = key.substring(configPrefix.length());
+        // Check if this is a location definition (no dot after the name)
+        // Format: "fs.path.config.<name>" (not 
"fs.path.config.<name>.<property>")
+        if (!suffix.contains(".")
+            && StringUtils.isNotBlank(suffix)
+            && StringUtils.isNotBlank(value)) {
+          // This is a location definition: "fs.path.config.<name>"
+          // Extract baseLocation from the value and compare with the path's 
baseLocation
+          try {
+            URI valueUri = new URI(value);
+            String valueBaseLocation = getBaseLocation(valueUri);
+            if (baseLocation.equals(valueBaseLocation)) {
+              locationName = suffix;
+              break;
+            }
+          } catch (Exception e) {
+            // Skip invalid URI values
+            continue;

Review Comment:
   Should we throw an error to indicate that the URI is invalid?



##########
catalogs/catalog-common/src/main/java/org/apache/gravitino/utils/FilesetUtil.java:
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.gravitino.utils;
+
+import com.google.common.base.Preconditions;
+import java.net.URI;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.commons.lang3.StringUtils;
+
+/** Utility class for fileset-related operations. */
+public class FilesetUtil {
+
+  private FilesetUtil() {}
+
+  /**
+   * Get user-defined configurations for a given path based on the path's base 
location
+   * (scheme://authority).
+   *
+   * <p>The logic:
+   *
+   * <ol>
+   *   <li>Extract baseLocation (scheme://authority) from the given path
+   *   <li>Find config entries like "fs.path.config.&lt;name&gt; = 
&lt;base_location&gt;" where the
+   *       base_location matches the extracted baseLocation
+   *   <li>Extract the name from the matching entry
+   *   <li>Then find all config entries with prefix 
"fs.path.config.&lt;name&gt;." and extract
+   *       properties
+   * </ol>
+   *
+   * <p>Example:
+   *
+   * <pre>
+   *   fs.path.config.cluster1 = s3://bucket1
+   *   fs.path.config.cluster1.aws-access-key = XXX1
+   *   fs.path.config.cluster1.aws-secret-key = XXX2
+   *   If path is "s3://bucket1/path/fileset1", then baseLocation is 
"s3://bucket1",
+   *   cluster1 matches and we extract:
+   *   - aws-access-key = XXX1
+   *   - aws-secret-key = XXX2
+   * </pre>
+   *
+   * @param path the path to extract configurations for
+   * @param conf the configuration map containing path-based configurations
+   * @param configPrefix the prefix for path-based configurations (e.g., 
"fs.path.config.")
+   * @return a map of user-defined configurations for the given path
+   */
+  public static Map<String, String> getUserDefinedFileSystemConfigs(
+      URI path, Map<String, String> conf, String configPrefix) {
+    Preconditions.checkArgument(path != null, "Path should not be null");
+    Preconditions.checkArgument(conf != null, "Configuration map should not be 
null");
+    Preconditions.checkArgument(
+        StringUtils.isNotBlank(configPrefix), "Config prefix should not be 
blank");
+
+    Map<String, String> properties = new HashMap<>();
+    String baseLocation = getBaseLocation(path);
+    String locationName = null;
+
+    // First pass: find the location name by matching baseLocation
+    // Look for entries like "fs.path.config.<name> = <base_location>"
+    // The key format should be exactly "fs.path.config.<name>" (no dot after 
name)
+    for (Map.Entry<String, String> entry : conf.entrySet()) {
+      String key = entry.getKey();
+      String value = entry.getValue();
+
+      // Check if this is a location definition: "fs.path.config.<name> = 
<base_location>"
+      // The key format should be exactly "fs.path.config.<name>" (no dot 
after name)
+      if (key.startsWith(configPrefix)) {
+        String suffix = key.substring(configPrefix.length());
+        // Check if this is a location definition (no dot after the name)
+        // Format: "fs.path.config.<name>" (not 
"fs.path.config.<name>.<property>")
+        if (!suffix.contains(".")
+            && StringUtils.isNotBlank(suffix)
+            && StringUtils.isNotBlank(value)) {
+          // This is a location definition: "fs.path.config.<name>"
+          // Extract baseLocation from the value and compare with the path's 
baseLocation
+          try {
+            URI valueUri = new URI(value);
+            String valueBaseLocation = getBaseLocation(valueUri);
+            if (baseLocation.equals(valueBaseLocation)) {
+              locationName = suffix;
+              break;
+            }
+          } catch (Exception e) {
+            // Skip invalid URI values
+            continue;
+          }
+        }
+      }
+    }
+
+    // Second pass: extract all properties for the matched location name
+    if (locationName != null) {
+      String propertyPrefix = configPrefix + locationName + ".";
+      for (Map.Entry<String, String> entry : conf.entrySet()) {
+        String key = entry.getKey();
+        // Check if this key is a property for the matched location
+        // e.g., "fs.path.config.cluster1.aws-ak" matches prefix 
"fs.path.config.cluster1."
+        if (key.startsWith(propertyPrefix)) {
+          // Extract the property name after the location prefix
+          // e.g., "fs.path.config.cluster1.aws-ak" -> "aws-ak"
+          String propertyName = key.substring(propertyPrefix.length());
+          if (!propertyName.isEmpty()) {
+            properties.put(propertyName, entry.getValue());
+          }
+        }
+      }
+    }
+
+    return properties;
+  }
+
+  /**
+   * Extract base location (scheme://authority) from a URI.
+   *
+   * @param uri the URI to extract base location from
+   * @return the base location in format "scheme://authority"
+   */
+  private static String getBaseLocation(URI uri) {
+    return uri.getScheme() + "://" + uri.getAuthority();

Review Comment:
   Why is this comment marked as resolved?



##########
catalogs/catalog-fileset/src/main/java/org/apache/gravitino/catalog/fileset/FilesetCatalogOperations.java:
##########
@@ -166,19 +173,14 @@ static class FileSystemCacheKey {
     // authority are both null
     @Nullable private final String scheme;
     @Nullable private final String authority;
-    private final Map<String, String> conf;
+    @Nullable private String locationName;
     private final String currentUser;
 
-    FileSystemCacheKey(String scheme, String authority, Map<String, String> 
conf) {
+    FileSystemCacheKey(String scheme, String authority, String locationName) {

Review Comment:
   This new cache key will cause locations with the same `locationName` under 
different filesets to hit the same cache. Is this expected?



##########
docs/manage-fileset-metadata-using-gravitino.md:
##########
@@ -921,3 +921,43 @@ curl -X POST -H "Accept: 
application/vnd.gravitino.v1+json" \
 }' 
http://localhost:8090/api/metalakes/test/catalogs/fileset_catalog/schemas/test_schema/filesets
 ```
 
+Alternatively, you can pass configurations for multiple clusters using the 
`fs.path.config.{cluster_name}.*` format when creating the fileset catalog.
+All schemas and filesets under the catalog can inherit the configuration based 
on the cluster name.
+The system can automatically retrieve the appropriate configuration from the 
fileset catalog based on the fileset's storage location.
+
+```text
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+-H "Content-Type: application/json" -d '{
+  "name": "fileset_catalog",
+  "type": "FILESET",
+  "comment": "This is a fileset catalog",
+  "provider": "fileset",
+  "properties": {
+    "location": "hdfs://172.17.0.2:9000/fileset_catalog",
+    "fs.path.config.hadoop1"="hdfs://172.17.0.2:9000",
+    
"fs.path.config.hadoop1.config.resources"="/etc/conf/cluster1/core-site.xml,/etc/conf/cluster1/hdfs-site.xml",
+    "fs.path.config.aws1"="s3://mybuctket",
+    "fs.path.config.aws1.access.key"="your_access_key",
+    "fs.path.config.aws1.secret.key"="your_secret_key"
+  }
+}' http://localhost:8090/api/metalakes/test/catalogs

Review Comment:
   plz add a simple comment for the code block



##########
docs/fileset-catalog.md:
##########
@@ -28,18 +28,19 @@ Hadoop 3. If there's any compatibility issue, please create 
an [issue](https://g
 Besides the [common catalog 
properties](./gravitino-server-config.md#apache-gravitino-catalog-properties-configuration),
 the Fileset catalog has the following properties:
 
-| Property Name                        | Description                           
                                                                                
                                                                                
                                                                                
                              | Default Value   | Required | Since Version    |
-|--------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|----------|------------------|
-| `location`                           | The storage location managed by 
Fileset catalog. Its location name is `unknown`. The value should always a 
directory(HDFS) or path prefix(cloud storage like S3, GCS.) and does not 
support a single file.                                                          
                                                | (none)          | No       | 
0.5.0            |
-| `location-`                          | The property prefix. User can use 
`location-{name}={path}` to set multiple locations with different names for the 
catalog.                                                                        
                                                                                
                                  | (none)          | No       | 
0.9.0-incubating |
-| `default-filesystem-provider`        | The default filesystem provider of 
this Fileset catalog if users do not specify the scheme in the URI. Candidate 
values are 'builtin-local', 'builtin-hdfs', 's3', 'gcs', 'abs' and 'oss'. 
Default value is `builtin-local`. For S3, if we set this value to 's3', we can 
omit the prefix 's3a://' in the location. | `builtin-local` | No       | 
0.7.0-incubating |
-| `filesystem-providers`               | The file system providers to add. 
Users need to set this configuration to support cloud storage or custom HCFS. 
For instance, set it to `s3` or a comma separated string that contains `s3` 
like `gs,s3` to support multiple kinds of fileset including `s3`.               
                                        | (none)          | Yes      | 
0.7.0-incubating |
-| `credential-providers`               | The credential provider types, 
separated by comma.                                                             
                                                                                
                                                                                
                                     | (none)          | No       | 
0.8.0-incubating |
-| `filesystem-conn-timeout-secs`       | The timeout of getting the file 
system using Hadoop FileSystem client instance. Time unit: seconds.             
                                                                                
                                                                                
                                    | 6               | No       | 
0.8.0-incubating |
-| `disable-filesystem-ops`             | The configuration to disable file 
system operations in the server side. If set to true, the Fileset catalog in 
the server side will not create, drop files or folder when the schema, fileset 
is created, dropped.                                                            
                                      | false           | No       | 
0.9.0-incubating |
-| `fileset-cache-eviction-interval-ms` | The interval in milliseconds to evict 
the fileset cache, -1 means never evict.                                        
                                                                                
                                                                                
                              | 3600000         | No       | 0.9.0-incubating |
-| `fileset-cache-max-size`             | The maximum number of the filesets 
the cache may contain, -1 means no limit.                                       
                                                                                
                                                                                
                                 | 200000          | No       | 
0.9.0-incubating |
-| `config.resources`                   | The configuration resources, 
separated by comma. For example, `hdfs-site.xml,core-site.xml`.                 
                                                                                
                                                                                
                                       | (none)          | No       | 1.1.0     
       |
+| Property Name                        | Description                           
                                                                                
                                                                                
                                                                                
                               | Default Value   | Required | Since Version    |
+|--------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|----------|------------------|
+| `location`                           | The storage location managed by 
Fileset catalog. Its location name is `unknown`. The value should always a 
directory(HDFS) or path prefix(cloud storage like S3, GCS.) and does not 
support a single file.                                                          
                                                 | (none)          | No       | 
0.5.0            |
+| `location-`                          | The property prefix. User can use 
`location-{name}={path}` to set multiple locations with different names for the 
catalog.                                                                        
                                                                                
                                   | (none)          | No       | 
0.9.0-incubating |
+| `default-filesystem-provider`        | The default filesystem provider of 
this Fileset catalog if users do not specify the scheme in the URI. Candidate 
values are 'builtin-local', 'builtin-hdfs', 's3', 'gcs', 'abs' and 'oss'. 
Default value is `builtin-local`. For S3, if we set this value to 's3', we can 
omit the prefix 's3a://' in the location.  | `builtin-local` | No       | 
0.7.0-incubating |
+| `filesystem-providers`               | The file system providers to add. 
Users need to set this configuration to support cloud storage or custom HCFS. 
For instance, set it to `s3` or a comma separated string that contains `s3` 
like `gs,s3` to support multiple kinds of fileset including `s3`.               
                                         | (none)          | Yes      | 
0.7.0-incubating |
+| `credential-providers`               | The credential provider types, 
separated by comma.                                                             
                                                                                
                                                                                
                                      | (none)          | No       | 
0.8.0-incubating |
+| `filesystem-conn-timeout-secs`       | The timeout of getting the file 
system using Hadoop FileSystem client instance. Time unit: seconds.             
                                                                                
                                                                                
                                     | 6               | No       | 
0.8.0-incubating |
+| `disable-filesystem-ops`             | The configuration to disable file 
system operations in the server side. If set to true, the Fileset catalog in 
the server side will not create, drop files or folder when the schema, fileset 
is created, dropped.                                                            
                                       | false           | No       | 
0.9.0-incubating |
+| `fileset-cache-eviction-interval-ms` | The interval in milliseconds to evict 
the fileset cache, -1 means never evict.                                        
                                                                                
                                                                                
                               | 3600000         | No       | 0.9.0-incubating |
+| `fileset-cache-max-size`             | The maximum number of the filesets 
the cache may contain, -1 means no limit.                                       
                                                                                
                                                                                
                                  | 200000          | No       | 
0.9.0-incubating |
+| `config.resources`                   | The configuration resources, 
separated by comma. For example, `hdfs-site.xml,core-site.xml`.                 
                                                                                
                                                                                
                                        | (none)          | No       | 1.1.0    
        |
+| `fs.path.config.<name>`              | Defines a logical location entry. Set 
`fs.path.config.<name>` to the real base URI (for example, `hdfs://cluster1/`). 
Any key that starts with the same prefix (such as 
`fs.path.config.<name>.config.resource`) is treated as a location-scoped 
property and will be forwarded to the underlying filesystem client. | (none)    
      | No       | 1.1.0            |

Review Comment:
   The `since version` is wrong



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to