mchades commented on code in PR #8777:
URL: https://github.com/apache/gravitino/pull/8777#discussion_r2575658679


##########
clients/filesystem-hadoop3/src/main/java/org/apache/gravitino/filesystem/hadoop/BaseGVFSOperations.java:
##########
@@ -867,20 +955,19 @@ private Cache<FileSystemCacheKey, FileSystem> newFileSystemCache(Configuration c
   }
 
   private Map<String, String> getAllProperties(
-      NameIdentifier filesetIdent, String scheme, String locationName) {
+      NameIdentifier filesetIdent, Map<String, String> filesetProperties) {
+    Map<String, String> allProperties = new HashMap<>();
     Catalog catalog =
         (Catalog)
             getFilesetCatalog(
                 NameIdentifier.of(
                    filesetIdent.namespace().level(0), filesetIdent.namespace().level(1)));
+    allProperties.putAll(catalog.properties());
 
-    Map<String, String> allProperties = getNecessaryProperties(catalog.properties());
-    allProperties.putAll(getConfigMap(conf));
-    if (enableCredentialVending()) {
-      allProperties.putAll(
-          getCredentialProperties(
-              getFileSystemProviderByScheme(scheme), filesetIdent, locationName));
-    }
+    Schema schema = getSchema(NameIdentifier.parse(filesetIdent.namespace().toString()));
+    allProperties.putAll(schema.properties());
+    allProperties.putAll(filesetProperties);

Review Comment:
   Does the Python GVFS also need to align with this change? If you don't plan to address it in this PR, please open a new issue to track it.
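   For context, the rewritten `getAllProperties` above layers catalog, schema, and fileset properties into one map. The sketch below (a hypothetical `merge` helper using plain `java.util.Map` in place of the Gravitino types) shows the precedence that ordering implies: later `putAll` calls win, so fileset properties override schema properties, which override catalog properties.

```java
import java.util.HashMap;
import java.util.Map;

public class PropertyMerge {
  // Later putAll calls win: fileset > schema > catalog.
  static Map<String, String> merge(
      Map<String, String> catalogProps,
      Map<String, String> schemaProps,
      Map<String, String> filesetProps) {
    Map<String, String> all = new HashMap<>();
    all.putAll(catalogProps);
    all.putAll(schemaProps);
    all.putAll(filesetProps);
    return all;
  }

  public static void main(String[] args) {
    Map<String, String> catalog = new HashMap<>();
    catalog.put("k", "catalog");
    Map<String, String> schema = new HashMap<>();
    schema.put("k", "schema");
    Map<String, String> fileset = new HashMap<>();
    fileset.put("k", "fileset");
    // The fileset-level value shadows the schema- and catalog-level values.
    System.out.println(merge(catalog, schema, fileset).get("k"));
  }
}
```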



##########
clients/filesystem-hadoop3/src/main/java/org/apache/gravitino/filesystem/hadoop/GravitinoVirtualFileSystemUtils.java:
##########
@@ -52,16 +52,25 @@ public class GravitinoVirtualFileSystemUtils {
      Pattern.compile("^(?:gvfs://fileset)?/([^/]+)/([^/]+)/([^/]+)(?>/[^/]+)*/?$");
 
   /**
-   * Transform the Hadoop configuration to a map.
+   * Extract non-default configuration from Hadoop Configuration.
    *
    * @param configuration The Hadoop configuration.
    * @return The configuration map.
    */
-  public static Map<String, String> getConfigMap(Configuration configuration) {
+  public static Map<String, String> extractNonDefaultConfig(Configuration configuration) {

Review Comment:
   This is a public static method in the client; to maintain backward compatibility, you should add a new method instead of modifying this one.
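   One common way to satisfy this (a sketch, not necessarily how the PR should do it) is to keep the old name as a deprecated wrapper that delegates to the renamed method, so existing callers keep compiling and behave identically. To stay self-contained, the example below replaces the Hadoop `Configuration` parameter with a plain `Map` and stubs out the filtering logic; the class and method bodies are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigUtils {
  /**
   * Kept for backward compatibility; delegates to the renamed method.
   *
   * @deprecated use {@link #extractNonDefaultConfig(Map)} instead.
   */
  @Deprecated
  public static Map<String, String> getConfigMap(Map<String, String> configuration) {
    return extractNonDefaultConfig(configuration);
  }

  public static Map<String, String> extractNonDefaultConfig(Map<String, String> configuration) {
    // Stub: the real method would filter out Hadoop default entries;
    // here we simply copy the map.
    return new HashMap<>(configuration);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("fs.gravitino.client.timeout", "30");
    // Old callers and new callers observe the same result.
    System.out.println(getConfigMap(conf).equals(extractNonDefaultConfig(conf)));
  }
}
```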



##########
docs/how-to-use-gvfs.md:
##########
@@ -71,11 +71,44 @@ the path mapping and convert automatically.
 | `fs.gravitino.client.request.header.` | The configuration key prefix for the Gravitino client request header. You can set the request header for the Gravitino client. | (none) | No | 0.9.0-incubating |
 | `fs.gravitino.enableCredentialVending` | Whether to enable credential vending for the Gravitino Virtual File System. | `false` | No | 0.9.0-incubating |
 | `fs.gravitino.client.` | The configuration key prefix for the Gravitino client config. | (none) | No | 1.0.0 |
-| `fs.gravitino.filesetMetadataCache.enable` | Whether to cache the fileset or fileset catalog metadata in the Gravitino Virtual File System. Note that this cache causes a side effect: if you modify the fileset or fileset catalog metadata, the client can not see the latest changes. | `false` | No | 1.0.0 |
-| `fs.gravitino.autoCreateLocation` | The configuration key for whether to enable auto-creation of fileset location when the server-side filesystem ops are disabled and the location does not exist. | `true` | No | 1.1.0 |
+| `fs.gravitino.filesetMetadataCache.enable` | Whether to cache the fileset, fileset schema or fileset catalog metadata in the Gravitino Virtual File System. Note that this cache causes a side effect: if you modify the fileset or fileset catalog metadata, the client can not see the latest changes. | `false` | No | 1.0.0 |
+| `fs.gravitino.autoCreateLocation` | The configuration key for whether to enable auto-creation of fileset location when the server-side filesystem ops are disabled and the location does not exist.  | `true` | No | 1.1.0 |
+| `fs.path.config.<name>` | Defines a logical location entry. Set `fs.path.config.<name>` to the real base URI (for example, `hdfs://cluster1/`). Any key that starts with the same prefix (such as `fs.path.config.<name>.config.resource`) is treated as a location-scoped property and will be forwarded to the underlying filesystem client. | (none) | No | 1.1.0 |

Review Comment:
   The docs are missing a proper introduction of `fs.path.config.<name>.config.resource`.
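   To illustrate the documented key scheme, here is a hedged sketch of how a client might resolve one logical location entry. The logical name `cluster1`, the resource file name, and the resolution code are all hypothetical; the actual GVFS implementation may differ.

```java
import java.util.HashMap;
import java.util.Map;

public class PathConfigDemo {
  public static void main(String[] args) {
    // Hypothetical client configuration following the documented scheme:
    // fs.path.config.<name> maps a logical name to a base URI, and keys
    // under the same prefix are location-scoped properties.
    Map<String, String> conf = new HashMap<>();
    conf.put("fs.path.config.cluster1", "hdfs://cluster1/");
    conf.put("fs.path.config.cluster1.config.resource", "cluster1-hdfs-site.xml");

    String prefix = "fs.path.config.cluster1";
    String baseUri = conf.get(prefix);

    // Collect the location-scoped properties, stripping the prefix; the
    // remainder would be forwarded to the underlying filesystem client.
    Map<String, String> scoped = new HashMap<>();
    for (Map.Entry<String, String> e : conf.entrySet()) {
      if (e.getKey().startsWith(prefix + ".")) {
        scoped.put(e.getKey().substring(prefix.length() + 1), e.getValue());
      }
    }

    System.out.println(baseUri);
    System.out.println(scoped.get("config.resource"));
  }
}
```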



##########
clients/filesystem-hadoop3/src/main/java/org/apache/gravitino/filesystem/hadoop/HDFSFileSystemProxy.java:
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.gravitino.filesystem.hadoop;
+
+import static org.apache.gravitino.catalog.hadoop.fs.Constants.AUTH_KERBEROS;
+import static org.apache.gravitino.catalog.hadoop.fs.Constants.AUTH_SIMPLE;
+import static org.apache.gravitino.catalog.hadoop.fs.Constants.FS_DISABLE_CACHE;
+import static org.apache.gravitino.catalog.hadoop.fs.Constants.HADOOP_KRB5_CONF;
+import static org.apache.gravitino.catalog.hadoop.fs.Constants.HADOOP_SECURITY_KEYTAB;
+import static org.apache.gravitino.catalog.hadoop.fs.Constants.HADOOP_SECURITY_PRINCIPAL;
+import static org.apache.gravitino.catalog.hadoop.fs.Constants.SECURITY_KRB5_ENV;
+import static org.apache.gravitino.catalog.hadoop.fs.HDFSFileSystemProvider.IPC_FALLBACK_TO_SIMPLE_AUTH_ALLOWED;
+import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import java.security.PrivilegedExceptionAction;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import net.sf.cglib.proxy.Enhancer;
+import net.sf.cglib.proxy.MethodInterceptor;
+import net.sf.cglib.proxy.MethodProxy;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.gravitino.exceptions.GravitinoRuntimeException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A FileSystem wrapper that runs all operations under a specific UGI (UserGroupInformation).
+ * Supports both simple and Kerberos authentication, with automatic ticket renewal.
+ */
+public class HDFSFileSystemProxy implements MethodInterceptor {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HDFSFileSystemProxy.class);
+
+  private static final long DEFAULT_RENEW_INTERVAL_MS = 10 * 60 * 1000L;
+  private static final String SYSTEM_USER_NAME = System.getProperty("user.name");
+  private static final String SYSTEM_ENV_HADOOP_USER_NAME = "HADOOP_USER_NAME";
+
+  private final UserGroupInformation ugi;
+  private final FileSystem fs;
+  private final Configuration configuration;
+  private ScheduledExecutorService kerberosRenewExecutor;
+
+  /**
+   * Create an HDFSFileSystemProxy with the given path and configuration. Supports both simple
+   * and Kerberos authentication, with automatic ticket renewal for Kerberos.
+   *
+   * @param path the HDFS path
+   * @param conf the Hadoop configuration
+   */
+  public HDFSFileSystemProxy(Path path, Configuration conf) {
+    try {
+      conf.setBoolean(FS_DISABLE_CACHE, true);
+      conf.setBoolean(IPC_FALLBACK_TO_SIMPLE_AUTH_ALLOWED, true);
+      this.configuration = conf;
+
+      String authType = conf.get(HADOOP_SECURITY_AUTHENTICATION, AUTH_SIMPLE);
+      if (AUTH_KERBEROS.equalsIgnoreCase(authType)) {
+        String krb5Config = conf.get(HADOOP_KRB5_CONF);
+
+        if (krb5Config != null) {
+          System.setProperty(SECURITY_KRB5_ENV, krb5Config);
+        }
+        UserGroupInformation.setConfiguration(conf);
+        String principal = conf.get(HADOOP_SECURITY_PRINCIPAL, null);
+        String keytab = conf.get(HADOOP_SECURITY_KEYTAB, null);
+
+        if (principal == null || keytab == null) {
+          throw new GravitinoRuntimeException(
+              "Kerberos principal and keytab must be provided for kerberos authentication");
+        }
+
+        this.ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab);
+        startKerberosRenewalTask(principal);
+      } else {
+        String userName = System.getenv(SYSTEM_ENV_HADOOP_USER_NAME);
+        if (StringUtils.isEmpty(userName)) {
+          userName = SYSTEM_USER_NAME;
+        }
+        this.ugi = UserGroupInformation.createRemoteUser(userName);
+      }
+
+      this.fs =
+          ugi.doAs(
+              (PrivilegedExceptionAction<FileSystem>)
+                  () -> FileSystem.newInstance(path.toUri(), conf));
+
+    } catch (Exception e) {
+      throw new GravitinoRuntimeException(e, "Failed to create HDFS FileSystem with UGI: %s", path);
+    }
+  }
+
+  /**
+   * Get the proxied FileSystem instance.
+   *
+   * @return the proxied FileSystem
+   * @throws IOException if an I/O error occurs
+   */
+  public FileSystem getProxy() throws IOException {
+    Enhancer e = new Enhancer();
+    e.setClassLoader(fs.getClass().getClassLoader());
+    e.setSuperclass(fs.getClass());
+    e.setCallback(this);
+    FileSystem proxyFs = (FileSystem) e.create();
+    fs.setConf(configuration);
+    return proxyFs;
+  }
+
+  /** Schedule periodic Kerberos re-login to refresh TGT before expiry. */
+  private void startKerberosRenewalTask(String principal) {
+    kerberosRenewExecutor =
+        Executors.newSingleThreadScheduledExecutor(
+            r -> {
+              Thread t = new Thread(r, "HDFSFileSystemProxy Kerberos-Renewal-Thread");
+              t.setDaemon(true);
+              return t;
+            });
+
+    kerberosRenewExecutor.scheduleAtFixedRate(
+        () -> {
+          try {
+            if (ugi.hasKerberosCredentials()) {
+              ugi.checkTGTAndReloginFromKeytab();
+            }
+          } catch (Exception e) {
+            LOG.error(
+                "[Kerberos] Failed to renew TGT for principal {}: {}",
+                principal,
+                e.getMessage(),
+                e);
+          }
+        },
+        DEFAULT_RENEW_INTERVAL_MS,
+        DEFAULT_RENEW_INTERVAL_MS,
+        TimeUnit.MILLISECONDS);
+  }
+
+  /** Close the Kerberos renewal executor service to prevent resource leaks. */
+  private void close() {
+    if (kerberosRenewExecutor != null) {
+      kerberosRenewExecutor.shutdownNow();
+      kerberosRenewExecutor = null;
+    }
+  }
+
+  /** Invoke the method on the underlying FileSystem using ugi.doAs. */
+  private Object invokeWithUgi(MethodProxy methodProxy, Object[] objects) throws Throwable {
+    return ugi.doAs(
+        (PrivilegedExceptionAction<Object>)
+            () -> {
+              try {
+                return methodProxy.invoke(fs, objects);
+              } catch (IOException e) {
+                throw e;
+              } catch (Throwable e) {
+                if (RuntimeException.class.isAssignableFrom(e.getClass())) {
+                  throw (RuntimeException) e;
+                }
+                throw new RuntimeException("Failed to invoke method", e);
+              }
+            });
+  }
+
+  @Override
+  public Object intercept(Object o, Method method, Object[] objects, MethodProxy methodProxy)

Review Comment:
   Please place the public method before the private method.



##########
docs/fileset-catalog.md:
##########
@@ -38,31 +39,50 @@ Besides the [common catalog properties](./gravitino-server-config.md#apache-grav
 | `disable-filesystem-ops` | The configuration to disable file system operations in the server side. If set to true, the Fileset catalog in the server side will not create, drop files or folder when the schema, fileset is created, dropped. | false | No | 0.9.0-incubating |
 | `fileset-cache-eviction-interval-ms` | The interval in milliseconds to evict the fileset cache, -1 means never evict. | 3600000 | No | 0.9.0-incubating |
 | `fileset-cache-max-size` | The maximum number of the filesets the cache may contain, -1 means no limit. | 200000 | No | 0.9.0-incubating |
+| `config.resources` | The configuration resources, separated by comma. For example, `hdfs-site.xml,core-site.xml`. | (none) | No | 1.1.0 |
 
 Please refer to [Credential vending](./security/credential-vending.md) for more details about credential vending.
 
-### HDFS fileset 
+### HDFS fileset
 
-Apart from the above properties, to access fileset like HDFS fileset, you need to configure the following extra properties.
+Apart from the above properties, to access fileset like HDFS fileset, you need to configure the following extra
+properties.
 
-| Property Name | Description | Default Value | Required | Since Version |
-|---|---|---|---|---|
+| Property Name | Description  | Default Value | Required | Since Version |
+|---|----|---|---|---|
 | `authentication.impersonation-enable` | Whether to enable impersonation for the Fileset catalog. | `false` | No | 0.5.1 |
 | `authentication.type` | The type of authentication for Fileset catalog, currently we only support `kerberos`, `simple`. | `simple` | No | 0.5.1 |
-| `authentication.kerberos.principal` | The principal of the Kerberos authentication | (none) | required if the value of `authentication.type` is Kerberos. | 0.5.1 |
-| `authentication.kerberos.keytab-uri` | The URI of The keytab for the Kerberos authentication. | (none) | required if the value of `authentication.type` is Kerberos. | 0.5.1 |
+| `authentication.kerberos.principal` | The principal of the Kerberos authentication  | (none) | required if the value of `authentication.type` is Kerberos. | 0.5.1 |
+| `authentication.kerberos.keytab-uri` | The URI of The keytab for the Kerberos authentication.  | (none) | required if the value of `authentication.type` is Kerberos. | 0.5.1 |
 | `authentication.kerberos.check-interval-sec` | The check interval of Kerberos credential for Fileset catalog. | 60 | No | 0.5.1 |
-| `authentication.kerberos.keytab-fetch-timeout-sec` | The fetch timeout of retrieving Kerberos keytab from `authentication.kerberos.keytab-uri`. | 60 | No | 0.5.1 |
+| `authentication.kerberos.keytab-fetch-timeout-sec` | The fetch timeout of retrieving Kerberos keytab from `authentication.kerberos.keytab-uri`.  | 60 | No | 0.5.1 |
+
+The `config.resources` property allows users to specify custom configuration files.
+
+The Gravitino Fileset extends the following properties in the `xxx-site.xml`:
+
+| Property Name | Description | Default Value | Required | Since Version |
+|---|---|---|---|---|
+| hadoop.security.authentication.kerberos.principal | The principal of the Kerberos authentication for HDFS client. | (none) | required if the value of `authentication.type` is Kerberos. | 1.1.0 |
+| hadoop.security.authentication.kerberos.keytab | The keytab file path of the Kerberos authentication for HDFS client. | (none) | required if the value of `authentication.type` is Kerberos. | 1.1.0 |
+| hadoop.security.authentication.kerberos.krb5.conf | The krb5.conf file path of the Kerberos authentication for HDFS client. | (none) | No | 1.1.0 |
 
 ### Fileset catalog with Cloud Storage
+
+In the current implementation, the fileset uses the HDFS protocol to access its location. If users use S3, GCS, OSS,
+or Azure Blob Storage, they can also configure the `config.resources property` to specify custom configuration

Review Comment:
   `config.resources` property



##########
clients/filesystem-hadoop3/src/test/java/org/apache/gravitino/filesystem/hadoop/integration/test/GvfsMultipleClusterIT.java:
##########
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.gravitino.filesystem.hadoop.integration.test;
+
+import static org.apache.gravitino.file.Fileset.LOCATION_NAME_UNKNOWN;
+import static org.apache.gravitino.file.Fileset.PROPERTY_DEFAULT_LOCATION_NAME;
+
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Maps;
+import java.io.File;
+import java.io.IOException;
+import java.lang.reflect.Method;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.util.Collections;
+import java.util.Map;
+import org.apache.commons.io.FileUtils;
+import org.apache.gravitino.Catalog;
+import org.apache.gravitino.NameIdentifier;
+import org.apache.gravitino.client.GravitinoMetalake;
+import org.apache.gravitino.file.Fileset;
+import org.apache.gravitino.integration.test.container.ContainerSuite;
+import org.apache.gravitino.integration.test.container.HiveContainer;
+import org.apache.gravitino.integration.test.util.BaseIT;
+import org.apache.gravitino.integration.test.util.GravitinoITUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.security.authentication.util.KerberosName;
+import org.apache.hadoop.security.authentication.util.KerberosUtil;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Tag;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.condition.DisabledIf;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@Tag("gravitino-docker-test")
+@DisabledIf("org.apache.gravitino.integration.test.util.ITUtils#isEmbedded")

Review Comment:
   It's better to add this explanation as a comment in the code, not just here in the review.



##########
clients/filesystem-hadoop3/src/main/java/org/apache/gravitino/filesystem/hadoop/BaseGVFSOperations.java:
##########
@@ -762,6 +795,52 @@ protected FileSystem getActualFileSystemByLocationName(
     }
   }
 
+  /**
+   * Get user defined configurations for a specific location. Configuration format:
+   *
+   * @param baseLocation the base location of fileset
+   * @return a map of configuration properties for the specified location
+   */
+  private Map<String, String> getUserDefinedConfigs(String baseLocation) {

Review Comment:
   Please move the private method behind the protected method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
