yuqi1129 commented on code in PR #8777:
URL: https://github.com/apache/gravitino/pull/8777#discussion_r2517971019


##########
docs/fileset-catalog.md:
##########
@@ -45,14 +159,25 @@ Please refer to [Credential 
vending](./security/credential-vending.md) for more
 
 Apart from the above properties, to access fileset like HDFS fileset, you need 
to configure the following extra properties.
 
-| Property Name                                      | Description             
                                                                       | 
Default Value | Required                                                    | 
Since Version |
-|----------------------------------------------------|------------------------------------------------------------------------------------------------|---------------|-------------------------------------------------------------|---------------|
-| `authentication.impersonation-enable`              | Whether to enable 
impersonation for the Fileset catalog.                                        | 
`false`       | No                                                          | 
0.5.1         |
-| `authentication.type`                              | The type of 
authentication for Fileset catalog, currently we only support `kerberos`, 
`simple`. | `simple`      | No                                                  
        | 0.5.1         |
-| `authentication.kerberos.principal`                | The principal of the 
Kerberos authentication                                                   | 
(none)        | required if the value of `authentication.type` is Kerberos. | 
0.5.1         |
-| `authentication.kerberos.keytab-uri`               | The URI of The keytab 
for the Kerberos authentication.                                         | 
(none)        | required if the value of `authentication.type` is Kerberos. | 
0.5.1         |
-| `authentication.kerberos.check-interval-sec`       | The check interval of 
Kerberos credential for Fileset catalog.                                  | 60  
          | No                                                          | 0.5.1 
        |
-| `authentication.kerberos.keytab-fetch-timeout-sec` | The fetch timeout of 
retrieving Kerberos keytab from `authentication.kerberos.keytab-uri`.     | 60  
          | No                                                          | 0.5.1 
        |
+| Property Name                                      | Description             
                                                                          | 
Default Value | Required                                                    | 
Since Version |
+|----------------------------------------------------|---------------------------------------------------------------------------------------------------|---------------|-------------------------------------------------------------|---------------|
+| `authentication.impersonation-enable`              | Whether to enable 
impersonation for the Fileset catalog.                                          
| `false`       | No                                                          | 
0.5.1         |
+| `authentication.type`                              | The type of 
authentication for Fileset catalog, currently we only support `kerberos`, 
`simple`.   | `simple`      | No                                                
          | 0.5.1         |
+| `authentication.kerberos.principal`                | The principal of the 
Kerberos authentication                                                      | 
(none)        | required if the value of `authentication.type` is Kerberos. | 
0.5.1         |
+| `authentication.kerberos.keytab-uri`               | The URI of The keytab 
for the Kerberos authentication.                                            | 
(none)        | required if the value of `authentication.type` is Kerberos. | 
0.5.1         |
+| `authentication.kerberos.check-interval-sec`       | The check interval of 
Kerberos credential for Fileset catalog.                                    | 
60            | No                                                          | 
0.5.1         |
+| `authentication.kerberos.keytab-fetch-timeout-sec` | The fetch timeout of 
retrieving Kerberos keytab from `authentication.kerberos.keytab-uri`.        | 
60            | No                                                          | 
0.5.1         |
+| `hdfs.config.resources`                            | The HDFS configuration 
resources, separated by comma. For example, `hdfs-site.xml,core-site.xml`. | 
(none)        | No                                                          | 
0.5.1         |
+
+The `hdfs.config.resources` property allows users to specify custom HDFS 
configuration files.
+
+The Gravitino Fileset extends the following properties in the `xxx-site.xml`:
+
+| Property Name                                     | Description              
                                               | Default Value | Required       
                                             | Since Version |
+|---------------------------------------------------|-------------------------------------------------------------------------|---------------|-------------------------------------------------------------|---------------|
+| hadoop.security.authentication.kerberos.principal | The principal of the 
Kerberos authentication for HDFS client.           | (none)        | required 
if the value of `authentication.type` is Kerberos. | 0.1.1         |

Review Comment:
   0.1.1?



##########
clients/filesystem-hadoop3/src/main/java/org/apache/gravitino/filesystem/hadoop/HDFSFileSystemProxy.java:
##########
@@ -0,0 +1,172 @@
+package org.apache.gravitino.filesystem.hadoop;
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import static org.apache.gravitino.catalog.hadoop.fs.Constants.AUTH_KERBEROS;
+import static org.apache.gravitino.catalog.hadoop.fs.Constants.AUTH_SIMPlE;
+import static 
org.apache.gravitino.catalog.hadoop.fs.Constants.FS_DISABLE_CACHE;
+import static 
org.apache.gravitino.catalog.hadoop.fs.Constants.HADOOP_KRB5_CONF;
+import static 
org.apache.gravitino.catalog.hadoop.fs.Constants.HADOOP_SECURITY_KEYTAB;
+import static 
org.apache.gravitino.catalog.hadoop.fs.Constants.HADOOP_SECURITY_PRINCIPAL;
+import static 
org.apache.gravitino.catalog.hadoop.fs.Constants.SECURITY_KRB5_ENV;
+import static 
org.apache.gravitino.catalog.hadoop.fs.HDFSFileSystemProvider.IPC_FALLBACK_TO_SIMPLE_AUTH_ALLOWED;
+import static 
org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import java.security.PrivilegedExceptionAction;
+import java.util.Timer;
+import java.util.TimerTask;
+import net.sf.cglib.proxy.Enhancer;
+import net.sf.cglib.proxy.MethodInterceptor;
+import net.sf.cglib.proxy.MethodProxy;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.gravitino.exceptions.GravitinoRuntimeException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A FileSystem wrapper that runs all operations under a specific UGI 
(UserGroupInformation).
+ * Supports both simple and Kerberos authentication, with automatic ticket 
renewal.
+ */
+public class HDFSFileSystemProxy implements MethodInterceptor {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(HDFSFileSystemProxy.class);
+
+  private static final long DEFAULT_RENEW_INTERVAL_MS = 10 * 60 * 1000L;
+  private static final String SYSTEM_USER_NAME = 
System.getProperty("user.name");
+  private static final String SYSTEM_ENV_HADOOP_USER_NAME = "HADOOP_USER_NAME";
+
+  private final UserGroupInformation ugi;
+  private final FileSystem fs;
+  private final Configuration configuration;
+  private Timer kerberosRenewTimer;
+
+  /**
+   * Create a HDFSAuthenticationFileSystem with the given path and 
configuration. Supports both
+   * simple and Kerberos authentication, with automatic ticket renewal for 
Kerberos.
+   *
+   * @param path the HDFS path
+   * @param conf the Hadoop configuration
+   */
+  public HDFSFileSystemProxy(Path path, Configuration conf) {
+    try {
+      conf.setBoolean(FS_DISABLE_CACHE, true);
+      conf.setBoolean(IPC_FALLBACK_TO_SIMPLE_AUTH_ALLOWED, true);
+      this.configuration = conf;
+
+      String authType = conf.get(HADOOP_SECURITY_AUTHENTICATION, AUTH_SIMPlE);
+      if (AUTH_KERBEROS.equalsIgnoreCase(authType)) {
+        String krb5Config = conf.get(HADOOP_KRB5_CONF);
+
+        if (krb5Config != null) {
+          System.setProperty(SECURITY_KRB5_ENV, krb5Config);
+        }
+        UserGroupInformation.setConfiguration(conf);
+        String principal = conf.get(HADOOP_SECURITY_PRINCIPAL, null);
+        String keytab = conf.get(HADOOP_SECURITY_KEYTAB, null);
+
+        if (principal == null || keytab == null) {
+          throw new GravitinoRuntimeException(
+              "Kerberos principal and keytab must be provided for kerberos 
authentication");
+        }
+
+        this.ugi = 
UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab);
+        startKerberosRenewalTask(principal);
+      } else {
+        String userName = System.getenv(SYSTEM_ENV_HADOOP_USER_NAME);
+        if (StringUtils.isEmpty(userName)) {
+          userName = SYSTEM_USER_NAME;
+        }
+        this.ugi = UserGroupInformation.createRemoteUser(userName);
+      }
+
+      this.fs =
+          ugi.doAs(
+              (PrivilegedExceptionAction<FileSystem>)
+                  () -> FileSystem.newInstance(path.toUri(), conf));
+
+    } catch (Exception e) {
+      throw new GravitinoRuntimeException(e, "Failed to create HDFS FileSystem 
with UGI: %s", path);
+    }
+  }
+
+  /**
+   * Get the proxied FileSystem instance.
+   *
+   * @return the proxied FileSystem
+   * @throws IOException if an I/O error occurs
+   */
+  public FileSystem getProxy() throws IOException {
+    Enhancer e = new Enhancer();
+    e.setClassLoader(fs.getClass().getClassLoader());
+    e.setSuperclass(fs.getClass());
+    e.setCallback(this);
+    FileSystem proxyFs = (FileSystem) e.create();
+    fs.setConf(configuration);
+    return proxyFs;
+  }
+
+  /** Schedule periodic Kerberos re-login to refresh TGT before expiry. */
+  private void startKerberosRenewalTask(String principal) {
+    kerberosRenewTimer = new Timer(true);

Review Comment:
   It's advised not to use `Timer`, and please try to replace it with 
`ScheduledThreadPoolExecutor`.



##########
clients/filesystem-hadoop3/src/test/java/org/apache/gravitino/filesystem/hadoop/integration/test/GvfsMultipleClusterIT.java:
##########
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.gravitino.filesystem.hadoop.integration.test;
+
+import static org.apache.gravitino.file.Fileset.LOCATION_NAME_UNKNOWN;
+import static org.apache.gravitino.file.Fileset.PROPERTY_DEFAULT_LOCATION_NAME;
+
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Maps;
+import java.io.File;
+import java.io.IOException;
+import java.lang.reflect.Method;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.util.Collections;
+import java.util.Map;
+import org.apache.commons.io.FileUtils;
+import org.apache.gravitino.Catalog;
+import org.apache.gravitino.NameIdentifier;
+import org.apache.gravitino.client.GravitinoMetalake;
+import org.apache.gravitino.file.Fileset;
+import org.apache.gravitino.integration.test.container.ContainerSuite;
+import org.apache.gravitino.integration.test.container.HiveContainer;
+import org.apache.gravitino.integration.test.util.BaseIT;
+import org.apache.gravitino.integration.test.util.GravitinoITUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.security.authentication.util.KerberosName;
+import org.apache.hadoop.security.authentication.util.KerberosUtil;
+import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Tag;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.condition.DisabledIf;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@Tag("gravitino-docker-test")
+@DisabledIf("org.apache.gravitino.integration.test.util.ITUtils#isEmbedded")

Review Comment:
   Why do we disable it if Gravitino runs in the embedded mode?



##########
docs/fileset-catalog.md:
##########
@@ -151,6 +277,7 @@ This behavior is skipped in either of these cases:
 | `credential-providers`                | The credential provider types, 
separated by comma.                                                             
      | (none)                                                                  
                                       | No                                     
    | No        | 0.8.0-incubating |
 | `placeholder-`                        | Properties that start with 
`placeholder-` are used to replace placeholders in the location.                
          | (none)                                                              
                                           | No                                 
        | Yes       | 0.9.0-incubating |
 | `default-location-name`               | The name of the default location of 
the fileset, mainly used for GVFS operations without specifying a location 
name. | When the fileset has only one location, its location name will be 
automatically selected as the default value. | Yes, if the fileset has multiple 
locations | Yes       | 0.9.0-incubating |
+| `hdfs.config.resources`               | The HDFS configuration resources, 
separated by comma. For example, `hdfs-site.xml,core-site.xml`.                 
                                                                                
                 | (none)          | No       | 0.5.1            |

Review Comment:
   Why is the value of `Since version` `0.5.1`? Is there anything that I missed?



##########
clients/filesystem-hadoop3/src/main/java/org/apache/gravitino/filesystem/hadoop/FilesetMetadataCache.java:
##########
@@ -49,6 +52,7 @@ public FilesetMetadataCache(GravitinoClient client) {
     this.client = client;
     this.catalogCache = newCatalogCache();
     this.filesetCache = newFilesetCache();
+    this.schemaCache = newSchemaCache();

Review Comment:
   You need to close the `schemaCache` in the `close` method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to