Copilot commented on code in PR #9570:
URL: https://github.com/apache/gravitino/pull/9570#discussion_r2650291725
##########
clients/client-python/gravitino/filesystem/gvfs_base_operations.py:
##########
@@ -489,6 +493,30 @@ def _get_actual_filesystem(
fileset_ident, location_name
)
+ def _merge_fileset_properties(
+ self,
+ catalog: FilesetCatalog,
+ schema: Schema,
+ fileset: Fileset,
+ actual_location: str,
+ ) -> Dict[str, str]:
+ """Merge properties from catalog, schema, fileset, options, and user-defined configs.
+ :param catalog: The fileset catalog
+ :param schema: The schema
+ :param fileset: The fileset
+ :param actual_location: The actual storage location
+ :return: Merged properties dictionary
+ """
+ fileset_props = dict(catalog.properties())
+ fileset_props.update(schema.properties())
+ fileset_props.update(fileset.properties())
+ # Get user-defined configurations for the actual location
+ user_defined_configs = self._get_user_defined_configs(actual_location)
+ if self._options:
+ fileset_props.update(self._options)
+ fileset_props.update(user_defined_configs)
+ return fileset_props
Review Comment:
The property merging order may not be correct. Currently, the order is:
catalog -> schema -> fileset -> options -> user-defined configs. This means
user-defined configs (line 517) will override options (line 516). However, the
comment in the S3StorageHandler (line 291) states "S3 endpoint from client
options has a higher priority", suggesting options should have higher priority
than catalog properties. The current implementation contradicts this
expectation since user-defined configs override everything, even though they
are derived from options with path-specific matching.
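
If client options are indeed meant to win, a minimal sketch of the reordered merge could look like the following. This is a standalone illustration with plain dicts standing in for the catalog/schema/fileset objects and hypothetical parameter names, not the project's actual method:

```python
from typing import Dict, Optional


def merge_fileset_properties(
    catalog_props: Dict[str, str],
    schema_props: Dict[str, str],
    fileset_props: Dict[str, str],
    user_defined_configs: Dict[str, str],
    options: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Later updates win: client options are applied last so they take
    precedence over path-matched user-defined configs."""
    merged: Dict[str, str] = {}
    merged.update(catalog_props)
    merged.update(schema_props)
    merged.update(fileset_props)
    merged.update(user_defined_configs)
    if options:
        merged.update(options)
    return merged


# Precedence check: the same key set at every level resolves to the
# client-option value.
result = merge_fileset_properties(
    {"s3-endpoint": "from-catalog"},
    {"s3-endpoint": "from-schema"},
    {"s3-endpoint": "from-fileset"},
    {"s3-endpoint": "from-user-config"},
    {"s3-endpoint": "from-options"},
)
print(result["s3-endpoint"])  # from-options
```

Whether user-defined configs should sit below or above plain options is a design decision for the PR author; the sketch only demonstrates that moving `options` last makes them the highest-priority source.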
##########
catalogs/hadoop-common/src/main/java/org/apache/gravitino/catalog/hadoop/fs/FileSystemUtils.java:
##########
@@ -51,28 +44,14 @@ public class FileSystemUtils {
private FileSystemUtils() {}
- public static Map<String, FileSystemProvider> getFileSystemProviders(String fileSystemProviders) {
+ public static Map<String, FileSystemProvider> getFileSystemProviders() {
Map<String, FileSystemProvider> resultMap = Maps.newHashMap();
ServiceLoader<FileSystemProvider> allFileSystemProviders =
ServiceLoader.load(FileSystemProvider.class);
- Set<String> providersInUses =
- fileSystemProviders != null
- ? Arrays.stream(fileSystemProviders.split(","))
- .map(f -> f.trim().toLowerCase(Locale.ROOT))
- .collect(Collectors.toSet())
- : Sets.newHashSet();
-
- // Add built-in file system providers to the use list automatically.
- providersInUses.add(BUILTIN_LOCAL_FS_PROVIDER.toLowerCase(Locale.ROOT));
- providersInUses.add(BUILTIN_HDFS_FS_PROVIDER.toLowerCase(Locale.ROOT));
-
// Only get the file system providers that are in the user list and check if the scheme is
Review Comment:
The comment on lines 52-53 is now inaccurate. It states "Only get the file
system providers that are in the user list and check if the scheme is unique",
but there is no longer any user list filtering since the parameter was removed.
The comment should be updated to reflect that all available FileSystemProviders
from the ServiceLoader are being loaded, and only uniqueness of schemes is
being checked.
```suggestion
// Load all available file system providers from the ServiceLoader and ensure each scheme is
```
##########
clients/client-python/tests/integration/integration_test_env.py:
##########
@@ -180,7 +182,10 @@ def restart_server(cls):
if result.stderr:
logger.info("stderr: %s", result.stderr)
- if not check_gravitino_server_status():
+ succes = check_gravitino_server_status()
+ if succes:
Review Comment:
Typo: 'succes' should be 'success'.
```suggestion
success = check_gravitino_server_status()
if success:
```
##########
catalogs/catalog-fileset/src/main/java/org/apache/gravitino/catalog/fileset/FilesetCatalogOperations.java:
##########
@@ -245,14 +245,9 @@ public void initialize(
}
if (!disableFSOps) {
- String fileSystemProviders =
- (String)
- propertiesMetadata
- .catalogPropertiesMetadata()
.getOrDefault(config, FilesetCatalogPropertiesMetadata.FILESYSTEM_PROVIDERS);
this.fileSystemProvidersMap =
ImmutableMap.<String, FileSystemProvider>builder()
- .putAll(FileSystemUtils.getFileSystemProviders(fileSystemProviders))
+ .putAll(FileSystemUtils.getFileSystemProviders())
.build();
Review Comment:
The `FILESYSTEM_PROVIDERS` catalog property is no longer used in the code,
but it is still defined in `FilesetCatalogPropertiesMetadata` and is still
referenced in integration tests. This is a breaking change: users who
configure this property expecting it to filter filesystem providers will now
have it silently ignored. Either remove the property entirely with a proper
deprecation path, or restore the filtering logic.
##########
clients/filesystem-hadoop3/src/main/java/org/apache/gravitino/filesystem/hadoop/GravitinoVirtualFileSystemConfiguration.java:
##########
@@ -184,7 +184,10 @@ public class GravitinoVirtualFileSystemConfiguration {
/** The default value for whether to enable auto-creation of fileset location. */
public static final boolean FS_GRAVITINO_AUTO_CREATE_LOCATION_DEFAULT = true;
- /** The prefix for user-defined location configs: {@code fs.path.config.<locationName>=<path>}. */
+ /**
+ * The prefix for user-defined location configs: {@code
+ * fs.path.config.<locationName><property_name>=<property_value>}.
Review Comment:
The comment format is inconsistent with the actual configuration pattern.
The comment suggests the pattern is
"fs.path.config.<locationName><property_name>" (missing a dot separator), but
based on the Python implementation and tests, the actual pattern should be
"fs.path.config.<locationName>.<property_name>" (with a dot separator). The
comment should include the dot between locationName and property_name for
clarity.
```suggestion
* fs.path.config.<locationName>.<property_name>=<property_value>}.
```
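
For reference, path-scoped keys of this shape are typically parsed by stripping a per-location prefix. The sketch below illustrates the dot-separated pattern; the prefix constant and helper name are assumptions for illustration, not the project's actual API:

```python
from typing import Dict

# Assumed constant name; the real identifier in the codebase may differ.
FS_GRAVITINO_PATH_CONFIG_PREFIX = "fs.path.config."


def extract_location_configs(
    conf: Dict[str, str], location_name: str
) -> Dict[str, str]:
    """Collect <property_name>=<property_value> pairs configured under
    fs.path.config.<locationName>.<property_name> (note the dot separator
    between location name and property name)."""
    prefix = f"{FS_GRAVITINO_PATH_CONFIG_PREFIX}{location_name}."
    return {
        key[len(prefix):]: value
        for key, value in conf.items()
        if key.startswith(prefix) and len(key) > len(prefix)
    }


conf = {
    "fs.path.config.loc1.s3-endpoint": "http://127.0.0.1:9000",
    "fs.path.config.loc2.s3-endpoint": "http://127.0.0.1:9001",
    "fs.gravitino.server.uri": "http://127.0.0.1:8090",
}
print(extract_location_configs(conf, "loc1"))
# {'s3-endpoint': 'http://127.0.0.1:9000'}
```

Without the dot in the documented pattern, a reader could reasonably concatenate location name and property name and produce keys the parser would never match.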
##########
clients/client-python/gravitino/filesystem/gvfs_base_operations.py:
##########
@@ -754,3 +785,116 @@ def _get_fileset(self, fileset_ident: NameIdentifier):
return fileset
finally:
write_lock.release()
+
+ def _get_fileset_schema(self, schema_ident: NameIdentifier):
+ """Get the schema by the schema identifier from the cache or load it from the server if the cache is disabled.
+ :param schema_ident: The schema identifier
+ :return: The schema
+ """
+ if not self._enable_fileset_metadata_cache:
+ catalog_ident: NameIdentifier = NameIdentifier.of(
+ schema_ident.namespace().level(0), schema_ident.namespace().level(1)
+ )
+ catalog: FilesetCatalog = self._get_fileset_catalog(catalog_ident)
+ return catalog.as_schemas().load_schema(schema_ident.name())
+
+ read_lock = self._schema_cache_lock.gen_rlock()
+ try:
+ read_lock.acquire()
+ cache_value: Schema = self._schema_cache.get(schema_ident)
+ if cache_value is not None:
+ return cache_value
+ finally:
+ read_lock.release()
+
+ write_lock = self._schema_cache_lock.gen_wlock()
+ try:
+ write_lock.acquire()
+ cache_value: Schema = self._schema_cache.get(schema_ident)
+ if cache_value is not None:
+ return cache_value
+
+ catalog_ident: NameIdentifier = NameIdentifier.of(
+ schema_ident.namespace().level(0), schema_ident.namespace().level(1)
+ )
+ catalog: FilesetCatalog = self._get_fileset_catalog(catalog_ident)
+ schema = catalog.as_schemas().load_schema(schema_ident.name())
+ self._schema_cache[schema_ident] = schema
+ return schema
+ finally:
+ write_lock.release()
Review Comment:
The new methods `_merge_fileset_properties` and `_get_fileset_schema` are
not covered by unit tests. While integration tests may exercise these code
paths, unit tests would provide better coverage and faster feedback for these
critical methods that determine property precedence and caching behavior.
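
A precedence test along the lines the reviewer suggests could look like the sketch below. Because this snippet cannot import the project, `_merge_fileset_properties` is reproduced as a local stand-in with the same merge order as the diff; in a real test the method would be invoked on the operations object with mocked catalog, schema, and fileset:

```python
import unittest
from typing import Dict, Optional
from unittest import mock


def merge_like_diff(
    catalog,
    schema,
    fileset,
    user_defined_configs: Dict[str, str],
    options: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Local stand-in mirroring the merge order shown in the diff:
    catalog -> schema -> fileset -> options -> user-defined configs."""
    props = dict(catalog.properties())
    props.update(schema.properties())
    props.update(fileset.properties())
    if options:
        props.update(options)
    props.update(user_defined_configs)
    return props


class TestMergeFilesetProperties(unittest.TestCase):
    def _mock_with_props(self, props):
        m = mock.Mock()
        m.properties.return_value = props
        return m

    def test_user_defined_configs_currently_win(self):
        merged = merge_like_diff(
            self._mock_with_props({"k": "catalog"}),
            self._mock_with_props({"k": "schema"}),
            self._mock_with_props({"k": "fileset"}),
            {"k": "user-config"},
            {"k": "options"},
        )
        # Documents the behavior flagged in the review: user-defined
        # configs override client options under the current order.
        self.assertEqual(merged["k"], "user-config")

    def test_lower_levels_survive_when_not_overridden(self):
        merged = merge_like_diff(
            self._mock_with_props({"a": "catalog"}),
            self._mock_with_props({"b": "schema"}),
            self._mock_with_props({}),
            {},
            None,
        )
        self.assertEqual(merged, {"a": "catalog", "b": "schema"})
```

Run with `python -m unittest`. Pinning the precedence in a unit test like this would make any future reordering of the merge an explicit, reviewed change rather than a silent one.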
--
This is an automated message from the Apache Git Service.