mchades commented on code in PR #7215:
URL: https://github.com/apache/gravitino/pull/7215#discussion_r2115191398
##########
catalogs/catalog-hadoop/src/main/java/org/apache/gravitino/catalog/hadoop/HadoopCatalogOperations.java:
##########
@@ -237,6 +240,49 @@ public Fileset loadFileset(NameIdentifier ident) throws
NoSuchFilesetException {
});
}
+ @Override
+ public FileInfo[] listFiles(NameIdentifier filesetIdent, String
locationName, String subPath)
+ throws NoSuchFilesetException, IOException {
+ if (disableFSOps) {
+ LOG.warn("Filesystem operations disabled, rejecting listFiles for {}",
filesetIdent);
+ throw new UnsupportedOperationException("Filesystem operations are
disabled on this server");
+ }
+
+ String actualPath = getFileLocation(filesetIdent, subPath, locationName);
+ Path formalizedPath = formalizePath(new Path(actualPath), conf);
+
+ FileSystem fs = getFileSystem(formalizedPath, conf);
+ if (!fs.exists(formalizedPath)) {
+ return new FileInfo[0];
Review Comment:
After thinking about it, I believe we should throw an
`IllegalArgumentException` here instead of returning an empty array. For
example, if we try to `ls not/exist/path`, it will return an error message
rather than an empty result.
##########
core/src/main/java/org/apache/gravitino/catalog/FilesetNormalizeDispatcher.java:
##########
@@ -52,6 +54,12 @@ public NameIdentifier[] listFilesets(Namespace namespace)
throws NoSuchSchemaExc
return normalizeCaseSensitive(identifiers);
}
+ @Override
+ public FileInfo[] listFiles(NameIdentifier ident, String locationName,
String subPath)
+ throws NoSuchFilesetException, IOException {
+ return dispatcher.listFiles(ident, locationName, subPath);
Review Comment:
plz use `normalizeCaseSensitive(ident)`
##########
catalogs/catalog-hadoop/src/main/java/org/apache/gravitino/catalog/hadoop/HadoopCatalogOperations.java:
##########
@@ -1248,6 +1294,30 @@ private boolean checkSingleFile(Fileset fileset, String
locationName) {
}
}
+ private String buildGVFSFilePath(
Review Comment:
how about simplifying the method codes to :
```java
private String buildGVFSFilePath(
String catalogName, String schemaName, String filesetName, String
subPath) {
String prefix = String.join(SLASH, "/fileset", catalogName, schemaName,
filesetName);
return StringUtils.isBlank(subPath) ? prefix : prefix +
ensureLeadingSlash(removeTrailingSlash(subPath));
}
```
##########
catalogs/catalog-hadoop/src/test/java/org/apache/gravitino/catalog/hadoop/integration/test/HadoopCatalogIT.java:
##########
@@ -151,6 +169,38 @@ private void dropSchema() {
Assertions.assertFalse(catalog.asSchemas().schemaExists(schemaName));
}
+ private String getFileInfos(NameIdentifier filesetIdent, String subPath,
String locationName)
+ throws IOException {
+ String targetUrl =
+ "api/metalakes/"
+ + metalakeName
+ + "/catalogs/"
+ + catalogName
+ + "/schemas/"
+ + schemaName
+ + "/filesets/"
+ + filesetIdent.name()
+ + "/files";
+ Map<String, String> query = new HashMap<>();
+ if (subPath != null && !subPath.isEmpty()) {
Review Comment:
can use `StringUtils.isBlank`
##########
catalogs/catalog-hadoop/src/main/java/org/apache/gravitino/catalog/hadoop/HadoopCatalogOperations.java:
##########
@@ -237,6 +240,49 @@ public Fileset loadFileset(NameIdentifier ident) throws
NoSuchFilesetException {
});
}
+ @Override
+ public FileInfo[] listFiles(NameIdentifier filesetIdent, String
locationName, String subPath)
+ throws NoSuchFilesetException, IOException {
+ if (disableFSOps) {
+ LOG.warn("Filesystem operations disabled, rejecting listFiles for {}",
filesetIdent);
+ throw new UnsupportedOperationException("Filesystem operations are
disabled on this server");
+ }
+
+ String actualPath = getFileLocation(filesetIdent, subPath, locationName);
+ Path formalizedPath = formalizePath(new Path(actualPath), conf);
+
+ FileSystem fs = getFileSystem(formalizedPath, conf);
+ if (!fs.exists(formalizedPath)) {
+ return new FileInfo[0];
+ }
+
+ String catalogName = filesetIdent.namespace().level(1);
+ String schemaName = filesetIdent.namespace().level(2);
+ String filesetName = filesetIdent.name();
+
+ try {
+ FileStatus[] fileStatuses = fs.listStatus(formalizedPath);
+ FileInfo[] fileInfos = new FileInfo[fileStatuses.length];
+ for (int i = 0; i < fileStatuses.length; i++) {
Review Comment:
It's better to use stream API
##########
core/src/main/java/org/apache/gravitino/listener/FilesetEventDispatcher.java:
##########
@@ -81,6 +83,12 @@ public NameIdentifier[] listFilesets(Namespace namespace)
throws NoSuchSchemaExc
}
}
+ @Override
+ public FileInfo[] listFiles(NameIdentifier ident, String locationName,
String subPath)
+ throws NoSuchFilesetException, IOException {
+ return dispatcher.listFiles(ident, locationName, subPath);
Review Comment:
Could you please add a todo comment here to implement the `ListFilesEvent`
in another PR later?
##########
catalogs/catalog-hadoop/src/test/java/org/apache/gravitino/catalog/hadoop/integration/test/HadoopCatalogIT.java:
##########
@@ -656,6 +706,48 @@ public void testListFilesets() throws IOException {
Assertions.assertEquals(fileset2.name(), nameIdentifiers1[1].name());
}
+ @Test
+ public void testListFilesetFiles() throws IOException {
+ // clear schema
+ dropSchema();
+ createSchema();
+
+ String filesetName = "test_list_fileset_files";
+ String storageLocation = storageLocation(filesetName);
+ createFileset(filesetName, "comment", MANAGED, storageLocation,
ImmutableMap.of("k1", "v1"));
+ assertFilesetExists(filesetName);
+
+ Path basePath = new Path(storageLocation);
+ fileSystem.mkdirs(basePath);
+
+ String fileName = "test1.txt";
+ Path file1 = new Path(basePath, fileName);
+ try (FSDataOutputStream out = fileSystem.create(file1, true)) {
+ out.write("hello".getBytes(StandardCharsets.UTF_8));
+ }
+
+ NameIdentifier filesetIdent = NameIdentifier.of(schemaName, filesetName);
+ String actualJson = getFileInfos(filesetIdent, null, null);
+
+ ObjectMapper mapper = new ObjectMapper();
Review Comment:
There is no need for deserialization; you can directly asset the equals of
string values.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]