szehon-ho commented on code in PR #4629:
URL: https://github.com/apache/iceberg/pull/4629#discussion_r861445948


##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/BaseSparkAction.java:
##########
@@ -176,16 +186,35 @@ protected Dataset<Row> loadMetadataTable(Table table, 
MetadataTableType type) {
     return SparkTableUtil.loadMetadataTable(spark, table, type);
   }
 
-  private static class ReadManifest implements 
FlatMapFunction<ManifestFileBean, String> {
-    private final Broadcast<FileIO> io;
+  private static class ReadManifest implements 
FlatMapFunction<ManifestFileBean, Tuple2<String, String>> {
+    private final Broadcast<Table> table;
 
-    ReadManifest(Broadcast<FileIO> io) {
-      this.io = io;
+    ReadManifest(Broadcast<Table> table) {
+      this.table = table;
     }
 
     @Override
-    public Iterator<String> call(ManifestFileBean manifest) {
-      return new ClosingIterator<>(ManifestFiles.readPaths(manifest, 
io.getValue()).iterator());
+    public Iterator<Tuple2<String, String>> call(ManifestFileBean manifest) {
+      return new ClosingIterator(entries(manifest));
+    }
+
+    public CloseableIterator<Tuple2<String, String>> entries(ManifestFileBean 
manifest) {
+      switch (manifest.content()) {
+        case DATA:

Review Comment:
   Good catch, this seems it is working by chance, ManifestFiles.read() only 
checks to make sure content=Data, but doesn't use the content anywhere else in 
this path.  The file returned can still be delete or data file.
   
   Not sure if it's worth to implement manifests.content field (I guess that's 
what was done in the original pr: #2518).   if there is not much use now, maybe 
we can do it in a separately and here just add a new method to 
ManifestFiles.files() that returns a Pair of type/path, similar to 
ManifestFiles.paths().  @aokolnychyi any preference?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to