szehon-ho commented on code in PR #4629:
URL: https://github.com/apache/iceberg/pull/4629#discussion_r859211942
##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/BaseSparkAction.java:
##########
@@ -122,18 +124,24 @@ protected Table newStaticTable(TableMetadata metadata, FileIO io) {
     return new BaseTable(ops, metadataFileLocation);
   }
-  // builds a DF of delete and data file locations by reading all manifests
-  protected Dataset<Row> buildValidContentFileDF(Table table) {
+  // builds a DF of delete and data file path and type by reading all manifests
+  protected Dataset<Row> buildValidContentFileTypeDF(Table table) {
     JavaSparkContext context = JavaSparkContext.fromSparkContext(spark.sparkContext());
-    Broadcast<FileIO> ioBroadcast = context.broadcast(SparkUtil.serializableFileIO(table));
+    Broadcast<Table> tableBroadcast = context.broadcast(SerializableTable.copyOf(table));
     Dataset<ManifestFileBean> allManifests = loadMetadataTable(table, ALL_MANIFESTS)
         .selectExpr("path", "length", "partition_spec_id as partitionSpecId", "added_snapshot_id as addedSnapshotId")
         .dropDuplicates("path")
         .repartition(spark.sessionState().conf().numShufflePartitions()) // avoid adaptive execution combining tasks
         .as(Encoders.bean(ManifestFileBean.class));
-    return allManifests.flatMap(new ReadManifest(ioBroadcast), Encoders.STRING()).toDF(FILE_PATH);
+    return allManifests.flatMap(new ReadManifest(tableBroadcast), Encoders.bean(ContentFileTypeBean.class))
+        .toDF(FILE_PATH, FILE_TYPE);
+  }
+
+  // builds a DF of delete and data file paths by reading all manifests
+  protected Dataset<Row> buildValidContentFileDF(Table table) {
Review Comment:
Yea, it was to preserve the backward compatibility of the other actions, which
didn't care much about the type.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]