aokolnychyi commented on code in PR #4629:
URL: https://github.com/apache/iceberg/pull/4629#discussion_r876107119
##########
.palantir/revapi.yml:
##########
@@ -27,6 +27,12 @@ acceptedBreaks:
   - code: "java.method.addedToInterface"
     new: "method ThisT org.apache.iceberg.SnapshotUpdate<ThisT>::scanManifestsWith(java.util.concurrent.ExecutorService)"
     justification: "Accept all changes prior to introducing API compatibility checks"
+  - code: "java.method.addedToInterface"
Review Comment:
I am okay with this given the other changes, the upcoming 1.0 release, and the
fact that it is mostly Iceberg itself that provides implementations of this
interface. Even if someone has custom actions, they probably reuse the provided
result implementation. If other folks are concerned, we could default the new
methods instead (see the sketch below).
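For illustration, "defaulting" could look roughly like this; a sketch only, with
the method name taken from the revapi entry in this PR (the PR itself adds the
methods as abstract):

```java
// Sketch: a defaulted accessor keeps pre-existing custom Result
// implementations compiling; not what the PR actually does.
public interface Result {
  long deletedDataFilesCount();

  default long deletedEqualityDeleteFilesCount() {
    // 0 is a safe default for implementations that predate delete file metrics
    return 0L;
  }
}
```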
##########
api/src/main/java/org/apache/iceberg/ManifestContent.java:
##########
@@ -35,4 +35,12 @@ public enum ManifestContent {
   public int id() {
     return id;
   }
+
+  public static ManifestContent fromId(int id) {
+    switch (id) {
Review Comment:
nit: would it be safer to throw an exception if `id` is unknown just in case?
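Something along these lines, assuming the existing `DATA(0)` / `DELETES(1)` ids
(a sketch, not the PR's final code):

```java
public static ManifestContent fromId(int id) {
  switch (id) {
    case 0:
      return DATA;
    case 1:
      return DELETES;
    default:
      // fail fast instead of silently returning null for unknown ids
      throw new IllegalArgumentException("Unknown manifest content id: " + id);
  }
}
```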
##########
spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/actions/TestExpireSnapshotsAction.java:
##########
@@ -122,13 +140,18 @@ private Long rightAfterSnapshot(long snapshotId) {
     return end;
   }

-  private void checkExpirationResults(long expectedDatafiles, long expectedManifestsDeleted,
-                                      long expectedManifestListsDeleted, ExpireSnapshots.Result results) {
+  private void checkExpirationResults(long expectedDatafiles, long expectedPosDeleteFiles, long expectedEqDeleteFiles,
Review Comment:
Thanks for fixing the indentation!
##########
.palantir/revapi.yml:
##########
@@ -27,6 +27,12 @@ acceptedBreaks:
   - code: "java.method.addedToInterface"
     new: "method ThisT org.apache.iceberg.SnapshotUpdate<ThisT>::scanManifestsWith(java.util.concurrent.ExecutorService)"
     justification: "Accept all changes prior to introducing API compatibility checks"
+  - code: "java.method.addedToInterface"
+    new: "method long org.apache.iceberg.actions.ExpireSnapshots.Result::deletedEqualityDeleteFilesCount()"
+    justification: "Adding method to interface"
Review Comment:
nit: Shall we be a little bit more specific?
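For example, something like this wording (just a sketch of a more specific
justification, not a required phrasing):

```yaml
justification: "Adding deleted position/equality delete file counts to ExpireSnapshots.Result"
```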
##########
core/src/main/java/org/apache/iceberg/actions/BaseExpireSnapshotsActionResult.java:
##########
@@ -22,13 +22,29 @@
 public class BaseExpireSnapshotsActionResult implements ExpireSnapshots.Result {

   private final long deletedDataFilesCount;
+  private final long deletedPosDeleteFilesCount;
+  private final long deletedEqDeleteFilesCount;
   private final long deletedManifestsCount;
   private final long deletedManifestListsCount;

   public BaseExpireSnapshotsActionResult(long deletedDataFilesCount,
                                          long deletedManifestsCount,
                                          long deletedManifestListsCount) {
     this.deletedDataFilesCount = deletedDataFilesCount;
+    this.deletedPosDeleteFilesCount = 0;
+    this.deletedEqDeleteFilesCount = 0;
+    this.deletedManifestsCount = deletedManifestsCount;
+    this.deletedManifestListsCount = deletedManifestListsCount;
+  }
+
+  public BaseExpireSnapshotsActionResult(long deletedDataFilesCount,
+                                         long deletedPosDeleteFilesCount,
+                                         long deletedEqDeleteFileCount,
Review Comment:
nit: `File` -> `Files`
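With the rename applied, the second constructor would presumably read as
follows; the hunk is cut off above, so the remaining parameters and assignments
are inferred from the fields shown in the class:

```java
public BaseExpireSnapshotsActionResult(long deletedDataFilesCount,
                                       long deletedPosDeleteFilesCount,
                                       long deletedEqDeleteFilesCount,
                                       long deletedManifestsCount,
                                       long deletedManifestListsCount) {
  this.deletedDataFilesCount = deletedDataFilesCount;
  this.deletedPosDeleteFilesCount = deletedPosDeleteFilesCount;
  this.deletedEqDeleteFilesCount = deletedEqDeleteFilesCount;
  this.deletedManifestsCount = deletedManifestsCount;
  this.deletedManifestListsCount = deletedManifestListsCount;
}
```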
##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/BaseSparkAction.java:
##########
@@ -122,18 +128,29 @@ protected Table newStaticTable(TableMetadata metadata, FileIO io) {
     return new BaseTable(ops, metadataFileLocation);
   }

-  // builds a DF of delete and data file locations by reading all manifests
-  protected Dataset<Row> buildValidContentFileDF(Table table) {
+  // builds a DF of delete and data file path and type by reading all manifests
+  protected Dataset<Row> buildValidContentFileWithTypeDF(Table table) {
     JavaSparkContext context = JavaSparkContext.fromSparkContext(spark.sparkContext());
-    Broadcast<FileIO> ioBroadcast = context.broadcast(SparkUtil.serializableFileIO(table));
+    Broadcast<Table> tableBroadcast = context.broadcast(SerializableTable.copyOf(table));

     Dataset<ManifestFileBean> allManifests = loadMetadataTable(table, ALL_MANIFESTS)
-        .selectExpr("path", "length", "partition_spec_id as partitionSpecId", "added_snapshot_id as addedSnapshotId")
+        .selectExpr("content",
Review Comment:
nit: you could probably move `content` to a new line as well.
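i.e. something like this; the columns after `content` and the trailing
`.as(Encoders.bean(...))` are assumed from the old `selectExpr`, since the hunk
is truncated:

```java
Dataset<ManifestFileBean> allManifests = loadMetadataTable(table, ALL_MANIFESTS)
    .selectExpr(
        "content",
        "path",
        "length",
        "partition_spec_id as partitionSpecId",
        "added_snapshot_id as addedSnapshotId")
    .as(Encoders.bean(ManifestFileBean.class));
```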
##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/BaseSparkAction.java:
##########
@@ -122,18 +128,29 @@ protected Table newStaticTable(TableMetadata metadata, FileIO io) {
     return new BaseTable(ops, metadataFileLocation);
   }

-  // builds a DF of delete and data file locations by reading all manifests
-  protected Dataset<Row> buildValidContentFileDF(Table table) {
+  // builds a DF of delete and data file path and type by reading all manifests
+  protected Dataset<Row> buildValidContentFileWithTypeDF(Table table) {
     JavaSparkContext context = JavaSparkContext.fromSparkContext(spark.sparkContext());
-    Broadcast<FileIO> ioBroadcast = context.broadcast(SparkUtil.serializableFileIO(table));
+    Broadcast<Table> tableBroadcast = context.broadcast(SerializableTable.copyOf(table));
Review Comment:
Not directly related to this PR, but should we just use the `sparkContext`
instance variable instead of creating a new wrapper in this method? See the
sketch below.
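Roughly this, assuming the instance variable is a `JavaSparkContext` named
`sparkContext` (a sketch, not tested against this branch):

```java
// reuse the action's existing JavaSparkContext instead of re-wrapping
// spark.sparkContext() inside the method
Broadcast<Table> tableBroadcast = sparkContext.broadcast(SerializableTable.copyOf(table));
```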
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]