edgarRd commented on a change in pull request #2293:
URL: https://github.com/apache/iceberg/pull/2293#discussion_r600712368
##########
File path: core/src/main/java/org/apache/iceberg/BaseRewriteFiles.java
##########
@@ -46,6 +51,8 @@ public RewriteFiles rewriteFiles(Set<DataFile> filesToDelete,
Set<DataFile> file
"Files to delete cannot be null or empty");
Preconditions.checkArgument(filesToAdd != null && !filesToAdd.isEmpty(),
"Files to add can not be null or empty");
+ Preconditions.checkArgument(filesToAdd.stream().allMatch(df -> df.specId()
== writeSpec().specId()),
+ "Files to add can not have a different spec than the rewrite files
spec");
Review comment:
I think this is a great idea. While what you mention is correct, there
are many test cases where we add `DataFile`s and `DeleteFile`s where the spec
has no specId relative to the table, e.g. it's `0` when initialized, and takes
a specId only until it is added via `TableMetadata#updatePartitionSpec` -
examples of tests are:
https://github.com/apache/iceberg/blob/master/core/src/test/java/org/apache/iceberg/TestRewriteManifests.java#L579
where the PartitionSpec used in the `DataFile` is the one that was initialized
with `0`, rather than `table.spec()`.
Before proceeding, I think I'd like to know if this way of writing tests is
want we want or should we make sure we stay consistent and add `DataFile`s and
`DeleteFile`s with the PartitionSpec with the initialized specId since they are
immutable. Since from the `ContentFile` we don't have direct access to the
PartitonSpec - only using the specId - I don't think it's possible to validate
by the actual spec object.
This issue is found in test cases that add `DataFile`s and `DeleteFile`s
after a partition spec update and do not use the partition spec with the specId.
I can work on fixing the tests, but that will increase the diff of this PR,
however if that makes it consistent to the behavior we want I'm happy to do it.
Thanks for your guidance!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]