steveloughran commented on code in PR #16500:
URL: https://github.com/apache/iceberg/pull/16500#discussion_r3336611200
##########
core/src/main/java/org/apache/iceberg/util/ContentFileUtil.java:
##########
@@ -156,6 +156,27 @@ public static String dvDesc(DeleteFile deleteFile) {
deleteFile.referencedDataFile());
}
+  /**
+   * Validates that the deletion-vector offset and length on a {@link DeleteFile} are well-formed
+   * before they are consumed by a reader. Hostile or corrupted manifest metadata may otherwise
+   * trigger a {@link NegativeArraySizeException}, an invalid seek, or a multi-gigabyte allocation
+   * when the DV blob is read.
+   */
+  public static void validateDV(DeleteFile dv) {
+    Preconditions.checkArgument(
+        dv.contentOffset() != null, "Invalid DV, offset cannot be null: %s", dvDesc(dv));
+    Preconditions.checkArgument(
+        dv.contentSizeInBytes() != null, "Invalid DV, length cannot be null: %s", dvDesc(dv));
+    Preconditions.checkArgument(
+        dv.contentOffset() >= 0, "Invalid DV, offset must be non-negative: %s", dvDesc(dv));
+    Preconditions.checkArgument(
+        dv.contentSizeInBytes() >= 0, "Invalid DV, length must be non-negative: %s", dvDesc(dv));
+    Preconditions.checkArgument(
+        dv.contentSizeInBytes() <= Integer.MAX_VALUE,
Review Comment:
FWIW, this is probably something to add to the code rules in AGENTS.md: no
em dashes, and no non-ASCII Unicode except in UTF-related experiments. That
belongs in a separate PR.
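
For illustration only, a rule like this could be checked mechanically. The sketch below is an assumption, not part of the PR or of Iceberg: the class `AsciiCheck` and the method `firstNonAscii` are hypothetical names, and it only shows one way to locate the first character outside 7-bit ASCII in a string such as a message or comment.

```java
// Hypothetical helper, not part of Iceberg: finds non-ASCII characters in text.
public final class AsciiCheck {
  private AsciiCheck() {}

  /** Returns the index of the first char outside 7-bit ASCII, or -1 if there is none. */
  public static int firstNonAscii(String text) {
    for (int i = 0; i < text.length(); i++) {
      if (text.charAt(i) > 0x7F) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // Plain ASCII message: prints -1
    System.out.println(firstNonAscii("offset must be non-negative"));
    // \u2014 is an em dash, written as an escape so this source stays ASCII: prints 7
    System.out.println(firstNonAscii("offset \u2014 length"));
  }
}
```

A check along these lines would need an allowlist for the UTF-related cases mentioned above; how that would be expressed is left open here.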
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]