hudi-agent commented on code in PR #18778:
URL: https://github.com/apache/hudi/pull/18778#discussion_r3271078715
##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java:
##########
@@ -497,6 +497,42 @@ public Pair<Option<Long>, Option<Long>>
getMinAndMaxEventTime() {
maxEventTime == Long.MIN_VALUE ? Option.empty() :
Option.of(maxEventTime));
}
+ /**
+ * Returns per-partition min/max event time folded across this commit's
write stats.
+ *
+ * <p>Each entry corresponds to a partition for which at least one write
stat carried a
+ * non-null {@link HoodieWriteStat#getMinEventTime()} or {@link
HoodieWriteStat#getMaxEventTime()}.
+ * Partitions whose write stats have no event-time information at all are
omitted from the
+ * returned map. The min/max within a partition are folded with {@code
Math.min} / {@code Math.max}
+ * over its write stats, mirroring the semantics of {@link
#getMinAndMaxEventTime()}.
+ *
+ * <p>This is a pure aggregation over {@code partitionToWriteStats} — it
adds no persisted
+ * bytes and does not change the commit avro schema.
Review Comment:
🤖 nit: "does not change the commit avro schema" reads more like a PR
justification than an API contract — a future reader may wonder why a read-only
aggregation method needs to promise schema stability, and the note could become
misleading if per-partition event times are ever persisted. Have you considered
dropping this sentence (or moving it to a code comment inside the method
instead)?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java:
##########
@@ -497,6 +497,42 @@ public Pair<Option<Long>, Option<Long>>
getMinAndMaxEventTime() {
maxEventTime == Long.MIN_VALUE ? Option.empty() :
Option.of(maxEventTime));
}
+ /**
+ * Returns per-partition min/max event time folded across this commit's
write stats.
+ *
+ * <p>Each entry corresponds to a partition for which at least one write
stat carried a
+ * non-null {@link HoodieWriteStat#getMinEventTime()} or {@link
HoodieWriteStat#getMaxEventTime()}.
+ * Partitions whose write stats have no event-time information at all are
omitted from the
+ * returned map. The min/max within a partition are folded with {@code
Math.min} / {@code Math.max}
+ * over its write stats, mirroring the semantics of {@link
#getMinAndMaxEventTime()}.
+ *
+ * <p>This is a pure aggregation over {@code partitionToWriteStats} — it
adds no persisted
+ * bytes and does not change the commit avro schema.
+ */
+ public Map<String, Pair<Option<Long>, Option<Long>>>
getMinAndMaxEventTimePerPartition() {
+ Map<String, Pair<Option<Long>, Option<Long>>> result = new HashMap<>();
+ for (Map.Entry<String, List<HoodieWriteStat>> entry :
partitionToWriteStats.entrySet()) {
Review Comment:
🤖 nit: the sentinel-value fold (null-check → `Math.min`/`Math.max` → ternary
back to `Option`) is essentially the same inner loop as in
`getMinAndMaxEventTime()`. Could you extract a private
`foldEventTimes(List<HoodieWriteStat>)` helper that both methods share? That
way any future change to null-handling or sentinel semantics only needs to land
in one place.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]