codope commented on code in PR #6485:
URL: https://github.com/apache/hudi/pull/6485#discussion_r960549927
##########
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java:
##########
@@ -399,4 +399,19 @@ public boolean isEmpty(HoodieInstant instant) {
public String toString() {
return this.getClass().getName() + ": " +
instants.stream().map(Object::toString).collect(Collectors.joining(","));
}
+
+ /**
+ * Merge this timeline with the given timeline.
+ */
+ public HoodieDefaultTimeline mergeTimeline(HoodieDefaultTimeline timeline) {
+ Stream<HoodieInstant> instantStream = Stream.concat(instants.stream(), timeline.getInstants()).sorted();
+ Function<HoodieInstant, Option<byte[]>> details = instant -> {
+ if (instants.stream().anyMatch(i -> i.equals(instant))) {
+ return this.getInstantDetails(instant);
+ } else {
+ return timeline.getInstantDetails(instant);
+ }
Review Comment:
I went through the code. The archived timeline details are computed just
once, but the active timeline instant details are re-computed. Typically, for
older tables, the archived timeline is much bigger than the active timeline,
so the amortized cost should be within bounds.
To optimize for the active timeline, we need a map/cache of instant details in
`HoodieActiveTimeline`, and then `HoodieActiveTimeline#getInstantDetails` would
check this cache before reading from file. I can take it up in a
separate PR. It might be beneficial in other places too, irrespective of this
patch. HUDI-4768
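
The caching idea above could be sketched roughly as follows. This is a minimal standalone illustration of memoizing instant details, not Hudi's actual API: the class name, the use of `String` keys instead of `HoodieInstant`, and `Optional` instead of Hudi's `Option` are all assumptions for the sake of a self-contained example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of the cache proposed in the comment: memoize
// instant details so that repeated getInstantDetails calls for the
// same instant avoid re-reading the underlying file.
class InstantDetailsCache {
  private final Map<String, Optional<byte[]>> cache = new ConcurrentHashMap<>();
  // The expensive loader, e.g. a read from the timeline file on storage.
  private final Function<String, Optional<byte[]>> loader;

  InstantDetailsCache(Function<String, Optional<byte[]>> loader) {
    this.loader = loader;
  }

  // Check the cache first; fall back to the loader only on a miss.
  Optional<byte[]> getInstantDetails(String instantTs) {
    return cache.computeIfAbsent(instantTs, loader);
  }
}
```

With such a cache in place, the second lookup for the same instant timestamp is served from memory, which is what would make repeated active-timeline detail reads cheap in `mergeTimeline` and elsewhere.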
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]