hudi-agent commented on code in PR #17591:
URL: https://github.com/apache/hudi/pull/17591#discussion_r3461925297
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/common/HoodieSparkEngineContext.java:
##########
@@ -135,6 +135,16 @@ public <T> HoodieData<T> parallelize(List<T> data, int
parallelism) {
return HoodieJavaRDD.of(javaSparkContext.parallelize(data, parallelism));
}
+ @Override
+ public <T> HoodieData<T> union(List<HoodieData<T>> dataList) {
+ List<JavaRDD<T>> javaRDDList =
dataList.stream().map(HoodieJavaRDD::getJavaRDD).collect(Collectors.toList());
+ if (javaRDDList.isEmpty()) {
+ return HoodieJavaRDD.of(javaSparkContext.emptyRDD());
+ } else {
+ return HoodieJavaRDD.of(javaSparkContext.union(javaRDDList.toArray(new
JavaRDD[0])));
+ }
Review Comment:
🤖 nit: `new JavaRDD[0]` is a raw type and will produce an unchecked warning
at compile time — have you considered passing `javaRDDList` directly if
`JavaSparkContext.union(List<JavaRDD<T>>)` is available, or casting to
`(JavaRDD<T>[]) new JavaRDD[0]` with a `@SuppressWarnings("unchecked")` to make
the intent explicit?
<sub><i>⚠️ AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieEngineContext.java:
##########
@@ -80,6 +80,8 @@ public <T> HoodieData<T> parallelize(List<T> data) {
}
}
+ public abstract <T> HoodieData<T> union(List<HoodieData<T>> dataList);
Review Comment:
🤖 nit: since the list-based `union` body is identical in all four non-Spark
contexts (Flink, Java, Local, ExecutorServiceBased), could you make this a
concrete default method here instead of abstract? `HoodieEngineContext` already
has a concrete `parallelize(List<T> data)` that delegates down — the same
pattern would eliminate the four copy-pasted overrides.
<sub><i>⚠️ AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieEngineContext.java:
##########
@@ -80,6 +80,8 @@ public <T> HoodieData<T> parallelize(List<T> data) {
}
}
+ public abstract <T> HoodieData<T> union(List<HoodieData<T>> dataList);
Review Comment:
🤖 One thing worth double-checking on the underlying concern: I couldn't find
an `ExecutorServiceBasedEngineContext` class anywhere in the repo. The only
subclasses of `HoodieEngineContext` are `HoodieLocalEngineContext`,
`HoodieSparkEngineContext`, `HoodieJavaEngineContext`, and
`HoodieFlinkEngineContext` — all four of which the PR implements `union` for.
So the new abstract method shouldn't cause a missing-override compile failure
unless I'm missing a subclass somewhere.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]