vinothchandar commented on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-691745384
I can help address the remaining feedback. I will push a small diff today/tomorrow. Overall, this looks like a reasonable start. The major feedback I still have is the following:

>would a parallelDo(func, parallelism) method in HoodieEngineContext help us avoid a lot of base/child class duplication of logic like this?

A lot of usages look like `jsc.parallelize(list, parallelism).map(func)`, and each of them now requires a base/child class pair. I am wondering if it's easier to take those usages alone and implement them as `engineContext.parallelDo(list, func, parallelism)`. This can be the lowest common denominator across Spark/Flink etc. IMO, we can avoid splitting a good chunk of classes if we do this. If this is interesting and we agree, I can try to quantify it.
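A minimal sketch of what the proposed `parallelDo(list, func, parallelism)` could look like, in Java since Hudi is a Java codebase. `EngineContext`, `LocalEngineContext`, `SerializableFunction` and `ParallelDoSketch` are hypothetical stand-ins, not Hudi's actual `HoodieEngineContext` API. The local engine treats `parallelism` as a thread count; a Spark-backed context would presumably pass it to `parallelize` as the number of partitions. The partition-path strings are illustrative only.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical: a Function that is also Serializable, so a distributed engine could ship it to workers.
@FunctionalInterface
interface SerializableFunction<I, O> extends Function<I, O>, Serializable {}

// Hypothetical engine abstraction (NOT Hudi's real HoodieEngineContext API).
// Call sites use engineContext.parallelDo(list, func, parallelism) instead of
// jsc.parallelize(list, parallelism).map(func), so they need no engine-specific subclass.
abstract class EngineContext {
  public abstract <I, O> List<O> parallelDo(List<I> data, SerializableFunction<I, O> func, int parallelism);
}

// A Spark-backed subclass could presumably implement parallelDo as
//   jsc.parallelize(data, parallelism).map(func::apply).collect();
// it is omitted here to keep the sketch free of Spark dependencies.

// Local implementation: applies func on a fixed-size thread pool, preserving input order.
class LocalEngineContext extends EngineContext {
  @Override
  public <I, O> List<O> parallelDo(List<I> data, SerializableFunction<I, O> func, int parallelism) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, parallelism));
    try {
      List<Future<O>> futures = new ArrayList<>(data.size());
      for (I item : data) {
        Callable<O> task = () -> func.apply(item);
        futures.add(pool.submit(task));
      }
      List<O> results = new ArrayList<>(futures.size());
      for (Future<O> future : futures) {
        results.add(future.get()); // waits for this element; results stay in input order
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    } catch (ExecutionException e) {
      throw new RuntimeException(e.getCause());
    } finally {
      pool.shutdownNow();
    }
  }
}

public class ParallelDoSketch {
  public static void main(String[] args) {
    EngineContext engineContext = new LocalEngineContext();
    // The same call site would work unchanged against any EngineContext implementation.
    List<String> paths = Arrays.asList("2020/09/01", "2020/09/02", "2020/09/03");
    List<Integer> lengths = engineContext.parallelDo(paths, String::length, 2);
    System.out.println(lengths); // [10, 10, 10]
  }
}
```

Returning a plain `List` keeps the interface engine-agnostic, at the cost of materializing all results on the caller (the driver, in Spark's case), so it likely fits only the "parallelize, map, collect" usages described above.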
