wangxianghu edited a comment on pull request #1827:
URL: https://github.com/apache/hudi/pull/1827#issuecomment-692074997


   > I can help address the remaining feedback. I will push a small diff today/tomorrow.
   > Overall, looks like a reasonable start.
   > 
   > The major feedback I still have is the following:
   > 
   > > Would a `parallelDo(func, parallelism)` method in `HoodieEngineContext` help us avoid a lot of base/child class duplication of logic like this?
   > 
   > A lot of usages are like `jsc.parallelize(list, parallelism).map(func)`, which all require a base/child class split now. I am wondering if it's easier to take those usages alone and implement them as `engineContext.parallelDo(list, func, parallelism)`. This can be the lowest common denominator across Spark/Flink etc. We can avoid splitting a good chunk of classes if we do this, IMO. If this is interesting and we agree, I can try to quantify.
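
For illustration only, the `parallelDo` idea quoted above might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the API in this PR: the names `EngineContextSketch` and `LocalEngineContextSketch` are hypothetical, and a plain thread pool stands in for Spark/Flink so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical engine-neutral abstraction (illustrative name, not Hudi's actual class).
interface EngineContextSketch {
  // Apply func to every element of data using up to `parallelism` workers.
  <I, O> List<O> parallelDo(List<I> data, Function<I, O> func, int parallelism);
}

// Local stand-in for an engine. A Spark-backed version would presumably delegate to
// jsc.parallelize(data, parallelism).map(func).collect() instead (assumption).
class LocalEngineContextSketch implements EngineContextSketch {
  @Override
  public <I, O> List<O> parallelDo(List<I> data, Function<I, O> func, int parallelism) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, parallelism));
    try {
      // Submit one task per element; futures are kept in input order.
      List<Future<O>> futures = new ArrayList<>();
      for (I item : data) {
        Callable<O> task = () -> func.apply(item);
        futures.add(pool.submit(task));
      }
      // Collect results in the same order as the input list.
      List<O> results = new ArrayList<>(data.size());
      for (Future<O> future : futures) {
        results.add(future.get());
      }
      return results;
    } catch (InterruptedException | ExecutionException e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

With something of this shape, shared code would call `engineContext.parallelDo(list, func, parallelism)` and only the context implementation would differ per engine, which is the duplication the quoted feedback hopes to avoid.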
   
   Hi @vinothchandar, how about this demo?
   
![image](https://user-images.githubusercontent.com/49835526/93096069-826f7c00-f6d6-11ea-9453-a96bd6ff8157.png)
   
![image](https://user-images.githubusercontent.com/49835526/93096090-8ac7b700-f6d6-11ea-971b-d6956b016988.png)
   
![image](https://user-images.githubusercontent.com/49835526/93096113-91eec500-f6d6-11ea-84c3-2712f72530fa.png)
   
![image](https://user-images.githubusercontent.com/49835526/93096140-9a470000-f6d6-11ea-9685-686c507bf8ad.png)
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]