[GitHub] [hudi] danny0405 commented on pull request #2433: [HUDI-1511] InstantGenerateOperator support multiple parallelism

GitBox Tue, 12 Jan 2021 00:15:17 -0800


danny0405 commented on pull request #2433:
URL: https://github.com/apache/hudi/pull/2433#issuecomment-758486505



   > That's true, we could remove the file check logic. But the implementation of multiple parallelism is reasonable.
   
   I don't think so. The implementation relies on the checkpoint to start an instant, so it does not work if the checkpoint data buffer is huge.
   
   > I'd actually like to know whether you have an idea for a compatible version of OperatorCoordinator. What is the design?
   
   I have already updated the RFC-24 wiki. At least in terms of design, it is more reasonable.
   
   > Your Step 1 is a large-scale refactoring
   
   I have no choice: the original code is far from production-ready, and to make it robust enough I have to make big changes. I have already split the work into 4 steps. If you don't have time to review, I will ask others to help. Thanks though ~
   
   >  It is currently in a critical period before the release of 0.7
   
   I don't think either this PR or mine can be merged into release 0.7, which is already at the RC stage. All the refactoring should be done before 0.8 is released, so that at least we do not break compatibility with a released version. Yes, there may be a period during which the intermediate, unreleased versions do not work, but the final release will be fine.
   
   > In fact, what you need to optimize in the first and second steps is the File Assigner. SF Express has already implemented it and is ready to submit a PR.
   
   Maybe they have a PR, but IMO that PR is not of high code quality; there is not even a test framework. No one can tell whether the code works correctly or is suitable for large data sets.

