yanghua commented on pull request #2433:
URL: https://github.com/apache/hudi/pull/2433#issuecomment-758474842


   > The file check of each task is useless because even if a task of the 
source has no data for some time interval, the checkpoint can still trigger 
normally. So all tasks checkpointing successfully does not mean there is data. 
(I have solved this in my #step1 PR though)
   
   That's true, we could remove the file check logic. The implementation of 
multiple parallelism, however, is still reasonable.
   
   > There is no need to checkpoint the write status in 
KeyedWriteProcessOperator, because we cannot start a new instant if the last 
instant fails. The simpler and more appropriate way is to retry the commit 
actions several times and trigger failover if they still fail.
   
   Sounds reasonable, agree.
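
   For concreteness, the retry-then-failover idea quoted above could be 
sketched roughly as follows. This is only an illustration under assumptions, 
not code from this PR or from Hudi: `CommitRetry` and `retryCommit` are 
hypothetical names, and the sketch assumes that an unchecked exception escaping 
the operator is what lets the engine fail the task and start failover.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of Hudi: retry a commit, then give up loudly. */
public class CommitRetry {

    /**
     * Runs commitAction up to maxAttempts times and returns its first
     * successful result. If every attempt throws, the last failure is
     * wrapped in an IllegalStateException so that the caller (assumed
     * here to be the write operator) can let it propagate and fail the task.
     */
    public static <T> T retryCommit(Callable<T> commitAction, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return commitAction.call();
            } catch (Exception e) {
                // Remember the failure and try again until attempts run out.
                lastFailure = e;
            }
        }
        // All attempts failed: surface the error instead of starting a new instant.
        throw new IllegalStateException(
            "Commit failed after " + maxAttempts + " attempts", lastFailure);
    }
}
```

   With something like this, no write status has to be checkpointed for the 
failed instant: either the commit eventually succeeds, or the job restarts 
from the last successful checkpoint.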
   
   > BTW, IMO, we should finish RFC-24 first as fast as possible, it solves 
many bugs and has many improvements. After that I would add a compatible 
pipeline and this PR can apply there, and I can help to review.
   
   I personally think that a better form of community participation is:
   
   1) Control the granularity of changes;
   2) Make each submission a complete functional unit, so that the working 
behavior of the code does not change;
   
   I actually want to know if you have removed the compatible version of 
`OperatorCoordinator`. What is the design? Will it be better than the current 
one? Will it be better than this PR design?
   
   All this is opaque.
   
   Your Step 1 is a large-scale refactoring, and the merged code will make this 
client immediately unavailable. We are currently in a critical period before 
the release of 0.7 (what if we do not have the energy to merge steps 2 and 3 in 
the short term?). Why can't we optimize it step by step?
   
   In fact, what you need to optimize in the first and second steps is the File 
Assigner. SF Express has already implemented it and is ready to provide a PR. 
If we improve on the existing basis, the risks and changes are controllable, 
right? I think the correct order is to provide a more "elegant implementation" 
of OperatorCoordinator for the higher version at the end.


