zhengchenyu commented on PR #4899:
URL: https://github.com/apache/hive/pull/4899#issuecomment-2212857850

   @deniskuzZ @ayushtkn @abstractdog @glapark 
   
   We have run into the following file duplication problems in our production 
cluster:
   
   * (1) When speculative execution is enabled, the cluster occasionally 
produces duplicate files.
   
   When speculative execution is turned on, the slower task attempt is killed. 
I found that a killed task attempt may commit unfinished files, so I submitted 
HIVE-25561 to solve this problem.
   
   > Note: Killing any task attempt can cause this problem. Enabling 
speculative execution just increases the probability that task attempts are 
killed.
   
   * (2) After applying HIVE-25561, I found that duplicate files still 
occurred, but with a very low probability.
   
   I found that the duplicate files were caused by two different attempts of 
the same task being committed at the same time, so I submitted HIVE-27899 to 
solve this problem.
   
   * (3) If an exception occurs after the task attempt is committed but before 
the appMaster receives the task success message, file duplication may occur. 
See HIVE-27985 for more information.
   
   We can use task-based rather than attempt-based file names. Even if 
duplicate files are produced, they have the same name, so the file committed 
later replaces the file committed earlier. And because HIVE-25561 is applied, 
we can be sure that every committed file was generated by a successful task 
attempt. So I submitted HIVE-27986.
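   
   The naming idea can be illustrated with a minimal sketch. This is not the 
actual HIVE-27986 patch: the class, the method names, and the file name 
formats (`%06d_%d` and `%06d`) are hypothetical, and the local file system 
stands in for the real output directory. It only shows that when two 
successful attempts of the same task both commit, attempt-based names leave 
two files, while a task-based name leaves one because the later commit 
replaces the earlier one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TaskBasedCommitSketch {

    // Hypothetical attempt-based name: one final file per (task, attempt) pair.
    static String attemptBasedName(int taskId, int attemptId) {
        return String.format("%06d_%d", taskId, attemptId);
    }

    // Hypothetical task-based name: all attempts of a task share one final name.
    static String taskBasedName(int taskId) {
        return String.format("%06d", taskId);
    }

    // Commit by moving the attempt's temporary file to its final name,
    // replacing any file already committed under that name.
    static void commit(Path tmpFile, Path finalDir, String finalName) throws IOException {
        Files.move(tmpFile, finalDir.resolve(finalName), StandardCopyOption.REPLACE_EXISTING);
    }

    // Returns the sorted file names found in a directory.
    static List<String> listNames(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.map(p -> p.getFileName().toString())
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    // Simulates two successful attempts of task 0 that both get committed,
    // then returns the names of the files left in the final directory.
    static List<String> commitTwoAttempts(boolean taskBased) throws IOException {
        Path tmpDir = Files.createTempDirectory("sketch-tmp");
        Path finalDir = Files.createTempDirectory("sketch-final");
        for (int attempt = 0; attempt < 2; attempt++) {
            Path tmp = Files.write(tmpDir.resolve("attempt_" + attempt),
                    "rows of task 0".getBytes(StandardCharsets.UTF_8));
            String name = taskBased ? taskBasedName(0) : attemptBasedName(0, attempt);
            commit(tmp, finalDir, name);
        }
        return listNames(finalDir);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(commitTwoAttempts(false)); // prints [000000_0, 000000_1]
        System.out.println(commitTwoAttempts(true));  // prints [000000]
    }
}
```

   The sketch relies on both attempts producing the same complete data, which 
is why it depends on the guarantee from HIVE-25561 that only successful task 
attempts commit files.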
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]