Re: [PR] HIVE-27899: Killed speculative execution task attempt should not comm… [hive]

via GitHub Thu, 04 Sep 2025 10:56:57 -0700


deniskuzZ commented on code in PR #4899:
URL: https://github.com/apache/hive/pull/4899#discussion_r2321658402



##########
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java:
##########
@@ -293,6 +293,13 @@ protected void initializeAndRunProcessor(Map<String, 
LogicalInput> inputs,
       rproc.run();
 
       perfLogger.perfLogEnd(CLASS_NAME, PerfLogger.TEZ_RUN_PROCESSOR);
+
+      // Try to call canCommit to AM. If there is no other speculative attempt 
execute canCommit, then continue.
+      // If there are other speculative attempt execute canCommit first, then 
wait until the attempt is killed
+      // or the committed task fails.
+      while (!this.processorContext.canCommit()) {

Review Comment:
   1) 
   
[TaskImpl::canCommit](https://github.com/apache/tez/blob/6683866ea5e9a7b900ccbcd617e49588974019ae/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java#L694)
 is already wrapped with `writeLock` - OK
   
   however, if `commitAttempt != taskAttemptID` (speculative attempt was 
already committed), how are we planning to leave the loop `while 
(!getContext().canCommit()) {Thread.sleep(100);}`?
   ````
   LOG.debug("{} is current committer. Commit waiting for: {}", commitAttempt, 
taskAttemptID);
   return false;
   ````
   I've found additional explanation in a JIRA 
   ````
   The problem is that both speculatively executed tasks commit the file. This 
will not happen in the Tez examples because they will try canCommit, which can 
guarantee that one and only one task attempt commit successfully. If one task 
attempt executes canCommit successfully, the other one will be stuck by 
canCommit until it receives a kill signal.
   ````
   so, is it intentional that 2nd attempt would be in busy-waiting until it 
receives a kill signal? is it a jvm process kill?
   2)
   > In fact, tez example use this way to avoid commit simultaneously. I just 
copy from here。
   
   I think, usage pattern there is a bit different (no speculative execution):
   ````
   // This will loop till the AM asks for the task to be killed. As
   // against, the AM sending a signal to the task to kill itself
   // gracefully. The AM waits for the current committer to successfully
   // complete and then kills us. Until then we wait in case the
   // current committer fails and we get chosen to commit.
   while (!getContext().canCommit()) {
     Thread.sleep(100);
   }
   ````



##########
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java:
##########
@@ -293,6 +293,13 @@ protected void initializeAndRunProcessor(Map<String, 
LogicalInput> inputs,
       rproc.run();
 
       perfLogger.perfLogEnd(CLASS_NAME, PerfLogger.TEZ_RUN_PROCESSOR);
+
+      // Try to call canCommit to AM. If there is no other speculative attempt 
execute canCommit, then continue.
+      // If there are other speculative attempt execute canCommit first, then 
wait until the attempt is killed
+      // or the committed task fails.
+      while (!this.processorContext.canCommit()) {

Review Comment:
   1) 
   
[TaskImpl::canCommit](https://github.com/apache/tez/blob/6683866ea5e9a7b900ccbcd617e49588974019ae/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java#L694)
 is already wrapped with `writeLock` - OK
   
   however, if `commitAttempt != taskAttemptID` (speculative attempt was 
already committed), how are we planning to leave the loop `while 
(!getContext().canCommit()) {Thread.sleep(100);}`?
   ````
   LOG.debug("{} is current committer. Commit waiting for: {}", commitAttempt, 
taskAttemptID);
   return false;
   ````
   I've found additional explanation in a JIRA 
   ````
   The problem is that both speculatively executed tasks commit the file. This 
will not happen in the Tez examples because they will try canCommit, which can 
guarantee that one and only one task attempt commit successfully. If one task 
attempt executes canCommit successfully, the other one will be stuck by 
canCommit until it receives a kill signal.
   ````
   so, is it intentional that 2nd attempt would be in busy-waiting until it 
receives a kill signal? is it a jvm process kill?
   
   2)
   > In fact, tez example use this way to avoid commit simultaneously. I just 
copy from here。
   
   I think, usage pattern there is a bit different (no speculative execution):
   ````
   // This will loop till the AM asks for the task to be killed. As
   // against, the AM sending a signal to the task to kill itself
   // gracefully. The AM waits for the current committer to successfully
   // complete and then kills us. Until then we wait in case the
   // current committer fails and we get chosen to commit.
   while (!getContext().canCommit()) {
     Thread.sleep(100);
   }
   ````



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Re: [PR] HIVE-27899: Killed speculative execution task attempt should not comm… [hive]

Reply via email to