deniskuzZ commented on code in PR #4899: URL: https://github.com/apache/hive/pull/4899#discussion_r2321658402
########## ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java: ########## @@ -293,6 +293,13 @@ protected void initializeAndRunProcessor(Map<String, LogicalInput> inputs, rproc.run(); perfLogger.perfLogEnd(CLASS_NAME, PerfLogger.TEZ_RUN_PROCESSOR); + + // Try to call canCommit to AM. If there is no other speculative attempt execute canCommit, then continue. + // If there are other speculative attempt execute canCommit first, then wait until the attempt is killed + // or the committed task fails. + while (!this.processorContext.canCommit()) { Review Comment: 1) [TaskImpl::canCommit](https://github.com/apache/tez/blob/6683866ea5e9a7b900ccbcd617e49588974019ae/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java#L694) is already wrapped with `writeLock` - OK however, if `commitAttempt != taskAttemptID` (speculative attempt was already committed), how are we planning to leave the loop `while (!getContext().canCommit()) {Thread.sleep(100);}`? ```` LOG.debug("{} is current committer. Commit waiting for: {}", commitAttempt, taskAttemptID); return false; ```` I've found additional explanation in a JIRA ```` The problem is that both speculatively executed tasks commit the file. This will not happen in the Tez examples because they will try canCommit, which can guarantee that one and only one task attempt commit successfully. If one task attempt executes canCommit successfully, the other one will be stuck by canCommit until it receives a kill signal. ```` so, is it intentional that 2nd attempt would be in busy-waiting until it receives a kill signal? is it a jvm process kill? 2) > In fact, tez example use this way to avoid commit simultaneously. I just copy from here。 I think, usage pattern there is a bit different (no speculative execution): ```` // This will loop till the AM asks for the task to be killed. As // against, the AM sending a signal to the task to kill itself // gracefully. The AM waits for the current committer to successfully // complete and then kills us. Until then we wait in case the // current committer fails and we get chosen to commit. while (!getContext().canCommit()) { Thread.sleep(100); } ```` ########## ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java: ########## @@ -293,6 +293,13 @@ protected void initializeAndRunProcessor(Map<String, LogicalInput> inputs, rproc.run(); perfLogger.perfLogEnd(CLASS_NAME, PerfLogger.TEZ_RUN_PROCESSOR); + + // Try to call canCommit to AM. If there is no other speculative attempt execute canCommit, then continue. + // If there are other speculative attempt execute canCommit first, then wait until the attempt is killed + // or the committed task fails. + while (!this.processorContext.canCommit()) { Review Comment: 1) [TaskImpl::canCommit](https://github.com/apache/tez/blob/6683866ea5e9a7b900ccbcd617e49588974019ae/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java#L694) is already wrapped with `writeLock` - OK however, if `commitAttempt != taskAttemptID` (speculative attempt was already committed), how are we planning to leave the loop `while (!getContext().canCommit()) {Thread.sleep(100);}`? ```` LOG.debug("{} is current committer. Commit waiting for: {}", commitAttempt, taskAttemptID); return false; ```` I've found additional explanation in a JIRA ```` The problem is that both speculatively executed tasks commit the file. This will not happen in the Tez examples because they will try canCommit, which can guarantee that one and only one task attempt commit successfully. If one task attempt executes canCommit successfully, the other one will be stuck by canCommit until it receives a kill signal. ```` so, is it intentional that 2nd attempt would be in busy-waiting until it receives a kill signal? is it a jvm process kill? 2) > In fact, tez example use this way to avoid commit simultaneously. I just copy from here。 I think, usage pattern there is a bit different (no speculative execution): ```` // This will loop till the AM asks for the task to be killed. As // against, the AM sending a signal to the task to kill itself // gracefully. The AM waits for the current committer to successfully // complete and then kills us. Until then we wait in case the // current committer fails and we get chosen to commit. while (!getContext().canCommit()) { Thread.sleep(100); } ```` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org