[
https://issues.apache.org/jira/browse/HIVE-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989775#comment-15989775
]
Hive QA commented on HIVE-16487:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865547/HIVE-16487.02.patch
{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_2] (batchId=234)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery (batchId=223)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4933/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4933/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4933/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12865547 - PreCommit-HIVE-Build
> Serious Zookeeper exception is logged when a race condition happens
> -------------------------------------------------------------------
>
> Key: HIVE-16487
> URL: https://issues.apache.org/jira/browse/HIVE-16487
> Project: Hive
> Issue Type: Bug
> Components: Locking
> Affects Versions: 3.0.0
> Reporter: Peter Vary
> Assignee: Peter Vary
> Attachments: HIVE-16487.02.patch, HIVE-16487.patch
>
>
> A customer started seeing the following error in the logs, although
> everything was working as intended:
> {code}
> 2017-03-30 12:01:59,446 ERROR ZooKeeperHiveLockManager:
> [HiveServer2-Background-Pool: Thread-620]: Serious Zookeeper exception:
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> NoNode for /hive_zookeeper_namespace/<TABLE_NAME>/LOCK-SHARED-
> {code}
> This happened because of a race condition between releasing and acquiring a
> lock: the thread releasing the lock removes the parent ZK node just after the
> thread acquiring the lock has verified that the parent node exists.
> Since this can happen without causing any real problem, I plan to treat
> NODEEXISTS and NONODE as transient ZooKeeper exceptions, so that users are
> not confused.
> Also, the original author of ZooKeeperHiveLockManager may have planned to
> handle different ZooKeeper exceptions differently, and the code is hard to
> understand. See the {{continue}} and the {{break}}: the {{break}} only exits
> the switch, not the loop, which IMHO is not intuitive:
> {code}
> do {
>   try {
>     [..]
>     ret = lockPrimitive(key, mode, keepAlive, parentCreated,
>   } catch (Exception e1) {
>     if (e1 instanceof KeeperException) {
>       KeeperException e = (KeeperException) e1;
>       switch (e.code()) {
>       case CONNECTIONLOSS:
>       case OPERATIONTIMEOUT:
>         LOG.debug("Possibly transient ZooKeeper exception: ", e);
>         continue;
>       default:
>         LOG.error("Serious Zookeeper exception: ", e);
>         break;
>       }
>     }
>     [..]
>   }
> } while (tryNum < numRetriesForLock);
> {code}
> If we do not want to retry after a "Serious Zookeeper exception:", then we
> should add a label to the do loop and break out of it from the switch.
> If we do want to retry regardless of the type of the ZK exception, then we
> should change the {{continue;}} to {{break;}} and move the code that is
> skipped in the {{continue}} case into the {{default}} branch of the switch,
> so that the code is easier to understand.
> Any suggestions or ideas [~ctang.ma] or [~szehon]?
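
For illustration only, here is a minimal sketch of the first option from the description: a labeled do loop where NONODE and NODEEXISTS are treated as transient, and any other code breaks out of the loop instead of only out of the switch. This is not the code from the patch. The class LockRetrySketch, the Code enum, FakeKeeperException and the scripted outcomes are all hypothetical stand-ins, so that the example runs without the ZooKeeper library.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class LockRetrySketch {
    // Hypothetical stand-in for the error codes of ZooKeeper's KeeperException.
    public enum Code { OK, CONNECTIONLOSS, OPERATIONTIMEOUT, NONODE, NODEEXISTS, NOAUTH }

    // Hypothetical stand-in for KeeperException; carries only the code.
    static class FakeKeeperException extends Exception {
        private final Code code;
        FakeKeeperException(Code code) {
            super("KeeperErrorCode = " + code);
            this.code = code;
        }
        Code code() { return code; }
    }

    /**
     * Simulates the retry loop. Each attempt consumes the next scripted
     * outcome; once the script is exhausted, attempts succeed.
     * Returns the number of attempts if the lock was acquired,
     * or the negated number of attempts if it was not.
     */
    public static int lockWithRetries(List<Code> outcomes, int numRetriesForLock) {
        Deque<Code> script = new ArrayDeque<>(outcomes);
        int tryNum = 0;
        boolean locked = false;
        retry:
        do {
            tryNum++;
            try {
                Code outcome = script.isEmpty() ? Code.OK : script.poll();
                if (outcome != Code.OK) {
                    throw new FakeKeeperException(outcome);
                }
                locked = true;
                break retry; // success: leave the loop
            } catch (FakeKeeperException e) {
                switch (e.code()) {
                case CONNECTIONLOSS:
                case OPERATIONTIMEOUT:
                case NONODE:      // parent node removed by a concurrent unlock
                case NODEEXISTS:  // node created by a concurrent lock attempt
                    continue;     // transient: jump to the loop condition
                default:
                    break retry;  // serious: leave the loop, not just the switch
                }
            }
        } while (tryNum < numRetriesForLock);
        return locked ? tryNum : -tryNum;
    }

    public static void main(String[] args) {
        System.out.println(lockWithRetries(Arrays.asList(Code.NONODE), 3));
        System.out.println(lockWithRetries(Arrays.asList(Code.NOAUTH), 3));
    }
}
```

With the labeled break, a serious exception ends the loop after a single attempt, while a transient one is retried until numRetriesForLock is reached. The second option from the description would instead retry in both cases and differ only in what is logged.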