Konstantin Bereznyakov created HIVE-29642:
---------------------------------------------

             Summary: Race in PartitionManagementTask test-only counters causes 
intermittent TestPartitionManagement failures
                 Key: HIVE-29642
                 URL: https://issues.apache.org/jira/browse/HIVE-29642
             Project: Hive
          Issue Type: Bug
            Reporter: Konstantin Bereznyakov


[https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-6505/6/tests]
{code:java}
Testing / split-04 / PostProcess / testPartitionDiscoveryTransactionalTable – 
org.apache.hadoop.hive.metastore.TestPartitionManagement (3s)

Error
expected:<2> but was:<1>

Stacktrace
java.lang.AssertionError: expected:<2> but was:<1>
        at org.junit.Assert.fail(Assert.java:89)
{code}

A subsequent re-run passed.



TestPartitionManagement.testPartitionDiscoveryTransactionalTable asserts that 
exactly 2 of 3 concurrent tasks are skipped, using test-only static counters in 
PartitionManagementTask. The assertion is flaky for two reasons. First, nothing 
forces the 3 tasks to overlap: if the threads are scheduled so that they do not 
actually run concurrently, no skips happen at all. Second, the counters 
themselves are racy: the skippedAttempts = 0 reset runs inside the lock, while 
skippedAttempts++ runs outside it, so the reset can wipe out increments that 
other threads have already made.
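
To make the second problem concrete, here is a minimal, self-contained sketch of the pattern described above. It is a simplified model, not the actual PartitionManagementTask code: the class name, the latches, and the actualSkips field are hypothetical and exist only to force the bad interleaving deterministically. One thread takes the lock and is held up just before its reset, two other threads fail tryLock and increment the counter, and then the reset runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class SkipCounterRace {
    static final ReentrantLock lock = new ReentrantLock();
    static int skippedAttempts;                                   // models the test-only counter
    static final AtomicInteger actualSkips = new AtomicInteger(); // ground truth, for comparison only

    /** Forces the problematic interleaving and returns what the counter reports afterwards. */
    static int runForcedInterleaving() {
        skippedAttempts = 0;
        actualSkips.set(0);
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch skipsDone = new CountDownLatch(2);

        Thread winner = new Thread(() -> {
            lock.lock();
            try {
                lockHeld.countDown();   // lock acquired, reset not executed yet
                skipsDone.await();      // stands in for being descheduled at this point
                skippedAttempts = 0;    // reset inside the lock wipes both increments
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        Runnable loser = () -> {
            try {
                lockHeld.await();
                if (!lock.tryLock()) {  // lock is busy, so this attempt is skipped
                    skippedAttempts++;  // increment happens outside the lock
                    actualSkips.incrementAndGet();
                } else {
                    lock.unlock();      // not reached in this forced schedule
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                skipsDone.countDown();
            }
        };
        try {
            winner.start();
            // Run the two skipping tasks one after another to keep the demo deterministic.
            for (int i = 0; i < 2; i++) {
                Thread t = new Thread(loser);
                t.start();
                t.join();
            }
            winner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return skippedAttempts;
    }

    public static void main(String[] args) {
        int reported = runForcedInterleaving();
        System.out.println("reported=" + reported + ", actual=" + actualSkips.get());
    }
}
```

In this forced schedule two attempts are skipped, yet the counter reports 0 once the reset has run. In a real run the outcome depends on thread timing, which is what makes the assertion intermittent.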

{*}The proposed solution{*}: Remove the counters. They were added for testing 
in HIVE-20707 and nothing in production reads them. Have the test verify the 
same properties through a Log4j2 ListAppender that observes the existing skip 
and discovery log messages.
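
The log-capture idea can be sketched as follows. To keep the example runnable without extra dependencies, it uses a java.util.logging Handler as a stand-in for the Log4j2 ListAppender; the actual change would attach a ListAppender to the task's logger instead. The logger name, the message prefix, and the class and method names are hypothetical and are not taken from the Hive source.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class LogCaptureSketch {
    /** Collects every message it receives, the role a ListAppender would play in Log4j2. */
    static final class ListHandler extends Handler {
        final List<String> messages = new CopyOnWriteArrayList<>();

        @Override public void publish(LogRecord record) { messages.add(record.getMessage()); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    // Hypothetical message prefix; the real skip message in Hive may be worded differently.
    static final String SKIP_PREFIX = "Skipping this attempt";

    /** Counts captured messages that look like skip notifications. */
    static long countSkips(List<String> messages) {
        return messages.stream().filter(m -> m.startsWith(SKIP_PREFIX)).count();
    }

    /** Attaches the handler, emits sample messages, and counts the skips that were observed. */
    static long demo() {
        Logger log = Logger.getLogger("sketch.PartitionManagementTask"); // hypothetical logger name
        log.setUseParentHandlers(false); // keep the sample messages off the console
        log.setLevel(Level.INFO);
        ListHandler handler = new ListHandler();
        log.addHandler(handler);
        try {
            // In a real test these lines would come from the tasks under test.
            log.info("Discovery started");
            log.info(SKIP_PREFIX + " #1");
            log.info(SKIP_PREFIX + " #2");
            return countSkips(handler.messages);
        } finally {
            log.removeHandler(handler); // detach so later tests start clean
        }
    }
}
```

Because the test reads what the tasks logged rather than shared mutable state, the production class needs no fields that exist only for testing.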



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
