[
https://issues.apache.org/jira/browse/SENTRY-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vadim Spector updated SENTRY-1377:
----------------------------------
Attachment: (was: SENTRY-1377.002.patch)
> improve handling of failures, both in tests and after-test cleanup, in
> TestHDFSIntegration.java
> -----------------------------------------------------------------------------------------------
>
> Key: SENTRY-1377
> URL: https://issues.apache.org/jira/browse/SENTRY-1377
> Project: Sentry
> Issue Type: Sub-task
> Components: Sentry
> Affects Versions: 1.8.0
> Reporter: Vadim Spector
> Assignee: Vadim Spector
> Fix For: 1.8.0
>
> Attachments: SENTRY-1377.001.patch, SENTRY-1377.002.patch
>
>
> There are multiple issues making HDFS sync tests flaky or sometimes failing.
> 1. TestHDFSIntegrationBase.java should provide best-attempt cleanup in
> cleanAfterTest() method. Currently, if any cleanup operation fails, the rest
> of cleanup code is not executed. Cleanup logic would not normally fail if
> corresponding test succeeds. But if some do not, short-circuiting some
> cleanup logic tends to cascade secondary failures which complicate
> troubleshooting.
> 2. TestHDFSIntegration*.java classes do not guarantee calling close() method
> on Connection and Statement objects. It happens because
> a) no try-finally or try-with-resource is used, so tests can skip close()
> calls if fail in the middle.
> b) many methods re-open Connection and Statement multiple times, yet provide
> a single close() at the end.
> 3. Retry logic uses recursion in some places, as in startHiveServer2() and
> verifyOnAllSubDirs. Better to implement it via straightforward retry loop.
> Exception stack trace is more confusing than it needs to be in case of
> recursive calls. Plus, with NUM_RETRIES == 10, at least theoretically, it
> creates running out of stack as an unnecessary failure mechanism.
> 4. startHiveServer2() ignores hiveServer2.start() call failure, only logging
> info message.
> 5. Starting hiveServer2 and Hive metastore in separate threads and then
> keeping those threads alive seems unnecessary, since both servers' start()
> methods create servers running on their own threads anyway. It effectively
> leads to ignoring the start() method failure for both servers. Also, it
> leaves no guarantee that hiveServer2 will be started after the Hive metastore
> - both start from independent threads with no cross-thread coordination
> mechanism in place.
> 6. Thread.sleep() missing in multiple places between HiveServer2 SQL calls
> changing permissions and verifyOnAllSubDirs() calls verifying that those
> changes took effect.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)