[
https://issues.apache.org/jira/browse/SENTRY-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vadim Spector updated SENTRY-1377:
----------------------------------
Description:
There are multiple issues making HDFS sync tests flaky or sometimes failing.
1. TestHDFSIntegrationBase.java should provide best-attempt cleanup in the
cleanAfterTest() method. Currently, if any cleanup operation fails, the rest of
the cleanup code is not executed. Cleanup logic would not normally fail if the
corresponding test succeeds. But when a test fails, short-circuiting the
remaining cleanup tends to cascade into secondary failures, which complicates
troubleshooting.
2. TestHDFSIntegration*.java classes do not guarantee calling the close() method
on Connection and Statement objects. This happens because
a) no try-finally or try-with-resources is used, so a test failure can skip the
close() calls.
b) many methods re-open Connection and Statement multiple times, yet provide
only a single close() at the end.
3. Retry logic uses recursion in some places, as in startHiveServer2() and
verifyOnAllSubDirs(). It is better implemented as a straightforward retry loop.
With recursive calls, the exception stack trace is more confusing than it needs
to be. Plus, with NUM_RETRIES == 10, at least theoretically, recursion adds
running out of stack as an unnecessary failure mode.
4. startHiveServer2() ignores a hiveServer2.start() call failure, only logging
an info message.
5. Starting hiveServer2 and Hive metastore in separate threads and then keeping
those threads alive seems unnecessary, since both servers' start() methods
create servers running on their own threads anyway. It effectively leads to
ignoring the start() method failure for both servers. Also, it leaves no
guarantee that hiveServer2 will be started after the Hive metastore - both
start from independent threads with no cross-thread coordination mechanism in
place.
6. Thread.sleep() is missing in multiple places between the HiveServer2 SQL
calls that change permissions and the verifyOnAllSubDirs() calls that verify
that those changes took effect.
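
The retry loop suggested in item 3 might look like the sketch below. The names
RetryLoopSketch, Check and retry() are hypothetical and are not taken from the
Sentry test code; only NUM_RETRIES == 10 comes from the description. The loop
keeps a single stack frame and rethrows the last failure as-is, so the stack
trace points at the failing check rather than at nested retry calls.

```java
/** Hypothetical sketch of an iterative retry helper; not the actual Sentry test code. */
class RetryLoopSketch {
  static final int NUM_RETRIES = 10; // value quoted in the issue description

  /** A check that signals failure by throwing (hypothetical interface). */
  interface Check {
    void run() throws Exception;
  }

  /**
   * Runs the check up to maxAttempts times, sleeping between failed attempts.
   * Returns the 1-based number of the attempt that succeeded; if every attempt
   * fails, the exception from the last attempt is rethrown unchanged.
   */
  static int retry(Check check, int maxAttempts, long sleepMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        check.run();
        return attempt;
      } catch (Exception e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMillis); // wait before the next attempt
        }
      }
    }
    throw last; // non-null here: at least one attempt ran and failed
  }

  public static void main(String[] args) throws Exception {
    final int[] calls = {0};
    // The check fails twice and then succeeds.
    int attempts = retry(() -> {
      if (++calls[0] < 3) {
        throw new IllegalStateException("not ready yet");
      }
    }, NUM_RETRIES, 10L);
    System.out.println("succeeded on attempt " + attempts); // prints "succeeded on attempt 3"
  }
}
```

A failed start() call can be surfaced the same way: wrapping the call in a
Check lets the failure propagate to the caller instead of being logged and
ignored.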
was:
1. TestHDFSIntegration.java should provide best-attempt cleanup in the
cleanAfterTest() method. Currently, if any cleanup operation fails, the rest of
the cleanup code is not executed. Cleanup logic would not normally fail if the
corresponding test succeeds. But when a test fails, short-circuiting the
remaining cleanup tends to cascade into secondary failures, which complicates
troubleshooting.
2. Test methods would short-circuit closing database Connections and
Statements if the test logic fails. Moreover, some test methods have multiple
initializations of Connection and Statement, but only one stmt.close();
conn.close(); pair at the end. The correct pattern for each distinct
{Connection,Statement} pair would be:
Connection conn = null;
Statement stmt = null;
try {
  // ... test logic here, including conn and stmt initialization ...
} finally {
  safeClose(conn, stmt);
}

// Best-effort close: close() failures are deliberately ignored.
private static void safeClose(Connection conn, Statement stmt) {
  if (stmt != null) try { stmt.close(); } catch (Exception ignore) {}
  if (conn != null) try { conn.close(); } catch (Exception ignore) {}
}
Testing jdbc driver implementation is not in the scope of HDFS integration
tests, so ignoring Connection.close() and Statement.close() failures should not
be a concern for this test class.
3. Retry logic uses recursion in some places, as in startHiveServer2(),
verifyQuery() and verifyOnAllSubDirs(). It is better implemented as a
straightforward retry loop. With recursive calls, the exception stack trace is
more confusing than it needs to be. Plus, with NUM_RETRIES == 10, at least
theoretically, recursion adds running out of stack as an unnecessary failure
mode.
4. startHiveServer2() ignores a hiveServer2.start() call failure, only logging
an info message.
5. Starting hiveServer2 and Hive metastore in separate threads and then keeping
those threads alive seems unnecessary, since both servers' start() methods
create servers running on their own threads anyway. It effectively leads to
ignoring the start() method failure for both servers. Also, it leaves no
guarantee that hiveServer2 will be started after the Hive metastore - both
start from independent threads with no cross-thread coordination mechanism in
place.
6. The miniDFS.getFileSystem().mkdirs(tmpHDFSDir) call is missing in several
places.
7. Thread.sleep() is missing in multiple places between the HiveServer2 SQL
calls that change permissions and the verifyOnAllSubDirs() calls that verify
that those changes took effect.
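
The best-attempt cleanup described in item 1 could be structured along the
lines of the sketch below. BestEffortCleanupSketch, Step and runAll() are
hypothetical names used only for illustration, not the actual cleanAfterTest()
code. Each cleanup step runs inside its own try/catch, so a failing step is
recorded but cannot prevent the remaining steps from running.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of best-attempt cleanup; not the actual Sentry test code. */
class BestEffortCleanupSketch {

  /** A cleanup step that may fail (hypothetical interface). */
  interface Step {
    void run() throws Exception;
  }

  /**
   * Runs every step, even if earlier ones fail, and returns the collected
   * failures in the order they occurred.
   */
  static List<Exception> runAll(Step... steps) {
    List<Exception> failures = new ArrayList<>();
    for (Step step : steps) {
      try {
        step.run();
      } catch (Exception e) {
        failures.add(e); // remember the failure and keep cleaning up
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    StringBuilder log = new StringBuilder();
    List<Exception> failures = runAll(
        () -> log.append("A"),
        () -> { throw new IllegalStateException("step B failed"); },
        () -> log.append("C"));
    // Step C still ran although step B failed.
    System.out.println(log + " " + failures.size()); // prints "AC 1"
  }
}
```

Whether the collected failures should be logged or turned into a test failure
afterwards is a separate design choice that the sketch leaves open.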
> improve handling of failures, both in tests and after-test cleanup, in
> TestHDFSIntegration.java
> -----------------------------------------------------------------------------------------------
>
> Key: SENTRY-1377
> URL: https://issues.apache.org/jira/browse/SENTRY-1377
> Project: Sentry
> Issue Type: Sub-task
> Components: Sentry
> Affects Versions: 1.8.0
> Reporter: Vadim Spector
> Assignee: Vadim Spector
> Fix For: 1.8.0
>
>
> There are multiple issues making HDFS sync tests flaky or sometimes failing.
> 1. TestHDFSIntegrationBase.java should provide best-attempt cleanup in the
> cleanAfterTest() method. Currently, if any cleanup operation fails, the rest
> of the cleanup code is not executed. Cleanup logic would not normally fail if
> the corresponding test succeeds. But when a test fails, short-circuiting the
> remaining cleanup tends to cascade into secondary failures, which complicates
> troubleshooting.
> 2. TestHDFSIntegration*.java classes do not guarantee calling the close()
> method on Connection and Statement objects. This happens because
> a) no try-finally or try-with-resources is used, so a test failure can skip
> the close() calls.
> b) many methods re-open Connection and Statement multiple times, yet provide
> only a single close() at the end.
> 3. Retry logic uses recursion in some places, as in startHiveServer2() and
> verifyOnAllSubDirs(). It is better implemented as a straightforward retry
> loop. With recursive calls, the exception stack trace is more confusing than
> it needs to be. Plus, with NUM_RETRIES == 10, at least theoretically,
> recursion adds running out of stack as an unnecessary failure mode.
> 4. startHiveServer2() ignores a hiveServer2.start() call failure, only
> logging an info message.
> 5. Starting hiveServer2 and Hive metastore in separate threads and then
> keeping those threads alive seems unnecessary, since both servers' start()
> methods create servers running on their own threads anyway. It effectively
> leads to ignoring the start() method failure for both servers. Also, it
> leaves no guarantee that hiveServer2 will be started after the Hive metastore
> - both start from independent threads with no cross-thread coordination
> mechanism in place.
> 6. Thread.sleep() is missing in multiple places between the HiveServer2 SQL
> calls that change permissions and the verifyOnAllSubDirs() calls that verify
> that those changes took effect.