[ https://issues.apache.org/jira/browse/HIVE-22196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-22196.
------------------------------------------
    Resolution: Fixed

Fixed by HIVE-12371

> Socket timeouts happen when other drivers set DriverManager.loginTimeout
> ------------------------------------------------------------------------
>
>                 Key: HIVE-22196
>                 URL: https://issues.apache.org/jira/browse/HIVE-22196
>             Project: Hive
>          Issue Type: Bug
>          Components: JDBC, Thrift API
>    Affects Versions: 1.2.1, 2.0.0, 3.1.2
>         Environment: Any Hive JDBC client that uses other SQL clients besides
> Hive, or any other kind of JDBC driver (e.g. connection pooling). This can
> only happen if the other driver writes values to
> {{DriverManager.setLoginTimeout()}}. HikariCP is one suspect; there are
> probably others as well.
>            Reporter: Nathan Clark
>            Priority: Major
>
> There are a few somewhat sketchy things happening in Hive/Thrift code in the
> JDBC client that result in intermittent "read timed out" (and subsequently
> "out of sequence") errors when other JDBC drivers are active in the same
> client JVM that set {{DriverManager.loginTimeout}}.
> # The login timeout used to initialize a {{HiveConnection}} is populated
> from {{DriverManager.loginTimeout}} in the core Java JDBC library. This
> sounds like a nice, orthodox place to get a login timeout from, but it's
> fundamentally problematic and really shouldn't be used. The reason is that
> it's a *global* singleton value, and any JDBC Driver (or any other piece of
> code for that matter) can write to it at will (and is implicitly invited to).
> The Hive JDBC stack _itself_ writes values to this global setting in a couple
> of places seemingly unrelated to the client connection setup.
> # The _read_ timeout for Thrift _socket-level_ reads is actually populated
> from this _login_ timeout (a.k.a. "connect timeout") setting. (See Thrift's
> {{TSocket(String host, int port, int timeout)}} and its callers in
> {{HiveAuthFactory}}. Also note the numerous code comments that speak of
> setting {{SO_TIMEOUT}} (the socket read timeout) while the actual code
> references a variable called {{loginTimeout}}.) Socket reads can occur
> thousands of times in an application that does lots of Hive queries, and
> the duration of each read is less predictable than that of simply
> getting a connection, which typically happens at most a few times. So there
> is a high probability that a login timeout setting, which usually seems to
> receive a reasonable value of 30 seconds if constrained at all, will
> occasionally (way too often) be inadequate for a socket read.
> # There seems to be no option to set this login timeout (or the actual read
> timeout) explicitly as an externalized override setting (but see HIVE-12371).
> *Summary:* {{DriverManager.loginTimeout}} can be innocently set by any JDBC
> driver present in the JVM, you can't override it, and it's misused by Hive as
> a socket read timeout. There's no way to prevent intermittent read timeouts
> in this scenario unless you're lucky enough to find the JDBC driver and
> reconfigure its timeout setting to something workable for Hive socket reads.
> An easy, crude patch:
> modify the first line of {{HiveConnection.setupLoginTimeout()}} from:
> {{long timeOut = TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());}}
> to:
> {{long timeOut = TimeUnit.SECONDS.toMillis(0);}}
> This is of course not a robust fix, as server issues during socket reads can
> result in a hung client thread. Some other hardcoded value might be more
> advisable, as long as it's long enough to prevent spurious read timeouts.
> The right approach is to prioritize HIVE-12371 (proposed socket timeout
> override setting that doesn't depend on {{DriverManager.loginTimeout}}) and
> implement it in all possible versions.
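
For readers who want to see the mechanism described in the report, here is a minimal Java sketch. It only uses the standard {{java.sql.DriverManager}} and {{java.util.concurrent.TimeUnit}} APIs; it does not touch Hive or Thrift. The class name {{LoginTimeoutDemo}} and the method {{resolveSocketTimeoutMillis}} are hypothetical names made up for illustration and are not taken from the Hive code base or from the HIVE-12371 patch. The first method mirrors the expression quoted in the report; the second shows one possible shape for an explicit override that never consults the global value.

```java
import java.sql.DriverManager;
import java.util.concurrent.TimeUnit;

public class LoginTimeoutDemo {

    // Mirrors the expression quoted in the report: the JVM-wide login
    // timeout (in seconds) converted to milliseconds.
    static long loginTimeoutAsMillis() {
        return TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());
    }

    // Hypothetical alternative (an assumption, not the actual HIVE-12371
    // change): an explicit per-connection value wins, otherwise a fixed
    // default supplied by the caller is used. DriverManager is never read.
    static long resolveSocketTimeoutMillis(Long overrideMillis, long defaultMillis) {
        return overrideMillis != null ? overrideMillis : defaultMillis;
    }

    public static void main(String[] args) {
        // Start from "no login timeout".
        DriverManager.setLoginTimeout(0);
        System.out.println("before: " + loginTimeoutAsMillis() + " ms");

        // Simulates another driver or connection pool in the same JVM
        // writing to the global setting.
        DriverManager.setLoginTimeout(30);
        System.out.println("after:  " + loginTimeoutAsMillis() + " ms");

        // With an explicit override the global value no longer matters.
        // For java.net.Socket.setSoTimeout, 0 means "wait indefinitely".
        System.out.println("override: " + resolveSocketTimeoutMillis(120_000L, 0L) + " ms");
        System.out.println("default:  " + resolveSocketTimeoutMillis(null, 0L) + " ms");
    }
}
```

Any code that derives a read timeout from {{loginTimeoutAsMillis()}} after the second {{setLoginTimeout}} call would see 30000 ms, even though nothing in that code asked for it. That is the behaviour the report attributes to the Hive JDBC client.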