Nathan Clark created HIVE-22196:
-----------------------------------

             Summary: Socket timeouts happen when other drivers set 
DriverManager.loginTimeout
                 Key: HIVE-22196
                 URL: https://issues.apache.org/jira/browse/HIVE-22196
             Project: Hive
          Issue Type: Bug
          Components: JDBC, Thrift API
    Affects Versions: 3.1.2, 2.0.0, 1.2.1
         Environment: Any client JVM in which the Hive JDBC driver runs 
alongside other JDBC drivers (e.g. other SQL clients or connection-pooling 
libraries). The issue can only occur if another driver writes a value via 
{{DriverManager.setLoginTimeout()}}. HikariCP is one suspect; there are 
probably others as well.
            Reporter: Nathan Clark


There are a few questionable things happening in Hive/Thrift code in the 
JDBC client that result in intermittent "read timed out" (and subsequent "out 
of sequence") errors whenever another JDBC driver active in the same client 
JVM sets {{DriverManager.loginTimeout}}.
 # The login timeout used to initialize a {{HiveConnection}} is populated from 
{{DriverManager.loginTimeout}} in the core Java JDBC library. This sounds like 
an orthodox place to get a login timeout from, but it is fundamentally 
problematic and really shouldn't be used: it is a *global* singleton value 
that any JDBC driver (or any other piece of code, for that matter) can write 
to at will, and is implicitly invited to. The Hive JDBC stack _itself_ writes 
values to this global setting in a couple of places seemingly unrelated to 
client connection setup.
 # The _read_ timeout for Thrift _socket-level_ reads is actually populated 
from this _login_ timeout (a.k.a. "connect timeout") setting. (See Thrift's 
{{TSocket(String host, int port, int timeout)}} and its callers in 
{{HiveAuthFactory}}. Also note the numerous code comments that speak of setting 
{{SO_TIMEOUT}} (the socket read timeout) while the actual code references a 
variable called {{loginTimeout}}.) Socket reads can occur thousands of times in 
an application that runs many Hive queries, and the duration of each read is 
far less predictable than that of connection setup, which typically happens at 
most a few times. So there is a high probability that a login timeout setting, 
which usually receives a reasonable value of around 30 seconds if it is 
constrained at all, will occasionally (far too often) be inadequate for a 
socket read.
 # There seems to be no option to set this login timeout (or the actual read 
timeout) explicitly as an externalized override setting (but see HIVE-12371). 
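
To see how easily the global value can be clobbered, here is a minimal 
pure-JDK sketch (no Hive code involved) of the cross-driver interference 
described in point 1: any library in the JVM can write the value that 
{{HiveConnection.setupLoginTimeout()}} later reads back.

```java
import java.sql.DriverManager;

public class LoginTimeoutDemo {
    public static void main(String[] args) {
        // Simulate another driver (e.g. a pooling library) setting the
        // JVM-global login timeout to 5 seconds.
        DriverManager.setLoginTimeout(5);

        // Any later reader -- such as HiveConnection.setupLoginTimeout() --
        // now observes 5 seconds, even though Hive never asked for it.
        int observed = DriverManager.getLoginTimeout();
        System.out.println("observed loginTimeout = " + observed);
    }
}
```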
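
Point 2 is easy to reproduce with plain {{java.net}} sockets, since 
{{SO_TIMEOUT}} bounds _every_ blocking read, not just connection setup. In 
this sketch (no Thrift or Hive dependencies; the class and helper names are 
illustrative) a server that simply stalls for longer than the timeout causes 
exactly the "read timed out" failure described above:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SoTimeoutDemo {
    // Returns true if a blocking read hit SO_TIMEOUT before any data arrived.
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket accepted = server.accept()) {
            // SO_TIMEOUT applies to EVERY subsequent blocking read on this socket.
            client.setSoTimeout(timeoutMillis);
            try {
                client.getInputStream().read(); // server never writes, so this stalls
                return false;
            } catch (SocketTimeoutException e) {
                return true; // read aborted after ~timeoutMillis -- "read timed out"
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```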

*Summary:* {{DriverManager.loginTimeout}} can be innocently set by any JDBC 
driver present in the JVM, you can't override it, and it's misused by Hive as a 
socket read timeout. There is no way to prevent intermittent read timeouts in 
this scenario unless you're lucky enough to find the offending JDBC driver and 
reconfigure its timeout setting to something workable for Hive socket reads.

An easy, crude patch:

modify the first line of {{HiveConnection.setupLoginTimeout()}} from:

{{long timeOut = TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());}}

to:

{{long timeOut = TimeUnit.SECONDS.toMillis(0);}}

This is of course not a robust fix, as server issues during socket reads can 
result in a hung client thread. Some other hardcoded value might be more 
advisable, as long as it's long enough to prevent spurious read timeouts.

The right approach is to prioritize HIVE-12371 (a proposed socket timeout 
override setting that does not depend on {{DriverManager.loginTimeout}}) and 
implement it across all supported versions.
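
As a rough illustration of what such an override could look like, here is a 
hedged sketch. The {{socketTimeout}} key, the {{sessConfMap}} shape, and the 
method name are all hypothetical and *not* the actual HIVE-12371 design; the 
point is simply to prefer an explicit per-connection setting and fall back to 
the global value only when no override is given.

```java
import java.sql.DriverManager;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class TimeoutOverrideSketch {
    // Hypothetical resolution order: an explicit per-connection "socketTimeout"
    // property (seconds) wins; otherwise fall back to the JVM-global
    // DriverManager value, which is today's (problematic) behavior.
    static long resolveTimeoutMillis(Map<String, String> sessConfMap) {
        String override = sessConfMap.get("socketTimeout"); // illustrative key
        int seconds = (override != null)
                ? Integer.parseInt(override)
                : DriverManager.getLoginTimeout(); // global fallback
        return TimeUnit.SECONDS.toMillis(seconds);
    }

    public static void main(String[] args) {
        DriverManager.setLoginTimeout(30); // simulates another driver's global write
        // Explicit override is honored regardless of the global value.
        System.out.println(resolveTimeoutMillis(Map.of("socketTimeout", "120")));
        // No override: falls back to the global setting.
        System.out.println(resolveTimeoutMillis(Map.of()));
    }
}
```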



--
This message was sent by Atlassian Jira
(v8.3.2#803003)