Xuefu Zhang created HIVE-16854:
----------------------------------
Summary: SparkClientFactory is locked too aggressively
Key: HIVE-16854
URL: https://issues.apache.org/jira/browse/HIVE-16854
Project: Hive
Issue Type: Bug
Components: Spark
Affects Versions: 1.1.0
Reporter: Xuefu Zhang
Most methods in SparkClientFactory are synchronized on the SparkClientFactory
singleton. Some of these methods are very expensive, such as createClient(),
which returns a SparkClientImpl instance. Creating a SparkClientImpl instance
requires starting a remote driver that connects back to the RPCServer. This
process can take a long time, for example when the YARN queue is busy. When
this happens, all pending calls on SparkClientFactory have to wait until the
launch completes or times out.
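
To illustrate the contention described above, here is a minimal,
self-contained sketch. It is not Hive's code: CoarseFactory, cheapCall() and
the sleep that stands in for the remote driver launch are hypothetical, and it
assumes the factory methods are static synchronized methods sharing one
monitor.

```java
import java.util.concurrent.CountDownLatch;

public class CoarseLockDemo {
    // Hypothetical stand-in for a factory whose methods all share one monitor.
    static final class CoarseFactory {
        static synchronized String createClient(long launchMillis, CountDownLatch entered)
                throws InterruptedException {
            entered.countDown();        // signal that the lock is now held
            Thread.sleep(launchMillis); // simulates waiting for the remote driver
            return "client";
        }

        // Cheap on its own, but it needs the same monitor as createClient().
        static synchronized void cheapCall() {
        }
    }

    // Returns how long cheapCall() was blocked behind a slow createClient().
    static long measureCheapCallWaitMillis(long launchMillis) throws InterruptedException {
        CountDownLatch entered = new CountDownLatch(1);
        Thread slowLaunch = new Thread(() -> {
            try {
                CoarseFactory.createClient(launchMillis, entered);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        slowLaunch.start();
        entered.await(); // wait until the slow launch holds the lock

        long start = System.nanoTime();
        CoarseFactory.cheapCall(); // blocks until the launch releases the lock
        long waitedMillis = (System.nanoTime() - start) / 1_000_000;

        slowLaunch.join();
        return waitedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cheapCall() waited ~" + measureCheapCallWaitMillis(500) + " ms");
    }
}
```

In this sketch the cheap call waits for roughly the whole simulated launch,
even though it does no expensive work itself.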
In our case, hive.spark.client.server.connect.timeout is set to 1hr. As a
result, some queries wait for hours before starting.
The current implementation appears to effectively serialize all remote driver
launches. If one of them is slow, all subsequent ones have to wait.
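
One possible direction, sketched here under assumptions and not necessarily
the fix that should be adopted, is to hold the lock only while reading or
updating shared factory state and to perform the slow launch outside of it.
NarrowFactory, its server field and the simulated launch are hypothetical
names used for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class NarrowLockDemo {
    // Hypothetical factory that guards only its shared state with the lock.
    static final class NarrowFactory {
        private static final Object LOCK = new Object();
        private static Object server; // stands in for the shared RPC server

        static void initialize() {
            synchronized (LOCK) {
                if (server == null) {
                    server = new Object();
                }
            }
        }

        static String createClient(long launchMillis) throws InterruptedException {
            Object current;
            synchronized (LOCK) { // short critical section: read shared state only
                if (server == null) {
                    throw new IllegalStateException("factory not initialized");
                }
                current = server;
            }
            Thread.sleep(launchMillis); // simulated slow launch, outside the lock
            return "client@" + System.identityHashCode(current);
        }
    }

    // Runs several launches concurrently and returns the total elapsed time.
    static long timeConcurrentLaunchesMillis(int launches, long launchMillis) throws Exception {
        NarrowFactory.initialize();
        ExecutorService pool = Executors.newFixedThreadPool(launches);
        try {
            long start = System.nanoTime();
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < launches; i++) {
                results.add(pool.submit(() -> NarrowFactory.createClient(launchMillis)));
            }
            for (Future<String> result : results) {
                result.get(); // propagate any failure from a launch
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("4 launches took ~" + timeConcurrentLaunchesMillis(4, 500) + " ms");
    }
}
```

With the launch outside the lock, the simulated launches overlap instead of
queuing behind each other. A real change would also have to consider how
stopping the factory interacts with launches that are still in flight.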
An HS2 stack trace is attached for reference. It is based on an earlier version
of Hive, so the line numbers might be slightly off.