[
https://issues.apache.org/jira/browse/HIVE-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127747#comment-15127747
]
Rui Li commented on HIVE-12650:
-------------------------------
Rui Li commented on HIVE-12650:
-------------------------------
{code}
while (sparkContextRef.get() == null && System.currentTimeMillis < deadline && !finished) {
  logInfo("Waiting for spark context initialization ... ")
  sparkContextRef.wait(10000L)
}

val sparkContext = sparkContextRef.get()
if (sparkContext == null) {
  logError(("SparkContext did not initialize after waiting for %d ms. Please check earlier"
    + " log output for errors. Failing the application.").format(totalWaitTime))
}
{code}
You can see the while loop can exit either on timeout or on {{finished}} being set to true. Since the elapsed time is short, it must be because the user thread (which runs RemoteDriver) finished abnormally:
{code}
val userThread = new Thread {
  override def run() {
    try {
      mainMethod.invoke(null, userArgs.toArray)
      finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
      logDebug("Done running users class")
    } catch {
      case e: InvocationTargetException =>
        e.getCause match {
          case _: InterruptedException =>
            // Reporter thread can interrupt to stop user class
          case SparkUserAppException(exitCode) =>
            val msg = s"User application exited with status $exitCode"
            logError(msg)
            finish(FinalApplicationStatus.FAILED, exitCode, msg)
          case cause: Throwable =>
            logError("User class threw exception: " + cause, cause)
            finish(FinalApplicationStatus.FAILED,
              ApplicationMaster.EXIT_EXCEPTION_USER_CLASS,
              "User class threw exception: " + cause)
        }
    }
  }
}
{code}
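To make the early-exit scenario concrete, here's a minimal, self-contained sketch (my own illustration, not Spark or Hive source; the object name and timings are invented) where the "user thread" dies immediately, sets {{finished}}, and the wait loop bails out within milliseconds even though the 100s deadline is far away:
{code}
object EarlyExitSketch {
  private val contextRef = new java.util.concurrent.atomic.AtomicReference[Object]()
  @volatile private var finished = false

  def main(args: Array[String]): Unit = {
    val userThread = new Thread {
      override def run(): Unit = {
        // Simulate RemoteDriver failing before it ever creates a SparkContext:
        // mark the app finished and wake up the waiting AM thread.
        finished = true
        contextRef.synchronized { contextRef.notifyAll() }
      }
    }
    userThread.start()

    val totalWaitTime = 100000L // spark.yarn.am.waitTime default
    val deadline = System.currentTimeMillis() + totalWaitTime
    contextRef.synchronized {
      while (contextRef.get() == null && System.currentTimeMillis() < deadline && !finished) {
        println("Waiting for spark context initialization ... ")
        contextRef.wait(10000L)
      }
    }
    if (contextRef.get() == null) {
      // Prints almost immediately -- the same symptom as in the posted log.
      println("SparkContext did not initialize after waiting for %d ms.".format(totalWaitTime))
    }
  }
}
{code}
Running it prints the "did not initialize" error almost instantly, which matches the short elapsed time in the log you posted.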
In conclusion, the problem here is not that we timed out creating the SparkContext. My guess is that something goes wrong before we create the SparkContext (you can refer to the constructor of RemoteDriver). I also found another property, {{hive.spark.client.connect.timeout}}, which defaults to 1000ms. It's used when RemoteDriver creates the RPC client, so it could be related, although I'm a little confused about the difference between the two configurations.
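For what it's worth, here's how I read the split between the two settings, as a rough illustration (my own sketch with a placeholder host and port, not the actual Hive RPC code): {{hive.spark.client.connect.timeout}} caps the driver-side connect back to the Hive client, while {{hive.spark.client.server.connect.timeout}} is how long the Hive side waits for that connection to arrive at all:
{code}
import java.net.{InetSocketAddress, Socket}

// Illustration only -- "hs2-host" and 28888 are placeholders.
object ConnectTimeoutSketch {
  def main(args: Array[String]): Unit = {
    // Driver side: hive.spark.client.connect.timeout (default 1000ms) bounds
    // the TCP connect from RemoteDriver back to the Hive-side RPC server.
    val connectTimeoutMs = 1000
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress("hs2-host", 28888), connectTimeoutMs)
      // Meanwhile the Hive side keeps its own, separate 90s clock
      // (hive.spark.client.server.connect.timeout) while waiting for this
      // connection to show up.
    } finally {
      socket.close()
    }
  }
}
{code}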
Regarding your last question, I tried submitting an application when no container is available. Spark-submit will wait until the timeout (90s).
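If bumping the server-side timeout (as this JIRA proposes) helps in a given setup, it can go in hive-site.xml; an illustrative snippet (the value follows the proposal below, and the time-suffixed form matches the property's default of "90000ms"):
{code}
<property>
  <name>hive.spark.client.server.connect.timeout</name>
  <!-- bump above spark.yarn.am.waitTime (100s default) -->
  <value>120000ms</value>
</property>
{code}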
> Increase default value of hive.spark.client.server.connect.timeout to exceed
> spark.yarn.am.waitTime
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-12650
> URL: https://issues.apache.org/jira/browse/HIVE-12650
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.1.1, 1.2.1
> Reporter: JoneZhang
> Assignee: Xuefu Zhang
>
> I think hive.spark.client.server.connect.timeout should be set to a value greater
> than spark.yarn.am.waitTime. The default value for spark.yarn.am.waitTime is 100s,
> while the default value for hive.spark.client.server.connect.timeout is 90s, so the
> Hive side can give up before the AM's wait has even expired. We can increase it to a
> larger value such as 120s.