Hugh Zabriskie created SPARK-23015:
--------------------------------------
Summary: spark-submit fails when submitting several jobs in
parallel
Key: SPARK-23015
URL: https://issues.apache.org/jira/browse/SPARK-23015
Project: Spark
Issue Type: Bug
Components: Spark Submit
Affects Versions: 2.2.1, 2.2.0, 2.1.2, 2.1.1, 2.1.0, 2.0.2, 2.0.1, 2.0.0,
1.4.0
Environment: Windows 10 (1709/16299.125)
Spark 2.3.0
Java 8, Update 151
Reporter: Hugh Zabriskie
Spark Submit's launch script builds the command to execute by running the launcher
class (org.apache.spark.launcher.Main), redirecting its output to a temporary text
file, reading the result back into a variable, and then executing that command.
[bin/spark-class2.cmd,
L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L67]
That temporary text file is given a pseudo-random name using the %RANDOM% dynamic
environment variable, which expands to an integer between 0 and 32767.
This appears to be the cause of an error occurring when several spark-submit
jobs are launched simultaneously. The following error is returned from stderr:
"The process cannot access the file because it is being used by another
process. The system cannot find the file
{USER}\AppData\Local\Temp\spark-class-launcher-output-{RANDOM's int}.txt. The
process cannot access the file because it is being used by another process."
My hypothesis is that %RANDOM% returns the same value for multiple jobs launched
simultaneously, causing the launcher library to write to the same file from multiple
processes. A mechanism that reliably generates unique temporary file names is needed
to resolve the concurrency issue.
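As an illustration of such a mechanism (not the actual fix adopted in Spark), the
sketch below shows the property a replacement needs: names derived from per-process
state plus exclusive file creation, rather than a bare 15-bit random suffix. The
helper name and prefix are hypothetical:

```python
import os
import tempfile

def make_launcher_output_file(prefix: str = "spark-class-launcher-output-") -> str:
    """Hypothetical sketch: create a launcher-output file whose name cannot
    collide across concurrent processes. tempfile.mkstemp combines random
    characters with O_EXCL creation, so if two processes ever race to the
    same name, one simply retries with a different one."""
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".txt")
    os.close(fd)  # the launcher would redirect its output into `path`
    return path

# Two concurrent-style allocations never yield the same path.
p1 = make_launcher_output_file()
p2 = make_launcher_output_file()
print(p1 != p2)
os.remove(p1)
os.remove(p2)
```

In the batch script itself the analogous change would be to mix additional
per-process entropy into the file name instead of relying on %RANDOM% alone.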
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]