Olivier Sannier created SPARK-23320:

             Summary: RANDOM pseudo environment variable has low resolution 
under Windows
                 Key: SPARK-23320
                 URL: https://issues.apache.org/jira/browse/SPARK-23320
             Project: Spark
          Issue Type: Bug
          Components: Spark Submit
    Affects Versions: 2.1.1
         Environment: Windows 7, Windows 10

Spark 2.1.1
            Reporter: Olivier Sannier

Under Windows, spark-submit.bat calls spark-class2.cmd, which then runs 
org.apache.spark.launcher.Main and captures its output into the 
SPARK_CMD variable.

To do so, it uses a redirection to a temporary file whose name is created with 
this command:

{{set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt}}

Note how it uses the %RANDOM% variable to get a hopefully unique name for the 
file that will get created.

There are two issues with this, however:
 # if we have bad luck, two concurrent calls can get the same value from RANDOM
 # bad luck is quite easy to get if we submit numerous jobs at once, because 
the granularity of RANDOM is based on the current second, as indicated 

When two concurrent spark-submit calls use the same generated file, one of them 
fails with a "file is in use" error message.
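
To make the collision easier to picture, here is a minimal sketch in Python. It is only a model: it assumes that RANDOM behaves like an MSVC-style rand() seeded with the start time truncated to whole seconds, which is an assumption for illustration and not something taken from the cmd.exe sources. The names first_random and launcher_output_name are hypothetical.

```python
# Illustrative model only: assumes an MSVC-style linear congruential
# generator seeded with the process start time in whole seconds.


def first_random(seed_seconds: int) -> int:
    """Return the first 15-bit value of the modelled generator for a seed."""
    state = (seed_seconds * 214013 + 2531011) & 0xFFFFFFFF
    return (state >> 16) & 0x7FFF


def launcher_output_name(start_time: float) -> str:
    """Build the temp file name the way the batch line does, using the model."""
    return f"spark-class-launcher-output-{first_random(int(start_time))}.txt"


# Two submissions started 300 ms apart, within the same second.
name_a = launcher_output_name(1517500000.10)
name_b = launcher_output_name(1517500000.40)
# A third submission started one second later.
name_c = launcher_output_name(1517500001.10)

print(name_a == name_b)  # same second, same seed, same file name
print(name_a == name_c)  # different second, different file name
```

Under this model, any two submissions launched within the same second pick the same file name, which matches the "file is in use" failures we see.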

I would thus suggest replacing that {{set LAUNCHER_OUTPUT}} line with the following:

{{rem Because RANDOM is based on the current time, with one-second granularity, it may collide quite easily.}}
{{rem To avoid this, we retry until a nonexistent file name is found, and we create that file as soon as possible.}}
{{set launcher_try_count=5}}
{{:retry_launcher}}
{{set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt}}
{{if not exist "%LAUNCHER_OUTPUT%" goto :launcher_success}}
{{set /A launcher_try_count -= 1}}
{{if %launcher_try_count% GTR 0 goto :retry_launcher}}
{{echo Could not generate a launcher output filename that does not already exist 1>&2}}
{{goto :eof}}
{{:launcher_success}}
{{echo rem > "%LAUNCHER_OUTPUT%"}}

This batch code has been tested here and does solve the issue.
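
For comparison, the same "retry until the name is free, then create the file right away" idea can be sketched in Python. This is not the proposed patch, only an illustration of the technique: it uses an exclusive create (os.O_CREAT | os.O_EXCL) so that checking for the name and claiming it happen in one step. The function name claim_launcher_output and the default of 5 tries are assumptions mirroring the batch code.

```python
import os
import random
import tempfile


def claim_launcher_output(directory: str, tries: int = 5) -> str:
    """Pick a random file name and create it exclusively, retrying on collision."""
    for _ in range(tries):
        number = random.randrange(32768)  # same 0..32767 range as %RANDOM%
        candidate = os.path.join(
            directory, f"spark-class-launcher-output-{number}.txt"
        )
        try:
            # O_EXCL makes the call fail if the file already exists.
            fd = os.open(candidate, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # name already taken, draw another number
        os.close(fd)
        return candidate
    raise RuntimeError(
        "Could not generate a launcher output filename that does not already exist"
    )


with tempfile.TemporaryDirectory() as tmp:
    # Re-seeding forces both calls to draw the same first number,
    # imitating two submissions started within the same second.
    random.seed(0)
    first = claim_launcher_output(tmp)
    random.seed(0)
    second = claim_launcher_output(tmp)
    print(first != second)  # the second call retried and got another name
```

The batch version checks with {{if not exist}} and creates the file a moment later, so a small window remains between the two steps; the exclusive create in this sketch closes that window, at the cost of not being directly expressible in a .cmd script.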

