Phil Walker created SPARK-40738:
-----------------------------------

             Summary: spark-shell fails with "bad array subscript" in cygwin or 
msys bash session
                 Key: SPARK-40738
                 URL: https://issues.apache.org/jira/browse/SPARK-40738
             Project: Spark
          Issue Type: Bug
          Components: Spark Shell, Windows
    Affects Versions: 3.3.0
         Environment: The problem occurs in Windows if *_spark-shell_* is 
called from a bash session.

NOTE: the fix also applies to _*spark-submit*_ and {_}*beeline*{_}, since 
they call spark-shell.
            Reporter: Phil Walker


A Spark pull request [spark PR|https://github.com/apache/spark/pull/38167] 
fixes this issue, and also fixes a related build error that occurs in 
_*cygwin*_ and *msys/mingw* bash *sbt* sessions.

If a Windows user tries to start a *_spark-shell_* session by calling the bash 
script (rather than the *_spark-shell.cmd_* script), it fails with a confusing 
error message. Script _*spark-class*_ calls 
_*launcher/src/main/java/org/apache/spark/launcher/Main.java*_ to generate 
the command line, but the launcher emits it in the format expected by the 
*_.cmd_* version of the script rather than the _*bash*_ version.

When invoked on non-Windows environments, the launcher's Main method 
interleaves NUL characters between the command line arguments. It should 
also do so on Windows when called from the bash script, but it incorrectly 
assumes that if the OS is Windows it is being called by the .cmd version of 
the script.

The resulting error message is unhelpful:

 
{code:java}
[lots of ugly stuff omitted]
/opt/spark/bin/spark-class: line 100: CMD: bad array subscript
{code}
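To see why the missing NUL separators produce this particular error, the sketch below (a hypothetical stand-in, not the actual _*spark-class*_ script) mimics how the bash script reads the launcher's NUL-delimited output into the *CMD* array. If the launcher instead emits a single undelimited line in the .cmd format, the read loop leaves *CMD* empty and the subsequent array-index arithmetic fails with "bad array subscript".

{code:bash}
#!/usr/bin/env bash
# Hypothetical sketch: collect NUL-delimited launcher output into an array,
# the way the bash spark-class script does.

# Stand-in for the launcher: emits arguments separated by NUL characters,
# followed by the launcher's exit code (also NUL-terminated).
build_command() {
  printf '%s\0' java -cp /opt/spark/jars org.apache.spark.repl.Main
  printf '%d\0' 0
}

CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <(build_command)

# The last element is the launcher's exit code; the rest is the command.
# With an empty CMD array this index arithmetic is what blows up.
LAUNCHER_EXIT_CODE=${CMD[$((${#CMD[@]} - 1))]}
CMD=("${CMD[@]:0:$((${#CMD[@]} - 1))}")
echo "exit=$LAUNCHER_EXIT_CODE argc=${#CMD[@]} first=${CMD[0]}"
{code}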
The key to _*launcher/Main*_ recognizing that a request comes from a _*bash*_ 
session is that the _*SHELL*_ environment variable is set. It is normally set 
in any of the various Windows bash environments ({_}*cygwin*{_}, 
{_}*mingw64*{_}, {_}*msys2*{_}, etc.) and is not normally set in a native 
Windows command environment. In the _*spark-class.cmd*_ script, _*SHELL*_ is 
intentionally unset to avoid problems, and to permit bash users to call the 
_*.cmd*_ scripts if they prefer (they will still work as before).
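The detection rule described above can be sketched as a small decision function (hypothetical names; the real logic lives in the launcher's Main and the scripts): emit NUL-delimited output unless the OS is Windows *and* _*SHELL*_ is unset.

{code:bash}
#!/usr/bin/env bash
# Hypothetical sketch of the SHELL-based detection.
# $1 = OS name, $2 = value of SHELL ("" if unset)
format_for() {
  if [ "$1" != "Windows" ] || [ -n "$2" ]; then
    echo "null-delimited"   # bash caller: interleave NUL characters
  else
    echo "cmd-line"         # .cmd caller: single command line
  fi
}

format_for Linux   "/bin/bash"      # non-Windows: null-delimited
format_for Windows "/usr/bin/bash"  # msys/cygwin bash: null-delimited
format_for Windows ""               # native cmd.exe caller: cmd-line
{code}

Unsetting _*SHELL*_ in _*spark-class.cmd*_ keeps the third case working even when a bash user invokes the _*.cmd*_ script directly.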

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
