Kousuke Saruta created SPARK-34955:
--------------------------------------

             Summary: ADD JAR command cannot add jar files which contains 
whitespaces in the path
                 Key: SPARK-34955
                 URL: https://issues.apache.org/jira/browse/SPARK-34955
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.1.1, 3.0.2, 2.4.7, 3.2.0
            Reporter: Kousuke Saruta
            Assignee: Kousuke Saruta


The `ADD JAR` command cannot add jar files whose path contains whitespace.

If we have `/some/path/test file.jar` and execute the following command:

{code}
ADD JAR "/some/path/test file.jar";
{code}
The following exception is thrown.
{code}
21/04/05 10:40:38 ERROR SparkSQLDriver: Failed in [add jar "/some/path/test file.jar"]
java.lang.IllegalArgumentException: Illegal character in path at index 9: /some/path/test file.jar
        at java.net.URI.create(URI.java:852)
        at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:129)
        at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:34)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
{code}

This is because `HiveSessionStateBuilder` and `SessionStateBuilder` don't check 
whether the given path is in URI form or is a plain path; they always treat it 
as a URI.
In a URI, whitespace must be encoded as `%20`, so `/some/path/test file.jar` is 
rejected.
We can resolve this part by checking whether the given path is in URI form or not.
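
As a rough illustration of such a check, here is a minimal Java sketch that uses only the JDK. The class `JarPathResolver` and its `resolve` method are hypothetical names made up for this example; they are not the code in Spark's session state builders, and the eventual fix may look different.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

final class JarPathResolver {
    // Hypothetical helper: accepts either a URI-form string or a plain path
    // and always returns a URI.
    static URI resolve(String path) {
        try {
            URI uri = new URI(path);
            if (uri.getScheme() != null) {
                // Already in URI form, e.g. file:/some/path/test%20file.jar
                return uri;
            }
        } catch (URISyntaxException e) {
            // Not a valid URI (for example, it contains a raw space);
            // fall through and treat the argument as a plain path.
        }
        // File.toURI() percent-encodes characters that are illegal in a URI.
        return new File(path).getAbsoluteFile().toURI();
    }

    public static void main(String[] args) {
        System.out.println(resolve("/some/path/test file.jar"));
        System.out.println(resolve("file:/some/path/test%20file.jar"));
    }
}
```

With a helper like this, a plain path containing a space no longer reaches `URI.create` unencoded, while a string that is already a valid URI with a scheme is passed through unchanged.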

Unfortunately, if we fix this part, another problem occurs.
When we execute the `ADD JAR` command, Hive's `ADD JAR` command is executed in 
`HiveClientImpl.addJar`, and `AddResourceProcessor.run` is transitively invoked.
In `AddResourceProcessor.run`, the command line is simply split by `\\s+`, so 
the path is also split into `/some/path/test` and `file.jar`, and both pieces 
are passed to `ss.add_resources`.
https://github.com/apache/hive/blob/f1e87137034e4ecbe39a859d4ef44319800016d7/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java#L56-L75
So, the command still fails.
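
The effect of splitting on `\\s+` can be reproduced with plain Java. The sketch below is a simplified stand-in for the tokenization step, not Hive's actual code; the class `WhitespaceSplitDemo` and the method `resourcePaths` are hypothetical names, and it assumes the first token is the resource type and every remaining token is treated as a separate resource path.

```java
import java.util.Arrays;
import java.util.List;

final class WhitespaceSplitDemo {
    // Simplified stand-in: split on whitespace, drop the first token
    // (the resource type), and treat every other token as a resource path.
    static List<String> resourcePaths(String command) {
        String[] tokens = command.split("\\s+");
        return Arrays.asList(Arrays.copyOfRange(tokens, 1, tokens.length));
    }

    public static void main(String[] args) {
        // Prints [/some/path/test, file.jar]
        System.out.println(resourcePaths("jar /some/path/test file.jar"));
    }
}
```

A single path containing a space therefore arrives as two unrelated resource names, neither of which exists.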

Even if we convert the path to URI form, like 
`file:/some/path/test%20file.jar`, and execute the following command:
{code}
ADD JAR "file:/some/path/test%20file.jar";
{code}
The following exception is thrown.
{code}
21/04/05 10:40:53 ERROR SessionState: file:/some/path/test%20file.jar does not exist
java.lang.IllegalArgumentException: file:/some/path/test%20file.jar does not exist
        at org.apache.hadoop.hive.ql.session.SessionState.validateFiles(SessionState.java:1168)
        at org.apache.hadoop.hive.ql.session.SessionState$ResourceType.preHook(SessionState.java:1289)
        at org.apache.hadoop.hive.ql.session.SessionState$ResourceType$1.preHook(SessionState.java:1278)
        at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1378)
        at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1336)
        at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:74)
{code}
The reason is that `Utilities.realFile`, invoked in `SessionState.validateFiles`, 
returns `null` because `fs.exists(path)` is `false`.
https://github.com/apache/hive/blob/f1e87137034e4ecbe39a859d4ef44319800016d7/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L1052-L1064

`fs.exists` checks the existence of the given path by comparing the string 
representation of Hadoop's `Path`.
The string representation of `Path` is similar to a URI, but it is actually 
different: `Path` doesn't encode the given path.
For example, the URI form of `/some/path/jar file.jar` is 
`file:/some/path/jar%20file.jar`, but its `Path` form is 
`file:/some/path/jar file.jar`. So `fs.exists` returns `false`.
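
The difference between the encoded and unencoded forms can be shown with the JDK alone. The sketch below deliberately does not use Hadoop's `Path` or `FileSystem`; it uses `java.net.URI` and `java.nio.file` as an analogy, and the class `EncodedNameDemo` and its methods are hypothetical names for this example. It assumes a file system that allows spaces and `%` in file names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;

final class EncodedNameDemo {
    // Builds a file: URI from a plain path. This multi-argument constructor
    // percent-encodes characters that are illegal in a URI, such as a space.
    static URI toFileUri(String plainPath) {
        try {
            return new URI("file", null, plainPath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Creates "jar file.jar" in a fresh temp directory (not cleaned up here),
    // then reports whether a file with the given literal name exists there.
    static boolean existsLiterally(String fileName) {
        try {
            Path dir = Files.createTempDirectory("add-jar-demo");
            Files.createFile(dir.resolve("jar file.jar"));
            return Files.exists(dir.resolve(fileName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = toFileUri("/some/path/jar file.jar");
        System.out.println(uri);           // encoded URI form
        System.out.println(uri.getPath()); // decoded path component
        System.out.println(existsLiterally("jar file.jar"));
        System.out.println(existsLiterally("jar%20file.jar"));
    }
}
```

In this analogy, a lookup that takes the encoded string literally searches for a file actually named `jar%20file.jar`, which is not the file on disk.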



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
