[ https://issues.apache.org/jira/browse/SPARK-34955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-34955:
------------------------------------
Assignee: Apache Spark (was: Kousuke Saruta)
> ADD JAR command cannot add jar files which contain whitespace in the path
> -------------------------------------------------------------------------
>
> Key: SPARK-34955
> URL: https://issues.apache.org/jira/browse/SPARK-34955
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1
> Reporter: Kousuke Saruta
> Assignee: Apache Spark
> Priority: Major
>
> The ADD JAR command cannot add JAR files whose paths contain whitespace.
> If we have `/some/path/test file.jar` and execute the following command:
> {code}
> ADD JAR "/some/path/test file.jar";
> {code}
> The following exception is thrown.
> {code}
> 21/04/05 10:40:38 ERROR SparkSQLDriver: Failed in [add jar "/some/path/test file.jar"]
> java.lang.IllegalArgumentException: Illegal character in path at index 9: /some/path/test file.jar
> 	at java.net.URI.create(URI.java:852)
> 	at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:129)
> 	at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:34)
> 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> {code}
> This is because `HiveSessionStateBuilder` and `SessionStateBuilder` don't
> check whether the path is given in URI form or as a plain path; they always
> treat it as a URI.
> In a URI, whitespace must be encoded as `%20`, so `/some/path/test file.jar`
> is rejected.
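A minimal JDK-only sketch of this first failure follows. `java.net.URI.create` is the call at the top of the first stack trace. It rejects a raw space. `java.io.File.toURI` turns the same plain path into a percent-encoded `file:` URI. The class name `UriWhitespaceDemo` is hypothetical and not part of Spark.

```java
import java.io.File;
import java.net.URI;

// Hypothetical demo class: shows why treating a plain local path as a URI
// fails when the path contains a space.
final class UriWhitespaceDemo {

    /** Returns true if URI.create rejects the given string. */
    static boolean rejectedByUriCreate(String path) {
        try {
            URI.create(path);
            return false;
        } catch (IllegalArgumentException e) {
            // URI.create wraps the checked URISyntaxException
            // ("Illegal character in path ...") in an IllegalArgumentException.
            return true;
        }
    }

    /** Converts a plain local path to a file: URI; File.toURI percent-encodes the space. */
    static String encodedFileUri(String path) {
        return new File(path).toURI().toString();
    }

    public static void main(String[] args) {
        String plain = "/some/path/test file.jar";
        System.out.println("plain path rejected: " + rejectedByUriCreate(plain));
        System.out.println("encoded form: " + encodedFileUri(plain));
        System.out.println("encoded URI rejected: "
            + rejectedByUriCreate("file:/some/path/test%20file.jar"));
    }
}
```

On a Unix-like system the encoded form should be `file:/some/path/test%20file.jar`. On Windows, `File.toURI` resolves the path against the current drive, so the prefix differs.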
> We can resolve this part by checking whether the given path is in URI form
> or not.
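The check suggested above could look roughly like the following. This is a hypothetical helper (`JarPathResolver.toJarUri`), not Spark's actual fix. It assumes that anything parsing as a URI with a scheme is already in URI form, and that everything else is a plain local path.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not Spark's code): decide whether an ADD JAR argument
// is already a URI or a plain local path, and normalize it to a URI.
final class JarPathResolver {

    static URI toJarUri(String path) {
        try {
            URI uri = new URI(path);
            if (uri.getScheme() != null) {
                // Already in URI form, e.g. "file:/some/path/test%20file.jar"
                // or "hdfs://nn/jars/a.jar": keep it unchanged.
                return uri;
            }
        } catch (URISyntaxException e) {
            // Not a valid URI (e.g. it contains a raw space); fall through
            // and treat it as a plain local path.
        }
        // Plain path: let java.io.File build a properly encoded file: URI.
        return new File(path).toURI();
    }

    public static void main(String[] args) {
        System.out.println(toJarUri("/some/path/test file.jar"));
        System.out.println(toJarUri("file:/some/path/test%20file.jar"));
    }
}
```

One caveat of this heuristic: a Windows path such as `C:/jars/a.jar` parses as a URI with scheme `C`. A real implementation would need extra handling for drive letters.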
> Unfortunately, if we fix this part, another problem occurs.
> When we execute the `ADD JAR` command, Hive's `ADD JAR` command is executed
> in `HiveClientImpl.addJar`, and `AddResourceProcessor.run` is invoked
> transitively.
> In `AddResourceProcessor.run`, the command line is simply split on `\\s+`,
> so the path is also split into `/some/path/test` and `file.jar`, and both
> pieces are passed to `ss.add_resources`.
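The following simplified mimic (not Hive's code) shows why splitting on `\\s+` breaks such a path. It assumes the processor receives the text after `ADD`, e.g. `JAR /some/path/test file.jar`, and treats every token after the first as a separate resource. The class name `AddResourceSplitDemo` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified mimic of whitespace tokenization of "JAR <path>".
final class AddResourceSplitDemo {

    /** Splits the command on runs of whitespace and returns the resource arguments. */
    static List<String> resourceArgs(String command) {
        String[] tokens = command.trim().split("\\s+");
        // tokens[0] is the resource type ("JAR"); every remaining token is
        // treated as its own resource, so a path containing a space is split.
        return Arrays.asList(Arrays.copyOfRange(tokens, 1, tokens.length));
    }

    public static void main(String[] args) {
        // Prints [/some/path/test, file.jar]: one path became two resources.
        System.out.println(resourceArgs("JAR /some/path/test file.jar"));
    }
}
```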
> https://github.com/apache/hive/blob/f1e87137034e4ecbe39a859d4ef44319800016d7/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java#L56-L75
> So, the command still fails.
> Even if we convert the path to URI form, like
> `file:/some/path/test%20file.jar`, and execute the following command:
> {code}
> ADD JAR "file:/some/path/test%20file.jar";
> {code}
> The following exception is thrown.
> {code}
> 21/04/05 10:40:53 ERROR SessionState: file:/some/path/test%20file.jar does not exist
> java.lang.IllegalArgumentException: file:/some/path/test%20file.jar does not exist
> 	at org.apache.hadoop.hive.ql.session.SessionState.validateFiles(SessionState.java:1168)
> 	at org.apache.hadoop.hive.ql.session.SessionState$ResourceType.preHook(SessionState.java:1289)
> 	at org.apache.hadoop.hive.ql.session.SessionState$ResourceType$1.preHook(SessionState.java:1278)
> 	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1378)
> 	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1336)
> 	at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:74)
> {code}
> The reason is that `Utilities.realFile`, invoked in
> `SessionState.validateFiles`, returns `null` because `fs.exists(path)`
> returns `false`.
> https://github.com/apache/hive/blob/f1e87137034e4ecbe39a859d4ef44319800016d7/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L1052-L1064
> `fs.exists` checks whether the given path exists based on the string
> representation of Hadoop's `Path`.
> The string representation of `Path` looks similar to a URI, but it is
> actually different: `Path` does not percent-encode the given path.
> For example, the URI form of `/some/path/jar file.jar` is
> `file:/some/path/jar%20file.jar`, but its `Path` form is
> `file:/some/path/jar file.jar`. So `fs.exists` returns `false`.
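The mismatch can be reproduced with the JDK alone. This sketch does not use Hadoop's `Path` or `FileSystem`. As a stand-in, it looks a file up by the URI's raw, still percent-encoded path, which plays the role of the string that is never decoded, and then by the decoded path. The class name `RawVsDecodedPathDemo` and the temporary-file setup are illustrative assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;

// Hypothetical JDK-only stand-in for the encoded-vs-plain path mismatch;
// Hadoop classes are not used here.
final class RawVsDecodedPathDemo {

    /**
     * Creates "test file.jar" in a temporary directory and returns
     * {exists via raw (encoded) path, exists via decoded path}.
     */
    static boolean[] check() {
        try {
            File dir = Files.createTempDirectory("addjar").toFile();
            File jar = new File(dir, "test file.jar");
            if (!jar.createNewFile()) {
                throw new IOException("could not create " + jar);
            }
            try {
                URI uri = jar.toURI(); // path component ends with "test%20file.jar"
                // The raw path keeps "%20", so this looks for a file literally
                // named "test%20file.jar", which does not exist.
                boolean rawExists = new File(uri.getRawPath()).exists();
                // The decoded path turns "%20" back into a space and finds the file.
                boolean decodedExists = new File(uri.getPath()).exists();
                return new boolean[] {rawExists, decodedExists};
            } finally {
                jar.delete();
                dir.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println("raw lookup: " + r[0] + ", decoded lookup: " + r[1]);
    }
}
```

On a Unix-like system, only the decoded lookup should find the file. This mirrors the description's point that an encoded string and the plain path name different files.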
--
This message was sent by Atlassian Jira
(v8.3.4#803005)