[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936313#comment-14936313 ]
Saisai Shao edited comment on SPARK-10858 at 9/30/15 3:50 AM:
--------------------------------------------------------------
Hi [~tgraves], I tested again on Mac and Linux (CentOS).
If we use {{--jars my.jar#renamed.jar}}, the path is resolved to the URI
{{file:/Users/sshao/projects/apache-spark/my.jar%23renamed.jar}}.
If we use {{--jars file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar}},
the path is resolved to the URI
{{file:/Users/sshao/projects/apache-spark/my.jar#renamed.jar}}.
This resolution is done by {{Utils#resolveURI}}:
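{code}
import java.io.File
import java.net.{URI, URISyntaxException}

def resolveURI(path: String): URI = {
  try {
    val uri = new URI(path)
    if (uri.getScheme() != null) {
      // A scheme was given: return the URI as parsed, so "#..." stays a fragment.
      return uri
    }
  } catch {
    case _: URISyntaxException => // not a valid URI; treat it as a plain path below
  }
  // No scheme: build a file: URI from the absolute path ("#" is escaped to "%23").
  new File(path).getAbsoluteFile().toURI()
}
{code}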
If no scheme is specified, this code converts the file path into a URI with
{{File#toURI}}; note that {{toURI}} escapes "#" as "%23".
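As a hedged illustration, here is a Java transliteration of the Scala method above (not code from Spark; the class name {{ResolveUriDemo}} and the /tmp path are made up for the example) showing how the two inputs differ:

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

class ResolveUriDemo {
    // Java transliteration of the Scala Utils#resolveURI above (a sketch, not Spark code).
    static URI resolveURI(String path) {
        try {
            URI uri = new URI(path);
            if (uri.getScheme() != null) {
                // Scheme given: returned as parsed, so "#renamed.jar" remains a fragment.
                return uri;
            }
        } catch (URISyntaxException e) {
            // Not a valid URI: fall through and treat it as a plain file path.
        }
        // No scheme: File#toURI quotes characters that are illegal in a URI path,
        // so "#" is escaped to "%23" and becomes part of the path.
        return new File(path).getAbsoluteFile().toURI();
    }

    public static void main(String[] args) {
        URI noScheme = resolveURI("my.jar#renamed.jar");
        URI withScheme = resolveURI("file:///tmp/my.jar#renamed.jar");
        System.out.println(noScheme + "  fragment=" + noScheme.getFragment());
        System.out.println(withScheme + "  fragment=" + withScheme.getFragment());
    }
}
```

Without a scheme the result has no fragment and "%23" inside its raw path; with an explicit file: scheme, "renamed.jar" stays a separate fragment.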
Digging into the Hadoop code, {{RawLocalFileSystem#pathToFile}} is:
{code}
public File pathToFile(Path path) {
  checkPath(path);
  if (!path.isAbsolute()) {
    path = new Path(getWorkingDirectory(), path);
  }
  return new File(path.toUri().getPath());
}
{code}
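Because the file path is obtained with {{URI#getPath}}, the behavior depends on
whether "#" was escaped to "%23": an unescaped "#" starts the URI fragment, so
the part after it is treated as a fragment rather than as part of the path,
whereas an escaped "%23" is decoded by {{getPath}} back into a literal "#" in
the path.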
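A minimal way to check this without Hadoop on the classpath, assuming {{Path#toUri}} hands back a {{java.net.URI}} equivalent to the one Spark resolved (the class {{FragmentVsPathDemo}} and helper {{localPathOf}} are hypothetical):

```java
import java.net.URI;

class FragmentVsPathDemo {
    // Mimics the last step of RawLocalFileSystem#pathToFile, new File(path.toUri().getPath()),
    // using java.net.URI directly instead of org.apache.hadoop.fs.Path (an assumption).
    static String localPathOf(String uriString) {
        return URI.create(uriString).getPath();
    }

    public static void main(String[] args) {
        // Unescaped "#": everything after it is the fragment, so the path is only my.jar.
        System.out.println(localPathOf("file:/tmp/my.jar#renamed.jar"));   // prints /tmp/my.jar
        // Escaped "%23": getPath() decodes it back to "#", naming a file literally
        // called "my.jar#renamed.jar".
        System.out.println(localPathOf("file:/tmp/my.jar%23renamed.jar")); // prints /tmp/my.jar#renamed.jar
    }
}
```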
But if we instead use
{{--jars my.jar%23renamed.jar}}
or
{{--jars file:///path/to/my.jar%23renamed.jar}},
it succeeds in both cases.
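One possible direction, sketched in Java under the assumption that the rename should always be carried as a URI fragment (the class {{FragmentAwareResolve}} and helper {{resolveKeepingFragment}} are hypothetical, and this is not necessarily the fix adopted in Spark): for a scheme-less input, resolve only the path part and reattach the fragment afterwards.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

class FragmentAwareResolve {
    // Hypothetical variant of resolveURI that keeps "#name" as a URI fragment for
    // scheme-less paths instead of letting File#toURI escape it to "%23".
    static URI resolveKeepingFragment(String path) {
        URI uri;
        try {
            uri = new URI(path);
        } catch (URISyntaxException e) {
            // Not parseable as a URI: fall back to plain file-path handling.
            return new File(path).getAbsoluteFile().toURI();
        }
        if (uri.getScheme() != null) {
            return uri; // Scheme given: use as-is, the fragment is already preserved.
        }
        if (uri.getFragment() == null) {
            return new File(path).getAbsoluteFile().toURI();
        }
        // Resolve only the path part, then rebuild the URI with the original fragment.
        URI abs = new File(uri.getPath()).getAbsoluteFile().toURI();
        try {
            return new URI(abs.getScheme(), abs.getHost(), abs.getPath(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot rebuild URI for " + path, e);
        }
    }

    public static void main(String[] args) {
        URI r = resolveKeepingFragment("my.jar#renamed.jar");
        System.out.println(r + "  path=" + r.getPath() + "  fragment=" + r.getFragment());
    }
}
```

With this approach {{my.jar#renamed.jar}} and {{file:///.../my.jar#renamed.jar}} would both end up with "renamed.jar" as the fragment and a path that names the existing {{my.jar}}.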
was (Author: jerryshao):
Hi [~tgraves], I tested again on Mac and Linux (CentOS), and the behavior seems
to differ.
On Mac:
if we use {{--jars my.jar#renamed.jar}}, the path is resolved to the URI
{{file:/Users/sshao/projects/apache-spark/my.jar%23renamed.jar}};
if we use {{--jars file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar}},
the path is resolved to the URI
{{file:/Users/sshao/projects/apache-spark/my.jar#renamed.jar}}.
This resolution is done by {{Utils#resolveURI}}:
{code}
def resolveURI(path: String): URI = {
  try {
    val uri = new URI(path)
    if (uri.getScheme() != null) {
      return uri
    }
  } catch {
    case e: URISyntaxException =>
  }
  new File(path).getAbsoluteFile().toURI()
}
{code}
If no scheme is specified, this code converts the file path into a URI with
{{File#toURI}}; note that {{toURI}} escapes "#" as "%23".
On CentOS:
both
{{--jars my.jar#renamed.jar}}
and
{{--jars file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar}}
are resolved by {{Utils#resolveURI}} to
{{file:/Users/sshao/projects/apache-spark/my.jar#renamed.jar}}; clearly "#" is
not escaped. So in my test, both ways of using {{--jars}} fail on CentOS.
Digging into the Hadoop code, {{RawLocalFileSystem#pathToFile}} is:
{code}
public File pathToFile(Path path) {
  checkPath(path);
  if (!path.isAbsolute()) {
    path = new Path(getWorkingDirectory(), path);
  }
  return new File(path.toUri().getPath());
}
{code}
Because the file path is obtained with {{URI#getPath}}, the behavior depends on
whether "#" was escaped to "%23": if it was not, the part after "#" is treated
as a fragment rather than as part of the path. So on Mac the form without a
scheme succeeds, whereas on CentOS both forms fail.
But if we instead use
{{--jars my.jar%23renamed.jar}}
or
{{--jars file:///path/to/my.jar%23renamed.jar}},
it succeeds on CentOS.
> YARN: archives/jar/files rename with # doesn't work unless scheme given
> -----------------------------------------------------------------------
>
> Key: SPARK-10858
> URL: https://issues.apache.org/jira/browse/SPARK-10858
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.5.1
> Reporter: Thomas Graves
> Priority: Minor
>
> The YARN distributed cache feature of --jars, --archives and --files, which lets
> you rename the file/archive using a # symbol, only works if you explicitly
> include the scheme in the path:
> works:
> --jars file:///home/foo/my.jar#renamed.jar
> doesn't work:
> --jars /home/foo/my.jar#renamed.jar
> Exception in thread "main" java.io.FileNotFoundException: File file:/home/foo/my.jar#renamed.jar does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:416)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
>     at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:240)
>     at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:329)
>     at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:393)
>     at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:392)
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)