Tom Weber created SPARK-3769:
--------------------------------
Summary: SparkFiles.get gives me the wrong fully qualified path
Key: SPARK-3769
URL: https://issues.apache.org/jira/browse/SPARK-3769
Project: Spark
Issue Type: Bug
Components: Java API
Affects Versions: 1.1.0, 1.0.2
Environment: Linux host and Linux grid.
Reporter: Tom Weber
Priority: Minor
My Spark program runs on my host and submits work to my grid:
JavaSparkContext sc = new JavaSparkContext(conf);
final String path = args[1];
sc.addFile(path); /* args[1] = /opt/tom/SparkFiles.sas */
The log shows:
14/10/02 16:07:14 INFO Utils: Copying /opt/tom/SparkFiles.sas to
/tmp/spark-4c661c3f-cb57-4c9f-a0e9-c2162a89db77/SparkFiles.sas
14/10/02 16:07:15 INFO SparkContext: Added file /opt/tom/SparkFiles.sas at
http://10.20.xx.xx:49587/files/SparkFiles.sas with timestamp 1412280434986
Those are paths on my host machine. The location the file ends up in on the grid
nodes is:
/opt/tom/spark-1.1.0-bin-hadoop2.4/work/app-20141002160704-0002/1/SparkFiles.sas
The call to get the path, in my code that runs inside my mapPartitions
function on the grid nodes, is:
String pgm = SparkFiles.get(path);
And this returns the following string:
/opt/tom/spark-1.1.0-bin-hadoop2.4/work/app-20141002160704-0002/1/./opt/tom/SparkFiles.sas
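For context, here is roughly how that call sits inside my mapPartitions function (a
simplified, untested sketch; the imports, RDD contents, and variable names are just
illustrative, not my actual job):

import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;

JavaRDD<String> records = sc.parallelize(Arrays.asList("rec1", "rec2"));
JavaRDD<String> located = records.mapPartitions(
    new FlatMapFunction<Iterator<String>, String>() {
        public Iterable<String> call(Iterator<String> recs) {
            /* runs on the grid node; I expected this to resolve to the
               node-local copy of the file shipped by addFile() */
            String pgm = SparkFiles.get(path); /* path == "/opt/tom/SparkFiles.sas" */
            return Collections.singletonList(pgm);
        }
    });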
So, am I expected to take the fully qualified path that I passed in, parse it
to get only the file name at the end, and then concatenate that onto what I get
from the SparkFiles.getRootDirectory() call in order to get this to work?
Or pass only the parsed file name to the SparkFiles.get() method? It seems as though
I should be able to pass the same file specification to both sc.addFile() and
SparkFiles.get() and get back the correct location of the file.
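If parsing really is what's expected, the workaround I have in mind is something
along these lines (an untested sketch, using java.io.File only to strip the
directory part; I'd rather not need it at all):

import java.io.File;

String fileName = new File(path).getName();             /* "SparkFiles.sas" */
String viaGet   = SparkFiles.get(fileName);             /* <work dir>/SparkFiles.sas */
String viaRoot  = SparkFiles.getRootDirectory() + File.separator + fileName;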
Thanks,
Tom