You might want to consider calling SparkContext.addFile() on the client to distribute the file, and SparkFiles.get() on the execution node to retrieve the local path of the file.
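If the streaming command expects the shipped file under its bare name in the current execution path, as the MapReduce symlink does, one extra step on the execution node may be needed after SparkFiles.get(). Below is a rough sketch of that step using only the JDK. The class ShipFileLinker and the method linkIntoWorkDir are hypothetical names and not part of Pig or Spark, and the temporary file in main only stands in for the path SparkFiles.get() would return.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for illustration; not part of Pig or Spark. */
public class ShipFileLinker {

    /**
     * Makes a locally cached file visible as workDir/linkName,
     * as a symlink when possible and as a plain copy otherwise.
     */
    public static Path linkIntoWorkDir(Path cached, Path workDir, String linkName)
            throws IOException {
        Path target = workDir.resolve(linkName);
        // Remove a stale link or copy left over from an earlier run.
        Files.deleteIfExists(target);
        try {
            Files.createSymbolicLink(target, cached.toAbsolutePath());
        } catch (UnsupportedOperationException | IOException e) {
            // Symlinks are not available here: fall back to copying the file.
            Files.copy(cached, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in a Spark task, 'cached' would be the path returned by
        // SparkFiles.get() for a file registered earlier with SparkContext.addFile().
        Path cacheDir = Files.createTempDirectory("cache");
        Path cached = Files.write(cacheDir.resolve("tmp-xxx"), "print 1;\n".getBytes());
        Path workDir = Files.createTempDirectory("work");

        Path linked = linkIntoWorkDir(cached, workDir, "teststreaming.pl");
        System.out.print(new String(Files.readAllBytes(linked)));
    }
}
```

Whether a symlink or a copy is the better choice probably depends on file size and on how the streaming process is launched, so treat this as a starting point rather than a recommendation.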
--Xuefu

On Fri, Jan 2, 2015 at 7:15 PM, Zhang, Liyun <liyun.zh...@intel.com> wrote:

> Hi all,
> I want to ask a question about "ship" in Pig:
> With streaming, ship sends the streaming binary and supporting
> files, if any, from the client node to the compute nodes.
> I found that the implementation of ship in MapReduce mode is:
>
> /home/zly/prj/oss/pig/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
> line 721:
> setupDistributedCache(pigContext, conf, pigContext.getProperties(),
> "pig.streaming.ship.files", true);
>
> This function gets all "pig.streaming.ship.files" from the properties,
> then copies the ship files to Hadoop using fs.copyFromLocalFile. At the same
> time, the symlink feature is turned on by calling
> DistributedCache.createSymlink(conf). For example, if the ship file
> "/tmp/teststreaming.pl" is copied from local to Hadoop, the Hadoop file will be
> hdfs://xxxx:8020/tmp/tempxxxx/tmp-xxx#teststreaming.pl.
> /tmp/hadoop-root/mapred/local/1419842279890/tmp-1268857767 is a cache for
> hdfs://xxxx:8020/tmp/tempxxxx/tmp-xxx#teststreaming.pl. teststreaming.pl
> will be generated as a link to
> /tmp/hadoop-root/mapred/local/1419842279890/tmp-1268857767 in the current
> execution path. If I want to implement ship in another mode like Spark, is the
> only thing I need to do to copy the shipped files from the shipped path to the
> current execution path?
>
> Best regards
> Zhang, Liyun