You might want to consider SparkContext.addFile() for distributing the file
from the client and SparkFiles.get() for retrieving the file on the
execution node.

--Xuefu

On Fri, Jan 2, 2015 at 7:15 PM, Zhang, Liyun <liyun.zh...@intel.com> wrote:

> Hi all,
>   I want to ask a question about "ship" in Pig:
>     Ship is used with streaming: it sends the streaming binary and any
> supporting files from the client node to the compute nodes.
>   I found the implementation of ship in MapReduce mode in:
>
>
> /home/zly/prj/oss/pig/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
> line 721:
> setupDistributedCache(pigContext, conf, pigContext.getProperties(),
>                     "pig.streaming.ship.files", true);
>
> This function gets all "pig.streaming.ship.files" entries from the
> properties, then copies the ship files to Hadoop using
> fs.copyFromLocalFile; at the same time, the symlink feature is turned on
> via DistributedCache.createSymlink(conf). For example, if the ship file
> "/tmp/teststreaming.pl" is copied from local to Hadoop, the Hadoop file
> will be hdfs://xxxx:8020/tmp/tempxxxx/tmp-xxx#teststreaming.pl.
> /tmp/hadoop-root/mapred/local/1419842279890/tmp-1268857767 is a local
> cache for hdfs://xxxx:8020/tmp/tempxxxx/tmp-xxx#teststreaming.pl, and
> teststreaming.pl is generated as a link to
> /tmp/hadoop-root/mapred/local/1419842279890/tmp-1268857767 in the current
> execution path. If I want to implement ship in another mode such as
> Spark, is the only thing I need to do copying the shipped files from the
> ship path to the current execution path?
>
>
>
> Best regards
> Zhang,Liyun
>
>
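
The copy-then-symlink mechanism described in the quoted message can be
sketched with the standard library alone. This is only an illustration of
the mechanism, not Pig or Hadoop code; all directory and file names below
are made up:

```python
import os
import shutil
import tempfile

work = tempfile.mkdtemp(prefix="pig-ship-demo-")
cache_dir = os.path.join(work, "cache")  # stands in for the local distributed-cache dir
task_dir = os.path.join(work, "task")    # stands in for the task's working directory
os.makedirs(cache_dir)
os.makedirs(task_dir)

# Step 1: "ship" the script into the cache directory, the role played by
# fs.copyFromLocalFile plus distributed-cache localization.
src = os.path.join(work, "teststreaming.pl")
with open(src, "w") as f:
    f.write('print "ok\\n";\n')
cached = os.path.join(cache_dir, "tmp-1268857767")  # opaque cache name, like the example
shutil.copyfile(src, cached)

# Step 2: expose the cached copy under its original name in the task's
# working directory via a symlink, the role played by
# DistributedCache.createSymlink and the '#name' URI fragment.
link = os.path.join(task_dir, "teststreaming.pl")
os.symlink(cached, link)
```

After step 2 the streaming task can open "teststreaming.pl" relative to
its working directory, even though the bytes live under the opaque cache
path.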
