Hi all,
  I have a question about "ship" in Pig:
    With streaming, ship sends the streaming binary and any supporting files
from the client node to the compute nodes.
  I found that ship is implemented in MapReduce mode at:

/home/zly/prj/oss/pig/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
line 721:
setupDistributedCache(pigContext, conf, pigContext.getProperties(),
                    "pig.streaming.ship.files", true);

This function reads all "pig.streaming.ship.files" entries from the properties 
and copies the ship files to Hadoop using fs.copyFromLocalFile. At the same 
time, the symlink feature is turned on with 
DistributedCache.createSymlink(conf). For example, if the ship file 
"/tmp/teststreaming.pl" is copied from the local file system to Hadoop, the 
Hadoop file will be hdfs://xxxx:8020/tmp/tempxxxx/tmp-xxx#teststreaming.pl. 
/tmp/hadoop-root/mapred/local/1419842279890/tmp-1268857767 is the local cache 
copy of hdfs://xxxx:8020/tmp/tempxxxx/tmp-xxx#teststreaming.pl, and 
teststreaming.pl is created in the current execution path as a link to 
/tmp/hadoop-root/mapred/local/1419842279890/tmp-1268857767.

If I want to implement ship in another mode such as Spark, is the only thing I 
need to do to copy the shipped files from the shipped path to the current 
execution path?



Best regards
Zhang,Liyun
