Hi, 

I am submitting the Hadoop job programmatically ("jobClient.submitJob( m_JobConf )").  I need to put a big object into the 
distributed cache, so I serialize it and ship it over.  With the 
ToolRunner, I can use -file: the file is copied into the job 
directory, so different jobs do not conflict.  However, there is no such 
mechanism for programmatic submission. 

Originally I just uploaded the file into HDFS and added the HDFS path 
to the distributed cache.  But to avoid conflicts between concurrent jobs, I would 
like to add the job ID as a prefix or suffix to the remote name.  Unfortunately, I 
cannot access the job ID until the submitJob() call, which is too late for 
uploading files to HDFS. 
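
As a sketch of the workaround I have in mind (all paths and names below are made up for illustration, not from any official Hadoop API): since the job ID is not known before submitJob(), a random UUID could serve as the unique part of the remote name instead. The naming logic is plain Java; with Hadoop 0.20 on the classpath, the resulting URI would then be registered via DistributedCache.addCacheFile().

```java
import java.util.UUID;

public class CacheNaming {
    // Build a per-job-unique HDFS path for a cache file.  The real jobID
    // is not available before submitJob(), so a random UUID is used as a
    // stand-in (a workaround, not an official Hadoop mechanism).
    static String uniqueCachePath(String baseDir, String fileName) {
        return baseDir + "/" + UUID.randomUUID() + "/" + fileName;
    }

    public static void main(String[] args) {
        String remote = uniqueCachePath("/user/me/cache", "model.ser");
        // The "#model.ser" fragment asks Hadoop to create a symlink named
        // model.ser in the task's working directory.
        String cacheUri = "hdfs://namenode:9000" + remote + "#model.ser";
        System.out.println(cacheUri);
        // With Hadoop on the classpath, the file would first be copied up
        // with FileSystem.copyFromLocalFile(), then registered via:
        //   DistributedCache.addCacheFile(new java.net.URI(cacheUri), conf);
        //   DistributedCache.createSymlink(conf);
    }
}
```

Each job then reads from its own UUID-scoped directory, so concurrent jobs never overwrite one another's cache files; cleanup of the directory after the job finishes would still be the caller's responsibility.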

Alternatively, after reading through the source code, I set the property 
"tmpfiles" on the JobConf object before the submitJob() call: 

    conf.set( "tmpfiles",
        output.makeQualified( localFs ).toUri() + "#" + symlink );

This seems to be the internal mechanism behind the "-file" option, but it feels very 
hacky.  It would be nice if Hadoop provided a more formal way to 
handle this.  Thanks. 

BTW: I am using 0.20.2 (CDH3u3)

Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_...@freddiemac.com
Financial Engineering
Freddie Mac
