Hi, I am submitting Hadoop jobs programmatically via jobClient.submitJob( m_JobConf ). I need to put a large object into the distributed cache, so I serialize it to a file and ship it over. With ToolRunner I can simply use -file: the file is copied into the job directory, so different jobs do not conflict. However, there is no equivalent mechanism for programmatic submission.
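One way to sidestep the missing jobID is to generate a client-side unique token for the remote name before submission. This is only a sketch under my assumptions, not a Hadoop API: `uniqueCachePath` is a hypothetical helper, and the FileSystem/DistributedCache calls are indicated in comments rather than compiled here.

```java
import java.util.UUID;

public class CacheNaming {
    // Hypothetical helper: build a per-submission unique HDFS path for the
    // serialized object, since the jobID is not known until submitJob()
    // returns. A random UUID disambiguates concurrent jobs instead.
    static String uniqueCachePath(String baseDir, String fileName) {
        return baseDir + "/" + UUID.randomUUID() + "/" + fileName;
    }

    public static void main(String[] args) {
        String p1 = uniqueCachePath("/tmp/jobcache", "model.ser");
        String p2 = uniqueCachePath("/tmp/jobcache", "model.ser");
        System.out.println(p1);
        // Two submissions get distinct remote names, so no conflict:
        System.out.println(!p1.equals(p2));
        // With Hadoop 0.20.x one would then (sketch, not compiled here):
        //   fs.copyFromLocalFile(localPath, new Path(p1));
        //   DistributedCache.addCacheFile(new Path(p1).toUri(), conf);
    }
}
```

The trade-off versus a jobID-based name is that these per-submission directories are not tied to the job lifecycle, so the client has to clean them up itself.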
My first approach was to upload the file into HDFS and then add the HDFS address to the distributed cache. To avoid conflicts between multiple jobs, I wanted to add the jobID as a prefix or suffix to the remote name, but the jobID is not available until the submitJob() call, which is too late for uploading files to HDFS.

Alternatively, after reading through the source code, I set the "tmpfiles" property on the JobConf object before the submitJob() call:

conf.set( "tmpfiles", output.makeQualified( localFs ).toUri() + "#" + symlink );

This seems to be the internal mechanism behind the "-file" option, but it feels very hacky. It would be nice if Hadoop provided a more formal way to handle this. Thanks.

BTW: I am using 0.20.2 (CDH3u3).

Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_...@freddiemac.com
Financial Engineering
Freddie Mac
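For reference, the value written into "tmpfiles" above is just the qualified URI with a "#symlink" fragment appended. A minimal sketch of constructing that string, assuming a plain java.net.URI stands in for the qualified Hadoop Path (the conf.set call is shown only as a comment):

```java
import java.net.URI;

public class TmpFilesValue {
    // Reproduce the value format the -file option appears to put into the
    // "tmpfiles" property: "<qualified-uri>#<symlink-name>".
    static String tmpFilesEntry(URI qualified, String symlink) {
        return qualified.toString() + "#" + symlink;
    }

    public static void main(String[] args) throws Exception {
        URI u = new URI("hdfs://namenode:8020/user/me/model.ser");
        String entry = tmpFilesEntry(u, "model.ser");
        System.out.println(entry);
        // Sketch of how it would be used before submission (not compiled here):
        //   conf.set("tmpfiles", entry);
        //   jobClient.submitJob(conf);
    }
}
```

Multiple files can reportedly be joined with commas in the same property, which matches how -file accepts a comma-separated list.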