Hi,

I solved it by creating a new JobConf instance for each iteration in the loop.
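
For reference, here is roughly what the loop looks like now (a minimal sketch; MyClass, MyMap, MyReduce and the path names are the same placeholders as in my original code quoted below, and the only real change is that the JobConf is built fresh inside the loop):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyClass {

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus[] directory;
        int round = 0;

        do {
            String old_path = "path_" + Integer.toString(round);
            round = round + 1;
            String new_path = "path_" + Integer.toString(round);

            // A fresh JobConf every round, so input/output paths and other
            // settings from the previous iteration do not carry over.
            JobConf jobconf = new JobConf(new Configuration(), MyClass.class);
            jobconf.setMapperClass(MyMap.class);
            jobconf.setReducerClass(MyReduce.class);
            FileInputFormat.addInputPath(jobconf, new Path(old_path));
            FileOutputFormat.setOutputPath(jobconf, new Path(new_path));

            // Other job settings as before

            JobClient.runJob(jobconf);

            // Check whether this round produced any output files
            directory = fs.listStatus(new Path(new_path));
        } while (directory.length != 0);
    }
}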

Thanks & regards
Arko

On Oct 12, 2011, at 1:54 AM, Arko Provo Mukherjee 
<arkoprovomukher...@gmail.com> wrote:

> Hello Everyone,
> 
> I have a situation where I am trying to run iterative Map-Reduce: the output 
> files of one iteration are the input files for the next, and the loop stops 
> when no new files are created in the output.
> 
> Code Snippet:
> 
> int round = 0;
> 
> FileStatus[] directory;   // declared here so it is visible in the while condition
> 
> JobConf jobconf = new JobConf(new Configuration(), MyClass.class);
> 
> do  {
> 
> String old_path = "path_" + Integer.toString(round);
> 
> round = round + 1;
> 
> String new_path = "path_" + Integer.toString(round);
> 
> FileInputFormat.addInputPath ( jobconf, new Path (old_path) );
> 
> FileOutputFormat.setOutputPath ( jobconf, new Path (new_path) );   // These 
> will eventually become directories containing multiple files
> 
> jobconf.setMapperClass(MyMap.class);
> 
> jobconf.setReducerClass(MyReduce.class);
> 
> // Other code
> 
> JobClient.runJob(jobconf);
> 
> directory = fs.listStatus ( new Path ( new_path ) );  // To 
> check for any new files in the output directory
> 
> } while ( directory.length != 0 );  // Stop the iteration only when no new 
> files are generated in the output path
> 
> 
> 
> The code runs smoothly in the first round and I can see the new directory 
> path_1 getting created and the Reducer output files added to it. 
> 
> The original path_0 was created by me beforehand, and I have added the 
> relevant files to it. 
> 
> The output files seem to have the correct data as per my Map/Reduce logic.
> 
> However, in the second round it fails with the following exception.
> 
> In 0.19 (In a cloud system - Fully Distributed Mode)
> 
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://cloud_hostname:9000/hadoop/tmp/hadoop/mapred/system/job_201106271322_9494/job.jar,
>  expected: file:///
> 
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:322)
> 
> 
> 
> In 0.20.203 (my own system and not a cloud - Pseudo Distributed Mode)
> 
> 11/10/12 00:35:42 INFO mapred.JobClient: Cleaning up the staging area 
> hdfs://localhost:54310/hadoop-0.20.203.0/HDFS/mapred/staging/arko/.staging/job_201110120017_0002
> 
> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://localhost:54310/hadoop-0.20.203.0/HDFS/mapred/staging/arko/.staging/job_201110120017_0001/job.jar,
>  expected: file:///
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:354)
> 
> It seems that Hadoop is not able to delete the staging file for the job.
> 
> Can you please suggest any reason for this? Please help!
> 
> Thanks a lot in advance!
> 
> Warm regards
> Arko
> 
