On Sep 25, 2007, at 10:30 AM, Nathan Wang wrote:
1) Adjusting input set dynamically
At the start, I had 9090 gzipped input data files for the job,
07/09/24 10:26:06 INFO mapred.FileInputFormat: Total input
paths to process : 9090
Then I realized there were 3 files that were bad (couldn't be
gunzipped).
So, I removed them by doing:
bin/hadoop dfs -rm srcdir/FILExxx.gz
20 hours later, the job had failed, and I found a few errors in
the log:
org.apache.hadoop.ipc.RemoteException: java.io.IOException:
Cannot open filename ...FILExxx.gz
Is it possible that the runtime could adjust the input data set
accordingly?
As Devaraj pointed out, this is possible, but in general I think it
is correct to make this an error. The planning for the job must
happen before the job is launched, and once a map has been assigned
a file, the mapper being unable to read that input is a fatal
problem. If failures are tolerable for your application, you can set
the percentage of map and reduce tasks that can fail before the job
is killed.
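A minimal sketch of that, assuming the org.apache.hadoop.mapred
JobConf API of that era (the class name and the 5% threshold are
only illustrative):

  import java.io.IOException;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class TolerantJob {                  // hypothetical driver
    public static void main(String[] args) throws IOException {
      JobConf conf = new JobConf(TolerantJob.class);
      conf.setJobName("tolerant-gzip-job");
      // Let up to 5% of map tasks fail without killing the job;
      // keep reduces strict.
      conf.setMaxMapTaskFailuresPercent(5);
      conf.setMaxReduceTaskFailuresPercent(0);
      // ... set input/output paths, mapper, reducer, etc. ...
      JobClient.runJob(conf);
    }
  }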
Can we check the existence of the output directory at the very
beginning, to save us a day?
It does already. That was done back before 0.1 in HADOOP-3. Was your
program launching two jobs or something? Very strange.
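If you want to fail fast in your own driver as well (for example,
before kicking off a long chain of jobs), here is a quick sketch
using the standard FileSystem API; the output path is only a
placeholder:

  import java.io.IOException;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.JobConf;

  JobConf conf = new JobConf();
  FileSystem fs = FileSystem.get(conf);
  Path out = new Path("outdir");            // placeholder output path
  if (fs.exists(out)) {
    throw new IOException("Output directory " + out + " already exists");
  }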
-- Owen