Hi,
I am using hbase-0.90.1-cdh3u0.jar with hadoop-core-0.20.2-cdh3u0.jar. I
have written an MR job that uses HFileOutputFormat to generate HFiles.
Below is the code in my driver class:
JobConf config = new JobConf(BulkImport.class);
Job job = new Job(config);
job.setJobName("Bulk load data into HBase");
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
job.setMapperClass(BulkLoaderMapper.class);
job.setJarByClass(BulkImport.class);
//job.setInputFormatClass(TextInputFormat.class);
System.out.println("input path ::" +otherArgs[0]);
System.out.println("output path ::" +otherArgs[1]);
config.setNumReduceTasks(0);
FileInputFormat.setInputPaths(config, otherArgs[0]);
job.setOutputFormatClass(HFileOutputFormat.class);
Configuration hConfig = HBaseConfiguration.create(config);
hConfig.addResource(new Path("/usr/lib/hbase/conf/hbase-site.xml"));
HFileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
// crucial step: TotalOrderPartitioner is configured here
HFileOutputFormat.configureIncrementalLoad(job, new HTable(hConfig, "bulkLoadedTable"));
//job.waitForCompletion(true);
JobClient.runJob(config);
The problem is that when I run this MR job, I get the error below, even
though the output path is set in the driver class.
11/12/29 05:18:04 INFO mapred.JobClient: map 0% reduce 0%
11/12/29 05:18:12 INFO mapred.JobClient: Task Id : attempt_201111180914_54076_m_000000_0, Status : FAILED
java.io.IOException: Undefined job output-path
        at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:232)
        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:687)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:384)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
I just noticed that the Hadoop core jar contains two FileOutputFormat
classes. In the code above, the HFileOutputFormat.setOutputPath(...) call
resolves to org.apache.hadoop.mapreduce.lib.output.FileOutputFormat, whereas
the error above comes from org.apache.hadoop.mapred.FileOutputFormat.
Is this a known bug in HBase 0.90.1? Could the problem be that
HFileOutputFormat extends the wrong FileOutputFormat?
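One possible explanation worth checking, sketched here as a plain-Java toy model rather than real Hadoop code: it assumes that the new-API Job constructor works on its own copy of the JobConf it is given (an assumption about Job/JobContext in this Hadoop version, to be verified against its source). If so, everything set through job, such as the output path and output format, stays on that copy, while JobClient.runJob(config) submits the original config with no output path and the old-API default TextOutputFormat, which would be consistent with the runOldMapper and TextOutputFormat frames in the trace. ToyJobConf, ToyJob and ConfigCopyDemo are hypothetical stand-ins; only the key name mapred.output.dir is meant to mirror Hadoop 0.20.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-ins, NOT the real Hadoop classes. They only model the assumption
// that the new-API Job keeps a private copy of the JobConf it is built from.
public class ConfigCopyDemo {

    // Hypothetical stand-in for org.apache.hadoop.mapred.JobConf.
    static class ToyJobConf {
        final Map<String, String> props = new HashMap<>();

        ToyJobConf() {}

        // Copy constructor: models the copy taken when a Job is created.
        ToyJobConf(ToyJobConf other) {
            props.putAll(other.props);
        }

        String get(String key) {
            return props.get(key);
        }

        void set(String key, String value) {
            props.put(key, value);
        }
    }

    // Hypothetical stand-in for org.apache.hadoop.mapreduce.Job.
    static class ToyJob {
        private final ToyJobConf conf;

        ToyJob(ToyJobConf original) {
            this.conf = new ToyJobConf(original); // private copy, not a reference
        }

        ToyJobConf getConfiguration() {
            return conf;
        }
    }

    public static void main(String[] args) {
        ToyJobConf config = new ToyJobConf();
        ToyJob job = new ToyJob(config);

        // Analogous to HFileOutputFormat.setOutputPath(job, ...):
        // the output directory is recorded on the job's copy only.
        job.getConfiguration().set("mapred.output.dir", "/tmp/hfiles");

        // Analogous to JobClient.runJob(config): the original object is what gets submitted.
        System.out.println("job copy:         " + job.getConfiguration().get("mapred.output.dir"));
        System.out.println("submitted config: " + config.get("mapred.output.dir"));
    }
}
```

Running main prints /tmp/hfiles for the job's copy and null for config. If the real classes behave like this model, submitting with job.waitForCompletion(true) (commented out in the driver) instead of JobClient.runJob(config) would be the first thing to try.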
Thanks,
Ishan