OK, great. Thanks for replying, Tom. I'm still relatively new to Hadoop,
so I wasn't sure if I had missed something.
On 09/11/2009, at 2:41 PM, Tom White wrote:
MultipleOutputs has been ported to the new API in 0.21. See
https://issues.apache.org/jira/browse/MAPREDUCE-370.
Cheers,
Tom
On Sat, Nov 7, 2009 at 6:45 AM, Xiance SI(司宪策)
<[email protected]> wrote:
I just fell back to the old mapred.* APIs; it seems MultipleOutputs only
works with the old API.
wishes,
Xiance
On Mon, Nov 2, 2009 at 9:12 AM, Paul Smith <[email protected]> wrote:
I'm totally stuck here and can't find a way to resolve this: I can't
use the new API _and_ use the MultipleOutputFormat classes.
I found this thread, which is related but doesn't seem to help me
(or I missed something completely, which is certainly possible):
http://markmail.org/message/u4wz5nbcn5rawydq#query:hadoop%20MultipleTextOutputFormat%20OutputFormat%20Job%20JobConf+page:1+mid:5wy63oqa2vs6bj7b+state:results
My controller Job class is simple, but I get a compile error when
trying to add the new MultipleOutputs:
public class ControllerMetricGrinder {

    public static class MetricNameMultipleTextOutputFormat
            extends MultipleTextOutputFormat<String, ControllerMetric> {
        @Override
        protected String generateFileNameForKeyValue(String key,
                ControllerMetric value, String name) {
            return key;
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job();
        job.setJarByClass(ControllerMetricGrinder.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(ControllerMetric.class);
        job.setMapperClass(ControllerMetricMapper.class);
        job.setCombinerClass(ControllerMetricReducer.class);
        job.setReducerClass(ControllerMetricReducer.class);
        // COMPILE ERROR HERE
        MultipleOutputs.addMultiNamedOutput(job, "metrics",
                MetricNameMultipleTextOutputFormat.class,
                Text.class, ControllerMetric.class);
        job.setNumReduceTasks(5);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
(The mappers and reducers use the new API and are in separate classes.)
MultipleOutputs doesn't take a Job; it only takes a JobConf. Any ideas?
I'd prefer to use the new API (because that's how I've written everything),
but I'm guessing I'll now have to rework it all to the OLD API to get
this to work.
I'm trying to create one file per metric name (there are only 5).
Thoughts?
Paul
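
For readers following the thread, the goal stated above (one output file per metric name, with the key alone choosing the file) can be illustrated outside Hadoop. The following is only a minimal sketch in plain Java, assuming records arrive as (metric name, value) string pairs. The class MetricFileRouter and its methods are hypothetical names made up for this illustration; this is not Hadoop's MultipleOutputs or MultipleTextOutputFormat and makes no claim about how those classes work internally.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): routes each record to a file
// named after its key, which is the "file per metric name" idea in the post.
class MetricFileRouter {

    // Same intent as generateFileNameForKeyValue in the post:
    // the key alone decides the output file name.
    static String fileNameFor(String metricName) {
        return metricName;
    }

    // Writes each {metricName, value} pair to outputDir/<metricName>,
    // one value per line, opening each file at most once.
    static void writeAll(Path outputDir, List<String[]> records) throws IOException {
        Files.createDirectories(outputDir);
        Map<String, BufferedWriter> writers = new HashMap<>();
        try {
            for (String[] record : records) {
                String name = fileNameFor(record[0]);
                BufferedWriter writer = writers.get(name);
                if (writer == null) {
                    writer = Files.newBufferedWriter(
                            outputDir.resolve(name), StandardCharsets.UTF_8);
                    writers.put(name, writer);
                }
                writer.write(record[1]);
                writer.newLine();
            }
        } finally {
            // Close every file that was opened, even if a write failed.
            for (BufferedWriter writer : writers.values()) {
                writer.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("metrics");
        writeAll(dir, Arrays.asList(
                new String[] {"cpu", "0.93"},
                new String[] {"heap", "512"},
                new String[] {"cpu", "0.71"}));
        // Prints [0.93, 0.71]: both "cpu" records landed in the same file.
        System.out.println(Files.readAllLines(dir.resolve("cpu")));
    }
}
```

Running main creates two files, cpu and heap, in a temporary directory. In a real job the same routing decision would be made by the output format or by MultipleOutputs rather than by hand-managed writers.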