Hi Liu,

You have a few choices: you can either a) use no OutputFormat at all, or b) create your own custom one that handles what you need - see the rough OutputFormat skeleton at the very end of this mail.

I have MapReduce jobs that scan an HBase table and compute a specific value that I then store in memcached. For that I do the work directly in a custom TableMapper and set the output format to

  job.setOutputFormatClass(NullOutputFormat.class);

I often also set the number of reducers to 0, since I can do all the work in the Mapper. Row keys are sorted and unique, so there is no need for a Reducer - there is nothing to reduce. So I also do

  job.setNumReduceTasks(0);
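To make that more concrete, here is a rough sketch of such a map-only job. The table name, the column to read, the memcached address and the use of the spymemcached client are all assumptions for the example - adapt them to whatever you actually compute and store:

  import java.io.IOException;
  import java.net.InetSocketAddress;

  import net.spy.memcached.MemcachedClient;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.hbase.mapreduce.TableMapper;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

  public class ScanToMemcached {

    // The Mapper computes a value per row and stores it in memcached itself,
    // so the job needs no file output and no reduce phase.
    static class MemcachedMapper
        extends TableMapper<NullWritable, NullWritable> {

      private MemcachedClient client;

      @Override
      protected void setup(Context context) throws IOException {
        client = new MemcachedClient(new InetSocketAddress("localhost", 11211));
      }

      @Override
      protected void map(ImmutableBytesWritable row, Result columns,
          Context context) throws IOException {
        // Example "computation": take one cell of the row and cache it
        // under the row key.
        byte[] value = columns.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
        if (value != null) {
          client.set(Bytes.toString(row.get()), 0, value);
        }
      }

      @Override
      protected void cleanup(Context context) {
        client.shutdown();
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Job job = new Job(conf, "scan-to-memcached");
      job.setJarByClass(ScanToMemcached.class);

      Scan scan = new Scan();
      scan.setCaching(500);        // larger scanner caching helps MR scans
      scan.setCacheBlocks(false);  // do not fill the region server block cache

      TableMapReduceUtil.initTableMapperJob("mytable", scan,
          MemcachedMapper.class, NullWritable.class, NullWritable.class, job);

      job.setOutputFormatClass(NullOutputFormat.class); // no file output
      job.setNumReduceTasks(0);                         // map-only job

      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

Because the Mapper writes to memcached itself, there is nothing left for the framework to output, which is exactly why NullOutputFormat and zero reducers are enough here.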
The new Hadoop MapReduce API has removed the ability to set the number of map tasks. It was always just a hint to the framework anyway, not a hard limit. The number of Mappers is tied to the InputFormat in use, since it is responsible for splitting the input data into chunks for processing. Our TableInputFormat, for example, splits the table at region boundaries. A FileInputFormat may split text files into blocks matching the Hadoop block size, recording one of the data nodes that holds a copy of each block so the data can be processed locally. But if the input file is in a compressed, non-splittable format such as GZip, then a single Mapper handles the whole file - even if you had asked for 10 map tasks, it would only use one, as it has no other choice.

Lars

Liu Xianglong wrote:
> Hi, everyone. Is there someone who uses map-reduce to store the reduce output
> in memory? I mean, right now the output path of the job is set and the reduce
> outputs are stored in files under this path (see the comments along with the
> following code):
>
>   job.setOutputFormatClass(MyOutputFormat.class);
>   // can I implement my OutputFormat to store these output key-value pairs
>   // in my data structures, or are there other ways to do it?
>   job.setOutputKeyClass(ImmutableBytesWritable.class);
>   job.setOutputValueClass(Result.class);
>   FileOutputFormat.setOutputPath(job, outputDir);
>
> Is there any way to store them in some variables or data structures? If so,
> how can I implement my OutputFormat? Any suggestions and code are welcome.
>
> Another question: is there some way to set the number of map tasks? It seems
> there is no API to do this in the new Hadoop job APIs. I am not sure how to
> set this number.
>
> Thanks!
>
> Best Wishes!
> _____________________________________________________________
>
> 刘祥龙 Liu Xianglong
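P.S.: Since you asked how to implement your own OutputFormat: keep in mind that each task runs in its own JVM, so an in-memory data structure filled by a RecordWriter only lives inside that task and is never visible to the program that submitted the job - you need some shared store such as memcached anyway. With that in mind, here is a rough, untested skeleton of an OutputFormat that pushes every key/value pair to memcached instead of writing files; again, the memcached address and the spymemcached client are assumptions:

  import java.io.IOException;
  import java.net.InetSocketAddress;

  import net.spy.memcached.MemcachedClient;

  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.mapreduce.JobContext;
  import org.apache.hadoop.mapreduce.OutputCommitter;
  import org.apache.hadoop.mapreduce.OutputFormat;
  import org.apache.hadoop.mapreduce.RecordWriter;
  import org.apache.hadoop.mapreduce.TaskAttemptContext;
  import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

  // Skeleton of an OutputFormat that sends every pair to memcached instead of HDFS.
  public class MemcachedOutputFormat
      extends OutputFormat<ImmutableBytesWritable, Result> {

    @Override
    public RecordWriter<ImmutableBytesWritable, Result> getRecordWriter(
        TaskAttemptContext context) throws IOException {
      // One client per task attempt; the address is hard coded for the example only.
      final MemcachedClient client =
          new MemcachedClient(new InetSocketAddress("localhost", 11211));
      return new RecordWriter<ImmutableBytesWritable, Result>() {
        @Override
        public void write(ImmutableBytesWritable key, Result value)
            throws IOException {
          // Store whatever representation of the Result you need; here just
          // the value of the first cell, keyed by the row key.
          client.set(Bytes.toString(key.get()), 0, value.value());
        }

        @Override
        public void close(TaskAttemptContext context) throws IOException {
          client.shutdown();
        }
      };
    }

    @Override
    public void checkOutputSpecs(JobContext context) {
      // Nothing to check - there is no output directory.
    }

    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext context)
        throws IOException, InterruptedException {
      // Reuse NullOutputFormat's do-nothing committer.
      return new NullOutputFormat<ImmutableBytesWritable, Result>()
          .getOutputCommitter(context);
    }
  }

You would then register it with job.setOutputFormatClass(MemcachedOutputFormat.class); and drop the FileOutputFormat.setOutputPath() call, since there is no output directory anymore.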
