Short answer: Yes. With Writable serialization, there's *some* support for collection structures in the form of MapWritable and ArrayWritable. You can make use of these classes. That is also what your stack trace is pointing at: SerializationFactory.getSerializer throws the NullPointerException because List is not a Writable and no serializer is registered for it, so the map output buffer can't be set up.
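For example, here's roughly how the ArrayWritable route would look for your mapper. This is an untested sketch; the TextArrayWritable subclass (and its name) is something you'd add yourself, and I've kept your substring offsets as-is:

import java.io.IOException;

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// ArrayWritable can't be used directly as an output type: the framework
// instantiates values reflectively, so you need a subclass with a
// no-arg constructor that fixes the element class.
class TextArrayWritable extends ArrayWritable {
    public TextArrayWritable() {
        super(Text.class);
    }
    public TextArrayWritable(Text[] values) {
        super(Text.class, values);
    }
}

public class UKOOAMapper
        extends Mapper<LongWritable, Text, LongWritable, TextArrayWritable> {

    private static final LongWritable COUNT = new LongWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.startsWith("Q")) {
            // Fresh objects per record: accumulating into a shared
            // static List keeps growing across map() calls and is
            // unsafe with the framework's object reuse anyway.
            Text cdpX = new Text(line.substring(2, 13).trim());
            Text cdpY = new Text(line.substring(20, 25).trim());
            context.write(COUNT, new TextArrayWritable(new Text[] { cdpX, cdpY }));
        }
    }
}

In the driver you'd then call job.setMapOutputValueClass(TextArrayWritable.class) in place of List.class.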
However, I suggest using Apache Avro for these things; it's much better to use its schema/reflect-oriented serialization than Writables. See http://avro.apache.org
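To give a taste of the schema-oriented style, here is a rough sketch that models your two fields as an Avro generic record. The schema and the names in it are made up for illustration, and wiring records into a MapReduce job goes through the avro-mapred module rather than this snippet:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class CdpPoint {
    // Illustrative record schema for the two coordinate fields
    // parsed out of the Q lines; adjust names and types to taste.
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\": \"record\", \"name\": \"CdpPoint\", \"fields\": ["
        + "{\"name\": \"cdpX\", \"type\": \"string\"},"
        + "{\"name\": \"cdpY\", \"type\": \"string\"}]}");

    // No hand-written write()/readFields() as with a custom Writable;
    // the schema drives serialization.
    public static GenericRecord toRecord(String cdpX, String cdpY) {
        GenericRecord point = new GenericData.Record(SCHEMA);
        point.put("cdpX", cdpX);
        point.put("cdpY", cdpY);
        return point;
    }
}

The reflect route (org.apache.avro.reflect.ReflectData) can instead derive such a schema straight from a plain Java class.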
On Tue, Jul 10, 2012 at 4:45 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> Hello list,
>
> Is it possible to emit Java collections from a mapper??
>
> My code looks like this -
>
> public class UKOOAMapper extends Mapper<LongWritable, Text,
>         LongWritable, List<Text>> {
>
>     public static Text CDPX = new Text();
>     public static Text CDPY = new Text();
>     public static List<Text> vals = new ArrayList<Text>();
>     public static LongWritable count = new LongWritable(1);
>
>     public void map(LongWritable key, Text value, Context context)
>             throws IOException, InterruptedException {
>         String line = value.toString();
>         if (line.startsWith("Q")) {
>             CDPX.set(line.substring(2, 13).trim());
>             CDPY.set(line.substring(20, 25).trim());
>             vals.add(CDPX);
>             vals.add(CDPY);
>             context.write(count, vals);
>         }
>     }
> }
>
> And the driver class is -
>
> public static void main(String[] args) throws IOException,
>         InterruptedException, ClassNotFoundException {
>
>     Path filePath = new Path("/ukooa/UKOOAP190.0026_FAZENDA_JUERANA_1.ukooa");
>     Configuration conf = new Configuration();
>     Job job = new Job(conf, "SupportFileValidation");
>     conf.set("mapreduce.output.key.field.separator", " ");
>     job.setMapOutputValueClass(List.class);
>     job.setOutputKeyClass(LongWritable.class);
>     job.setOutputValueClass(Text.class);
>     job.setMapperClass(UKOOAMapper.class);
>     job.setReducerClass(ValidationReducer.class);
>     job.setInputFormatClass(TextInputFormat.class);
>     job.setOutputFormatClass(TextOutputFormat.class);
>     FileInputFormat.addInputPath(job, filePath);
>     FileOutputFormat.setOutputPath(job, new Path("/mapout/" + filePath));
>     job.waitForCompletion(true);
> }
>
> When I am trying to execute the program, I am getting the following error -
>
> 12/07/10 16:41:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 12/07/10 16:41:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 12/07/10 16:41:46 INFO input.FileInputFormat: Total input paths to process : 1
> 12/07/10 16:41:46 INFO mapred.JobClient: Running job: job_local_0001
> 12/07/10 16:41:46 INFO util.ProcessTree: setsid exited with exit code 0
> 12/07/10 16:41:46 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@456dfa45
> 12/07/10 16:41:46 INFO mapred.MapTask: io.sort.mb = 100
> 12/07/10 16:41:46 INFO mapred.MapTask: data buffer = 79691776/99614720
> 12/07/10 16:41:46 INFO mapred.MapTask: record buffer = 262144/327680
> 12/07/10 16:41:46 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.NullPointerException
>     at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:965)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/07/10 16:41:47 INFO mapred.JobClient: map 0% reduce 0%
> 12/07/10 16:41:47 INFO mapred.JobClient: Job complete: job_local_0001
> 12/07/10 16:41:47 INFO mapred.JobClient: Counters: 0
>
> Need some guidance from the experts. Please let me know where I am going wrong. Many thanks.
>
> Regards,
> Mohammad Tariq

--
Harsh J