Hello list,

Is it possible to emit Java collections from a mapper?
My code looks like this:

public class UKOOAMapper extends Mapper<LongWritable, Text, LongWritable, List<Text>> {

    public static Text CDPX = new Text();
    public static Text CDPY = new Text();
    public static List<Text> vals = new ArrayList<Text>();
    public static LongWritable count = new LongWritable(1);

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.startsWith("Q")) {
            CDPX.set(line.substring(2, 13).trim());
            CDPY.set(line.substring(20, 25).trim());
            vals.add(CDPX);
            vals.add(CDPY);
            context.write(count, vals);
        }
    }
}

And the driver class is:

public static void main(String[] args)
        throws IOException, InterruptedException, ClassNotFoundException {
    Path filePath = new Path("/ukooa/UKOOAP190.0026_FAZENDA_JUERANA_1.ukooa");
    Configuration conf = new Configuration();
    Job job = new Job(conf, "SupportFileValidation");
    conf.set("mapreduce.output.key.field.separator", " ");
    job.setMapOutputValueClass(List.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    job.setMapperClass(UKOOAMapper.class);
    job.setReducerClass(ValidationReducer.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, filePath);
    FileOutputFormat.setOutputPath(job, new Path("/mapout/" + filePath));
    job.waitForCompletion(true);
}

When I try to run the program, I get the following error:

12/07/10 16:41:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/07/10 16:41:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/07/10 16:41:46 INFO input.FileInputFormat: Total input paths to process : 1
12/07/10 16:41:46 INFO mapred.JobClient: Running job: job_local_0001
12/07/10 16:41:46 INFO util.ProcessTree: setsid exited with exit code 0
12/07/10 16:41:46 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@456dfa45
12/07/10 16:41:46 INFO mapred.MapTask: io.sort.mb = 100
12/07/10 16:41:46 INFO mapred.MapTask: data buffer = 79691776/99614720
12/07/10 16:41:46 INFO mapred.MapTask: record buffer = 262144/327680
12/07/10 16:41:46 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
    at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:965)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
12/07/10 16:41:47 INFO mapred.JobClient: map 0% reduce 0%
12/07/10 16:41:47 INFO mapred.JobClient: Job complete: job_local_0001
12/07/10 16:41:47 INFO mapred.JobClient: Counters: 0

I need some guidance from the experts. Please let me know where I am going wrong.

Many thanks.

Regards,
Mohammad Tariq
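Note on the stack trace: the NullPointerException is raised in SerializationFactory.getSerializer, which suggests that no serializer could be found for the map output value class. That would fit, since java.util.List does not implement org.apache.hadoop.io.Writable. What follows is only a sketch of the write/readFields contract that a Writable value type has to satisfy. It deliberately uses JDK classes alone so that it runs without Hadoop on the classpath. The class name TextListRecord and the helper roundTrip are hypothetical and do not come from the code above; in a real job the class would declare "implements Writable" and could hold Text objects instead of String.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical value type: a list of strings that can serialize itself.
// write/readFields mirror the method signatures of Hadoop's Writable interface,
// but nothing here depends on Hadoop (an assumption made to keep this runnable).
class TextListRecord {
    private final List<String> values = new ArrayList<String>();

    // A no-argument constructor lets a framework create empty instances
    // and then fill them through readFields.
    TextListRecord() { }

    TextListRecord(List<String> initial) {
        values.addAll(initial);
    }

    List<String> getValues() {
        return values;
    }

    // Write the element count first, then each element.
    public void write(DataOutput out) throws IOException {
        out.writeInt(values.size());
        for (String v : values) {
            out.writeUTF(v);
        }
    }

    // Reset any previous state, then read back exactly what write produced.
    public void readFields(DataInput in) throws IOException {
        values.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            values.add(in.readUTF());
        }
    }

    // Serialize to a byte array and deserialize into a fresh instance.
    static TextListRecord roundTrip(TextListRecord rec) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            rec.write(new DataOutputStream(bytes));
            TextListRecord copy = new TextListRecord();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample coordinate strings, made up for illustration only.
        TextListRecord rec = new TextListRecord(Arrays.asList("512345.67", "98765"));
        System.out.println(roundTrip(rec).getValues());
    }
}
```

With a type like this implementing Writable, the driver would pass that class (rather than List.class) to job.setMapOutputValueClass, and the Mapper's fourth type parameter would change to match. Hadoop also ships org.apache.hadoop.io.ArrayWritable, which can be subclassed with a no-argument constructor that calls super(Text.class); that may be a simpler route than a hand-written class.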