Hi all, I am trying to sample the key distribution before doing a total-order sort, but the program fails with an exception. This is the stack trace:
    Exception in thread "main" java.lang.NullPointerException
            at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:149)
            at org.apache.hadoop.mapreduce.lib.partition.InputSampler$RandomSampler.getSample(InputSampler.java:220)
            at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:315)
            at Sorter.run(Sorter.java:100)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
            at Sorter.main(Sorter.java:114)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:192)

I checked the code in LineRecordReader.java and found that the exception comes from this line:

    newSize = in.readLine(value, maxLineLength, Math.max(maxBytesToConsume(pos), maxLineLength));

Here `in` is null. Since `in` appears to be assigned only in LineRecordReader.initialize(), I suspect initialize() is never called on the record reader before nextKeyValue(). I specified the input format as TextInputFormat, so it looks like TextInputFormat fails to read the data. Any ideas on how to fix this? Thanks.

I am running Hadoop 0.21.0, and my job setup is:

    ......
    job.setInputFormatClass(TextInputFormat.class);
    job.setPartitionerClass(TotalOrderPartitioner.class);
    InputSampler.Sampler<LongWritable, Text> sampler =
        new InputSampler.RandomSampler<LongWritable, Text>(0.1, 10000, 10);
    Path input = FileInputFormat.getInputPaths(job)[0];
    input = input.makeQualified(input.getFileSystem(conf));
    Path partitionFile = new Path(input, "_partitions");
    TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
    InputSampler.writePartitionFile(job, sampler);
    URI partitionUri = new URI(partitionFile.toString() + "#_partitions");
    DistributedCache.addCacheFile(partitionUri, conf);
    DistributedCache.createSymlink(conf);
    ......
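For completeness, here is the whole driver as a self-contained sketch of what I am doing. The job name, reducer count, and output path handling are placeholders, and in this sketch I take `conf` from the Job itself so the sampler and the partitioner see the same configuration; otherwise it matches the snippet above:

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
    import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class Sorter extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            Job job = new Job(getConf(), "total order sort");  // placeholder name
            job.setJarByClass(Sorter.class);

            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            job.setPartitionerClass(TotalOrderPartitioner.class);
            job.setNumReduceTasks(4);  // placeholder reducer count

            // Take the configuration from the Job itself, so the partition
            // file setting and the distributed-cache entries end up in the
            // same Configuration that the job actually runs with.
            Configuration conf = job.getConfiguration();

            InputSampler.Sampler<LongWritable, Text> sampler =
                new InputSampler.RandomSampler<LongWritable, Text>(0.1, 10000, 10);

            Path input = FileInputFormat.getInputPaths(job)[0];
            input = input.makeQualified(input.getFileSystem(conf));
            Path partitionFile = new Path(input, "_partitions");
            TotalOrderPartitioner.setPartitionFile(conf, partitionFile);

            // Sample the input and write the partition boundaries; this is
            // where the NullPointerException is thrown.
            InputSampler.writePartitionFile(job, sampler);

            // Ship the partition file to the tasks via the distributed cache.
            URI partitionUri = new URI(partitionFile.toString() + "#_partitions");
            DistributedCache.addCacheFile(partitionUri, conf);
            DistributedCache.createSymlink(conf);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new Sorter(), args));
        }
    }

One thing I am unsure about: since new Job(conf) copies the Configuration, changes made to the original conf afterwards (like setPartitionFile) are not reflected in the job, which is why the sketch uses job.getConfiguration(). I don't know whether that is related to the NPE, though, since the failure happens during sampling itself.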