Failure writing sequence files in mapper
----------------------------------------
Key: MAPREDUCE-1269
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1269
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Environment: Hadoop 0.20.1
Compiled by oom on Tue Sep 1 20:55:56 UTC 2009
Linux version 2.6.18-128.el5 ([email protected]) (gcc version
4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Jan 21 10:41:14 EST 2009
Reporter: YangLai
Priority: Critical
Because the sort phase is not necessary for my job, I want to write values
directly into sequence files selected by key. So I keep a HashMap in the mapper:
private HashMap<String, Writer> hm;
and look up the appropriate org.apache.hadoop.io.SequenceFile.Writer in the HashMap:
Writer seqWriter = hm.get(skey);
if (seqWriter == null) {
  // Lazily create one writer per key the first time that key is seen.
  try {
    seqWriter = new SequenceFile.Writer(new JobClient(job).getFs(), job,
        new Path(pPathOut, skey), VLongWritable.class, ByteWritable.class);
  } catch (IOException e) {
    e.printStackTrace();
  }
  if (seqWriter != null) {
    hm.put(skey, seqWriter);
  } else {
    return;
  }
}
The file names are derived from job.get("mapred.task.id"), which ensures that no
duplicate file names are produced.
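For context, here is a minimal, self-contained sketch of how a mapper of this shape
could be put together with the old org.apache.hadoop.mapred API in 0.20.1. It is not
the reporter's actual code: the class name SideFileMapper, the sidefile.output.dir
property, the 16-way key split, and the VLongWritable/ByteWritable input types are
assumptions made only for illustration. The sketch lazily creates one
SequenceFile.Writer per derived key, names each file with mapred.task.id so tasks
never collide, and closes every writer in close().

  import java.io.IOException;
  import java.util.HashMap;
  import java.util.Map;

  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.ByteWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.VLongWritable;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class SideFileMapper extends MapReduceBase
      implements Mapper<VLongWritable, ByteWritable, VLongWritable, ByteWritable> {

    private JobConf job;
    private Path outDir;    // assumed side-file output directory
    private String taskId;  // unique per task attempt, used to avoid name clashes
    private final Map<String, SequenceFile.Writer> writers =
        new HashMap<String, SequenceFile.Writer>();

    public void configure(JobConf conf) {
      this.job = conf;
      // "sidefile.output.dir" is an assumed, job-specific property.
      this.outDir = new Path(conf.get("sidefile.output.dir", "/tmp/sidefiles"));
      this.taskId = conf.get("mapred.task.id");
    }

    // Lazily open one writer per derived key and cache it.
    private SequenceFile.Writer getWriter(String skey) throws IOException {
      SequenceFile.Writer w = writers.get(skey);
      if (w == null) {
        Path file = new Path(outDir, taskId + "-" + skey);
        w = new SequenceFile.Writer(FileSystem.get(job), job, file,
            VLongWritable.class, ByteWritable.class);
        writers.put(skey, w);
      }
      return w;
    }

    public void map(VLongWritable key, ByteWritable value,
        OutputCollector<VLongWritable, ByteWritable> output, Reporter reporter)
        throws IOException {
      // Route each record to one of 16 side files by key, bypassing the sort.
      String skey = Long.toString(key.get() & 0x0F);
      getWriter(skey).append(key, value);
    }

    public void close() throws IOException {
      // Close every writer so the final blocks are flushed to HDFS.
      for (SequenceFile.Writer w : writers.values()) {
        w.close();
      }
    }
  }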
The system always outputs:

java.io.IOException: Could not obtain block: blk_-5398274085876111743_1021 file=/YangLai/ranNum1GB/part-00015
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1787)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1615)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1742)
    at java.io.DataInputStream.readFully(Unknown Source)
    at java.io.DataInputStream.readFully(Unknown Source)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
In fact, each mapper writes only 16 sequence files, which should not overload the
Hadoop system.