Failed to write sequence files in mapper.
-----------------------------------------

                 Key: MAPREDUCE-1269
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1269
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.20.1
         Environment: Hadoop 0.20.1
Compiled by oom on Tue Sep  1 20:55:56 UTC 2009

Linux version 2.6.18-128.el5 ([email protected]) (gcc version 
4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Jan 21 10:41:14 EST 2009

            Reporter: YangLai
            Priority: Critical


Because the sort phase is not necessary for my job, I want to write only the 
values into sequence files, grouped by key. So I keep a HashMap in the mapper: 

        private HashMap<String, Writer> hm;

and I look up the matching org.apache.hadoop.io.SequenceFile.Writer in the HashMap: 

                Writer seqWriter = hm.get(skey);
                if (seqWriter == null) {
                        try {
                                // Lazily create one writer per key and cache it in the map.
                                seqWriter = new SequenceFile.Writer(new JobClient(job).getFs(), job,
                                                new Path(pPathOut, skey),
                                                VLongWritable.class, ByteWritable.class);
                        } catch (IOException e) {
                                e.printStackTrace();
                        }
                        if (seqWriter != null) {
                                hm.put(skey, seqWriter);
                        } else {
                                return;
                        }
                }
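
In map(), the cached writer is then used roughly like this (a simplified 
sketch; recordId and b are hypothetical stand-ins for the values the job 
actually derives):

                // Append the record to the per-key file instead of emitting
                // it through the OutputCollector.
                seqWriter.append(new VLongWritable(recordId), new ByteWritable(b));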

The file names are obtained from job.get("mapred.task.id"), which ensures 
that no two tasks write to the same file.
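
The key is built from the task attempt id roughly as follows (a sketch; the 
per-key suffix scheme here is an assumption, not the exact code):

                String taskId = job.get("mapred.task.id");
                // e.g. "attempt_200912071234_0001_m_000015_0" (hypothetical value);
                // the suffix keeps the 16 files of one task distinct from each other.
                String skey = taskId + "_" + keyIndex;  // keyIndex is hypothetical
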
The system always outputs: 

java.io.IOException: Could not obtain block: blk_-5398274085876111743_1021 file=/YangLai/ranNum1GB/part-00015
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1787)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1615)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1742)
        at java.io.DataInputStream.readFully(Unknown Source)
        at java.io.DataInputStream.readFully(Unknown Source)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

In fact, each mapper writes only 16 sequence files, which should not overload 
the Hadoop system. 
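
To finalize the cached writers when the task finishes, a close() override 
along these lines is needed (a minimal sketch, assuming the mapper extends 
MapReduceBase; an unclosed SequenceFile.Writer can leave a file that readers 
cannot open):

        public void close() throws IOException {
                // Close every cached writer so its blocks are finalized on HDFS.
                for (Writer w : hm.values()) {
                        w.close();
                }
        }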


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
