Srinivas created HADOOP-15898:
---------------------------------

Summary: WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.io.IOException: All datanodes DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK] are bad. Aborting...
Key: HADOOP-15898
URL: https://issues.apache.org/jira/browse/HADOOP-15898
Project: Hadoop Common
Issue Type: Improvement
Components: performance
Affects Versions: 2.6.0
Environment: Hadoop 2.6.0-cdh5.5.1
Reporter: Srinivas
Fix For: 2.6.0

A business-impacting MR job runs every day at 2:00 PM PST on about 1 - 1.5 TB of data (the size depends on the business day). The ideal elapsed time of this job is 4 hours. However, multiple mappers of this job fail simultaneously with the error below, so the job sometimes takes 11 or even 13 hours.

Steps taken so far to prevent this problem:
1. Migrated the environment to YARN.
2. Increased the ulimit.
3. Added extra nodes to the cluster.
4. Disks are being replaced regularly.

None of these helped.

WARN [DataStreamer for file /analytical_profile/DMP_analytical_profile/Turn/SAUP/2018_11_02_tmp/tmp/part-01357.5789 block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089] org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089 in pipeline DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK], DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK], DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]: bad datanode DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK]

WARN [DataStreamer for file /analytical_profile/DMP_analytical_profile/Turn/SAUP/2018_11_02_tmp/tmp/part-01357.5789 block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089] org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089 in pipeline DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK], DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]: bad datanode DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK]

WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.io.IOException: All datanodes DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK] are bad. Aborting...
    at com.turn.platform.cheetah.storage.dmp.analytical_profile.merge.IncrementalProfileMergerMapper.close(IncrementalProfileMergerMapper.java:1185)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)