[
https://issues.apache.org/jira/browse/HADOOP-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Srinivas updated HADOOP-15898:
------------------------------
Summary: 1 TB Data size fails to run with the following error (was: 1 TB
TeraGen fails to run with the following error )
> 1 TB Data size fails to run with the following error
> -----------------------------------------------------
>
> Key: HADOOP-15898
> URL: https://issues.apache.org/jira/browse/HADOOP-15898
> Project: Hadoop Common
> Issue Type: Improvement
> Components: performance
> Affects Versions: 2.6.0
> Environment: Hadoop 2.6.0-cdh5.5.1
>
>
> Reporter: Srinivas
> Priority: Major
> Labels: performance
> Fix For: 2.6.0
>
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> A business-impacting MR job runs every day at 2:00 PM PST on about 1 to 1.5 TB
> of data (depending on the business day). Its ideal elapsed time is 4 hours,
> but multiple mappers of this job fail simultaneously with the following
> error, so the job sometimes takes 11 or even 13 hours.
> Steps taken so far to prevent this problem: 1. Migrated the environment to
> YARN. 2. Increased the ulimit. 3. Added extra nodes to the cluster.
> 4. Disks are replaced regularly. None of these helped.
> WARN [DataStreamer for file
> /analytical_profile/DMP_analytical_profile/Turn/SAUP/2018_11_02_tmp/tmp/part-01357.5789
> block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089]
> org.apache.hadoop.hdfs.DFSClient: Error Recovery for block
> BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089 in
> pipeline
> DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK],
> DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK],
> DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]:
> bad datanode
> DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK]
>
> WARN [DataStreamer for file
> /analytical_profile/DMP_analytical_profile/Turn/SAUP/2018_11_02_tmp/tmp/part-01357.5789
> block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089]
> org.apache.hadoop.hdfs.DFSClient: Error Recovery for block
> BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089 in
> pipeline
> DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK],
> DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]:
> bad datanode
> DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK]
>
> WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child :
> java.io.IOException: java.io.IOException: All datanodes
> DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]
> are bad. Aborting...
> at com.turn.platform.cheetah.storage.dmp.analytical_profile.merge.IncrementalProfileMergerMapper.close(IncrementalProfileMergerMapper.java:1185)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>
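As a rough triage aid for warnings like the ones quoted above, the following
Python sketch tallies which datanodes the DFSClient reports as bad. It is not
part of the original report: the function name count_bad_datanodes and the
regular expression are illustrative assumptions based only on the log format
shown in this issue.

```python
import re
from collections import Counter

# Assumed pattern: the node named right after "bad datanode" in a
# DFSClient "Error Recovery for block" warning, as quoted in this issue.
BAD_NODE = re.compile(
    r"bad datanode\s+DatanodeInfoWithStorage\[([\d.]+):(\d+),"
)


def count_bad_datanodes(log_text):
    """Return a Counter mapping 'ip:port' to how often it was reported bad."""
    # Mail wrapping adds "> " prefixes and line breaks, so flatten the text first.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    return Counter(f"{ip}:{port}" for ip, port in BAD_NODE.findall(flat))


if __name__ == "__main__":
    # Two excerpts taken from the warnings in this issue.
    sample = """\
> bad datanode
> DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK]
> bad datanode
> DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK]
"""
    for node, hits in count_bad_datanodes(sample).most_common():
        print(node, hits)
```

Run over the full task logs, a tally like this may help show whether the
failures concentrate on a few datanodes or are spread across the cluster.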