[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hanisha Koneru updated HDDS-609: -------------------------------- Summary: SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state (was: Mapreduce example fails with Allocate block failed, error:INTERNAL_ERROR) > SCM does not exit chill mode as it expects DNs to report containers in > ALLOCATED state > -------------------------------------------------------------------------------------- > > Key: HDDS-609 > URL: https://issues.apache.org/jira/browse/HDDS-609 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Reporter: Namit Maheshwari > Priority: Major > > {code:java} > -bash-4.2$ /usr/hdp/current/hadoop-client/bin/hadoop jar > /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar > wordcount /tmp/mr_jobs/input/ o3://bucket2.volume2/mr_job5 > 18/10/09 23:37:07 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:08 INFO client.AHSProxy: Connecting to Application History > server at ctr-e138-1518143905142-510793-01-000004.hwx.site/172.27.79.197:10200 > 18/10/09 23:37:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > 18/10/09 23:37:09 INFO mapreduce.JobResourceUploader: Disabling Erasure > Coding for path: /user/hdfs/.staging/job_1539125785626_0007 > 18/10/09 23:37:09 INFO input.FileInputFormat: Total input files to process : 1 > 18/10/09 23:37:09 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library > 18/10/09 23:37:09 INFO lzo.LzoCodec: Successfully loaded & initialized > native-lzo library [hadoop-lzo rev 5d6248d8d690f8456469979213ab2e9993bfa2e9] > 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: number of splits:1 > 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: > job_1539125785626_0007 > 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Executing with tokens: [] > 18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:10 INFO conf.Configuration: found resource resource-types.xml > at file:/etc/hadoop/3.0.3.0-63/0/resource-types.xml > 18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:10 INFO impl.YarnClientImpl: Submitted application > application_1539125785626_0007 > 18/10/09 23:37:10 INFO mapreduce.Job: The url to track the job: > http://ctr-e138-1518143905142-510793-01-000005.hwx.site:8088/proxy/application_1539125785626_0007/ > 18/10/09 23:37:10 INFO mapreduce.Job: Running job: job_1539125785626_0007 > 18/10/09 23:37:17 INFO mapreduce.Job: Job job_1539125785626_0007 running in > uber mode : false > 18/10/09 23:37:17 INFO mapreduce.Job: map 0% reduce 0% > 18/10/09 23:37:24 INFO mapreduce.Job: map 100% reduce 0% > 18/10/09 23:37:29 INFO mapreduce.Job: Task Id : > attempt_1539125785626_0007_r_000000_0, Status : FAILED > Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.handleWrite(ChunkGroupOutputStream.java:271) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.write(ChunkGroupOutputStream.java:250) > at > org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:47) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > at > org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78) > at > org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:93) > at > org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:559) > at > org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) > at > org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105) > at > org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:64) > at > org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:52) > at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) > at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > 18/10/09 23:37:35 INFO mapreduce.Job: Task Id : > attempt_1539125785626_0007_r_000000_1, Status : FAILED > Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.handleWrite(ChunkGroupOutputStream.java:271) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.write(ChunkGroupOutputStream.java:250) > at > org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:47) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > at > org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78) > at > org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:93) > at > org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:559) > at > org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) > at > org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105) > at > org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:64) > at > org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:52) > at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) > at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > 18/10/09 23:37:42 INFO mapreduce.Job: Task Id : > attempt_1539125785626_0007_r_000000_2, Status : FAILED > Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.handleWrite(ChunkGroupOutputStream.java:271) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.write(ChunkGroupOutputStream.java:250) > at > org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:47) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > at > org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78) > at > org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:93) > at > org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:559) > at > org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) > at > org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105) > at > org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:64) > at > org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:52) > at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) > at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > 18/10/09 23:37:49 INFO mapreduce.Job: map 100% reduce 100% > 18/10/09 23:37:50 INFO mapreduce.Job: Job job_1539125785626_0007 failed with > state FAILED due to: Task failed task_1539125785626_0007_r_000000 > Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0 > killedReduces: 0 > 18/10/09 23:37:50 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:51 INFO mapreduce.Job: Counters: 45 > File System Counters > FILE: Number of bytes read=0 > FILE: Number of bytes written=266505 > FILE: Number of read operations=0 > FILE: Number of large read operations=0 > FILE: Number of write operations=0 > HDFS: Number of bytes read=215876 > HDFS: Number of bytes written=0 > HDFS: Number of read operations=2 > HDFS: Number of large read operations=0 > HDFS: Number of write operations=0 > O3: Number of bytes read=0 > O3: Number of bytes written=0 > O3: Number of read operations=0 > O3: Number of large read operations=0 > O3: Number of write operations=0 > Job Counters > Failed reduce tasks=4 > Launched map tasks=1 > Launched reduce tasks=4 > Rack-local map tasks=1 > Total time spent by all maps in occupied slots (ms)=18816 > Total time spent by all reduces in occupied slots (ms)=128016 > Total time spent by all map tasks (ms)=4704 > Total time spent by all reduce tasks (ms)=16002 > Total vcore-milliseconds taken by all map tasks=4704 > Total vcore-milliseconds taken by all reduce tasks=16002 > Total megabyte-milliseconds taken by all map tasks=19267584 > Total megabyte-milliseconds taken by all reduce tasks=131088384 > Map-Reduce Framework > Map input records=716 > Map output records=32019 > Map output bytes=343475 > Map output materialized bytes=6332 > Input split bytes=121 > Combine input records=32019 > Combine output records=461 > Spilled Records=461 > Failed Shuffles=0 > Merged Map outputs=0 > GC time elapsed (ms)=96 > CPU time spent (ms)=3300 > Physical memory (bytes) snapshot=2528358400 > Virtual memory (bytes) snapshot=5420421120 > Total committed heap usage (bytes)=2714238976 > Peak Map Physical memory (bytes)=2528358400 > Peak Map Virtual memory (bytes)=5420421120 > File Input Format Counters > Bytes Read=215755 > 18/10/09 23:37:51 INFO conf.Configuration: Removed undeclared tags: > {code} > SCM logs > {code:java} > 2018-10-09 23:37:28,984 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 4 on 9863, call Call#101 Retry#0 > org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock > from 172.27.56.9:33814 > org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed > for allocateBlock > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) > at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) > at > org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) > at > org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > 2018-10-09 23:37:35,232 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 4 on 9863, call Call#103 Retry#0 > org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock > from 172.27.56.9:33814 > org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed > for allocateBlock > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) > at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) > at > org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) > at > org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > 2018-10-09 23:37:42,044 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 3 on 9863, call Call#105 Retry#0 > org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock > from 172.27.56.9:33814 > org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed > for allocateBlock > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) > at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) > at > org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) > at > org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > 2018-10-09 23:37:48,656 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 3 on 9863, call Call#107 Retry#0 > org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock > from 172.27.56.9:33814 > org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed > for allocateBlock > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) > at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) > at > org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) > at > org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org