[ 
https://issues.apache.org/jira/browse/HDDS-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reassigned HDDS-10689:
--------------------------------------

    Assignee: Sammi Chen  (was: Wei-Chiu Chuang)

> [HBase Ozone] All HBase HMasters/RS down with "OMException: Unable to 
> allocate a container to the block"
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-10689
>                 URL: https://issues.apache.org/jira/browse/HDDS-10689
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Pratyush Bhatt
>            Assignee: Sammi Chen
>            Priority: Major
>
> Both the _HMasters_ and all _RS_ failed with the same "OMException: Unable to 
> allocate a container to the block" error at approximately the same time.
> Logs from {_}HMaster{_}:
> {code:java}
> 2024-04-10 18:24:23,197 ERROR org.apache.hadoop.hbase.master.HMaster: ***** 
> ABORTING master Master-1,22001,1712638569318: IOE in log roller *****
> INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Unable to 
> allocate a container to the block of size: 268435456, replicationConfig: 
> RATIS/THREE. Waiting for one of pipelines to be OPEN failed. Pipeline 
> f1362ba6-ee67-48a9-bdb7-ac80e8d55435,3c2d89bc-935b-424f-8c90-6dcc74933640,040a71f0-fa7d-43ff-baed-37ae3ee87c63,31caf9ea-c145-4d37-91ef-456088158b99,37b5a056-55e0-485a-ad35-53ef27069e39
>  is not ready in 60000 ms
>         at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:756)
>         at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleSubmitRequestAndSCMSafeModeRetry(OzoneManagerProtocolClientSideTranslatorPB.java:2293)
>         at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.createFile(OzoneManagerProtocolClientSideTranslatorPB.java:2281)
>         at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.createFile(RpcClient.java:2115)
>         at 
> org.apache.hadoop.ozone.client.OzoneBucket.createFile(OzoneBucket.java:855)
>         at 
> org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.createFile(BasicRootedOzoneClientAdapterImpl.java:400)
>         at 
> org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.createOutputStream(BasicRootedOzoneFileSystem.java:304)
>         at 
> org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.createNonRecursive(BasicRootedOzoneFileSystem.java:280)
>         at 
> org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1382)
>         at 
> org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1360)
>         at 
> org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:63)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:190)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:160)
>         at 
> org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:116)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:726)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:129)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:886)
>         at 
> org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:304)
>         at 
> org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:211)
> 2024-04-10 18:24:23,200 INFO org.apache.ranger.plugin.util.PolicyRefresher: 
> PolicyRefresher(serviceName=cm_hbase).run(): interrupted! Exiting thread
> java.lang.InterruptedException
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
>         at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at 
> org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:208) 
> {code}
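The OM-side failure above is a timed wait for any candidate pipeline to reach the OPEN state. As a rough illustration only (this is not Ozone's actual code; the class, method, and state names below are hypothetical), the pattern behind "Waiting for one of pipelines to be OPEN failed ... is not ready in 60000 ms" is a poll-until-deadline loop:

```java
import java.util.List;

public class PipelineWaitSketch {
    // Simplified stand-in for Ozone's pipeline lifecycle states.
    enum State { ALLOCATED, OPEN, CLOSED }

    // Poll until any pipeline reports OPEN, or give up after timeoutMs.
    // A false return is where the client would raise the OMException
    // quoted in the log above.
    static boolean waitForOpenPipeline(List<State> pipelines, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (pipelines.stream().anyMatch(s -> s == State.OPEN)) {
                return true;
            }
            Thread.sleep(100); // illustrative retry interval
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // With only ALLOCATED pipelines the wait times out, matching the log.
        boolean ready = waitForOpenPipeline(
            List.of(State.ALLOCATED, State.ALLOCATED), 300);
        System.out.println("ready = " + ready);
    }
}
```

So the HMaster abort is a downstream symptom: the 60 s wait can only succeed if SCM manages to move some pipeline to OPEN, which the SCM logs below show it could not do.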
> Checked the SCM leader logs; they show WARN entries like the following:
> {code:java}
> 2024-04-10 18:23:22,106 WARN [IPC Server handler 3 on 
> 9863]-org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider: 
> Pipeline creation failed for repConfig RATIS/THREE Datanodes may be used up. 
> Try to see if any pipeline is in ALLOCATED state, and then will wait for it 
> to be OPEN
> org.apache.hadoop.hdds.scm.exceptions.SCMException: Pipeline creation failed 
> due to no sufficient healthy datanodes. Required 3. Found 1. Excluded 7.
>         at 
> org.apache.hadoop.hdds.scm.pipeline.PipelinePlacementPolicy.filterViableNodes(PipelinePlacementPolicy.java:167)
>         at 
> org.apache.hadoop.hdds.scm.pipeline.PipelinePlacementPolicy.chooseDatanodesInternal(PipelinePlacementPolicy.java:256)
>         at 
> org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:209)
>         at 
> org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:140)
>         at 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:176)
>         at 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:56)
>         at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.create(PipelineFactory.java:89)
>         at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.createPipeline(PipelineManagerImpl.java:255)
>         at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.createPipeline(PipelineManagerImpl.java:241)
>         at 
> org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider.getContainer(WritableRatisContainerProvider.java:100)
>         at 
> org.apache.hadoop.hdds.scm.pipeline.WritableContainerFactory.getContainer(WritableContainerFactory.java:74)
>         at 
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:163)
>         at 
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:206)
>         at 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:198)
>         at 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:144)
>         at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
>         at 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:115)
>         at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:15752)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
