Pratyush Bhatt created HDDS-10689:
-------------------------------------

             Summary: [HBase Ozone] All HBase HMasters/RS down with "OMException: Unable to allocate a container to the block"
                 Key: HDDS-10689
                 URL: https://issues.apache.org/jira/browse/HDDS-10689
             Project: Apache Ozone
          Issue Type: Bug
          Components: SCM
            Reporter: Pratyush Bhatt


Both _HMasters_ and all _RS_ (RegionServers) failed with the same "OMException: 
Unable to allocate a container to the block" error at approximately the same time.

Logs from _HMaster_:
{code:java}
2024-04-10 18:24:23,197 ERROR org.apache.hadoop.hbase.master.HMaster: ***** ABORTING master Master-1,22001,1712638569318: IOE in log roller *****
INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Unable to allocate a container to the block of size: 268435456, replicationConfig: RATIS/THREE. Waiting for one of pipelines to be OPEN failed. Pipeline f1362ba6-ee67-48a9-bdb7-ac80e8d55435,3c2d89bc-935b-424f-8c90-6dcc74933640,040a71f0-fa7d-43ff-baed-37ae3ee87c63,31caf9ea-c145-4d37-91ef-456088158b99,37b5a056-55e0-485a-ad35-53ef27069e39 is not ready in 60000 ms
        at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:756)
        at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleSubmitRequestAndSCMSafeModeRetry(OzoneManagerProtocolClientSideTranslatorPB.java:2293)
        at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.createFile(OzoneManagerProtocolClientSideTranslatorPB.java:2281)
        at org.apache.hadoop.ozone.client.rpc.RpcClient.createFile(RpcClient.java:2115)
        at org.apache.hadoop.ozone.client.OzoneBucket.createFile(OzoneBucket.java:855)
        at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.createFile(BasicRootedOzoneClientAdapterImpl.java:400)
        at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.createOutputStream(BasicRootedOzoneFileSystem.java:304)
        at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.createNonRecursive(BasicRootedOzoneFileSystem.java:280)
        at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1382)
        at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1360)
        at org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:63)
        at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:190)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:160)
        at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:116)
        at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:726)
        at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:129)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:886)
        at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:304)
        at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:211)
2024-04-10 18:24:23,200 INFO org.apache.ranger.plugin.util.PolicyRefresher: PolicyRefresher(serviceName=cm_hbase).run(): interrupted! Exiting thread
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:208)
{code}

The SCM leader logs from about a minute earlier show WARN entries like the following:
{code:java}
2024-04-10 18:23:22,106 WARN [IPC Server handler 3 on 9863]-org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider: Pipeline creation failed for repConfig RATIS/THREE Datanodes may be used up. Try to see if any pipeline is in ALLOCATED state, and then will wait for it to be OPEN
org.apache.hadoop.hdds.scm.exceptions.SCMException: Pipeline creation failed due to no sufficient healthy datanodes. Required 3. Found 1. Excluded 7.
        at org.apache.hadoop.hdds.scm.pipeline.PipelinePlacementPolicy.filterViableNodes(PipelinePlacementPolicy.java:167)
        at org.apache.hadoop.hdds.scm.pipeline.PipelinePlacementPolicy.chooseDatanodesInternal(PipelinePlacementPolicy.java:256)
        at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:209)
        at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:140)
        at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:176)
        at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:56)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.create(PipelineFactory.java:89)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.createPipeline(PipelineManagerImpl.java:255)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.createPipeline(PipelineManagerImpl.java:241)
        at org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider.getContainer(WritableRatisContainerProvider.java:100)
        at org.apache.hadoop.hdds.scm.pipeline.WritableContainerFactory.getContainer(WritableContainerFactory.java:74)
        at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:163)
        at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:206)
        at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:198)
        at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:144)
        at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
        at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:115)
        at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:15752)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
{code}
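
For triage across many SCM log lines, the datanode counts in the SCMException message can be extracted mechanically. The following is a minimal sketch that assumes the message keeps the exact "Required N. Found N. Excluded N." wording quoted above; the function name `parse_placement_failure` and the derived `shortfall` field are hypothetical helpers for illustration and are not part of Ozone.

```python
import re

# Matches the counts in the SCMException message quoted in this report.
# Assumption: the wording "Required N. Found N. Excluded N." is stable.
COUNTS = re.compile(r"Required (\d+)\. Found (\d+)\. Excluded (\d+)\.")


def parse_placement_failure(message):
    """Return the datanode counts from the message, or None if it does not match."""
    match = COUNTS.search(message)
    if match is None:
        return None
    required, found, excluded = (int(group) for group in match.groups())
    return {
        "required": required,
        "found": found,
        "excluded": excluded,
        # Hypothetical derived field: how many more datanodes the pipeline needed.
        "shortfall": max(required - found, 0),
    }


message = (
    "org.apache.hadoop.hdds.scm.exceptions.SCMException: Pipeline creation failed "
    "due to no sufficient healthy datanodes. Required 3. Found 1. Excluded 7."
)
print(parse_placement_failure(message))
```

For the message in this report, the sketch reports a shortfall of two datanodes for a RATIS/THREE pipeline.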


