[ 
https://issues.apache.org/jira/browse/HDDS-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated HDDS-10256:
-------------------------------
    Target Version/s: 2.0.0, 1.4.2

> Block allocation should retry if SCM is in safe mode
> ----------------------------------------------------
>
>                 Key: HDDS-10256
>                 URL: https://issues.apache.org/jira/browse/HDDS-10256
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Pratyush Bhatt
>            Assignee: Ashish Kumar
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0, HDDS-7593
>
>
> [~pratyush.bhatt] found that HBase goes down while Ozone is undergoing a
> rolling restart. It turns out that OM does not appear to retry block
> allocation when SCM is in safe mode.
> {noformat}
> 2024-01-30 16:57:39,846 [om1-OMStateMachineApplyTransactionThread - 0] INFO  
> bucket.OMBucketCreateRequest 
> (OMBucketCreateRequest.java:validateAndUpdateCache(296)) - created bucket: 
> weichiu of layout FILE_SYSTEM_OPTIMIZED in volume: user
> 16:57:39.869 [IPC Server handler 0 on default port 15036] ERROR SCMAudit - 
> user=weichiu | ip=10.96.129.4 | op=ALLOCATE_BLOCK {replication=RATIS/THREE, 
> owner=omServiceIdDefault, size=4194304, num=1, client=} | ret=FAILURE
> org.apache.hadoop.hdds.scm.exceptions.SCMException: SafeModePrecheck failed 
> for allocateBlock
>       at 
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:157)
>  ~[classes/:?]
>       at 
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:204)
>  ~[classes/:?]
>       at 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:192)
>  ~[classes/:?]
>       at 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:142)
>  ~[classes/:?]
>       at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
>  [classes/:?]
>       at 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:113)
>  [classes/:?]
>       at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:14430)
>  [classes/:?]
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)
>  [hadoop-common-3.3.6.jar:?]
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)
>  [hadoop-common-3.3.6.jar:?]
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
>  [hadoop-common-3.3.6.jar:?]
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) 
> [hadoop-common-3.3.6.jar:?]
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094) 
> [hadoop-common-3.3.6.jar:?]
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017) 
> [hadoop-common-3.3.6.jar:?]
>       at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_392]
>       at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_392]
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>  [hadoop-common-3.3.6.jar:?]
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048) 
> [hadoop-common-3.3.6.jar:?]
> {noformat}
> OM should retry the block allocation in this case instead of failing the
> request. I'll attach a reproduction test case for reference.
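
The retry behavior requested above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the actual patch for this issue: `SafeModeRetry`, `SafeModeException`, and `callWithRetry` are hypothetical names, and `SafeModeException` only stands in for the `SCMException` seen in the log. The idea is that a caller retries the allocation with a fixed delay while SCM reports safe mode, and gives up after a bounded number of attempts.

```java
import java.util.concurrent.Callable;

// Hypothetical helper; not part of Apache Ozone.
final class SafeModeRetry {

  // Hypothetical stand-in for the SCMException raised by the safe-mode precheck.
  static class SafeModeException extends Exception {
    SafeModeException(String message) {
      super(message);
    }
  }

  // Runs the call, retrying with a fixed delay while it fails with
  // SafeModeException. Any other exception propagates immediately.
  // After maxAttempts failed attempts the last SafeModeException is rethrown.
  static <T> T callWithRetry(Callable<T> call, int maxAttempts, long delayMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    SafeModeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (SafeModeException e) {
        last = e;
        // Sleep only if another attempt will follow.
        if (attempt < maxAttempts) {
          Thread.sleep(delayMillis);
        }
      }
    }
    throw last;
  }

  private SafeModeRetry() {
  }
}
```

A caller would wrap the block allocation call in `callWithRetry`, choosing the attempt count and delay so that the total wait covers the time SCM typically needs to leave safe mode during a rolling restart. A bounded retry is used here so that a request cannot hang indefinitely if SCM never exits safe mode.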



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

