Zita Dombi created HDDS-14705:
---------------------------------

             Summary: Ozone clients should retry when OM is in prepare mode
                 Key: HDDS-14705
                 URL: https://issues.apache.org/jira/browse/HDDS-14705
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Zita Dombi
            Assignee: Zita Dombi


If OM is in prepare mode, OMFailoverProxyProvider skips failover handling for the rejected request:
[https://github.com/apache/ozone/blob/17a126da1775d45f8843b47985bb5deb0ea3e928/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java#L473-L483]
 
{code:java}
    } else if (ex instanceof StateMachineException) {
      StateMachineException smEx = (StateMachineException) ex;
      Throwable cause = smEx.getCause();
      if (cause instanceof OMException) {
        OMException omEx = (OMException) cause;
        // Do not failover if the operation was blocked because the OM was
        // prepared.
        return omEx.getResult() !=
            OMException.ResultCodes.NOT_SUPPORTED_OPERATION_WHEN_PREPARED;
      }
    }{code}
This can cause job failures:
{code:java}
26/01/21 16:22:21 INFO mapreduce.Job: Task Id : attempt_1768994470888_0006_m_000007_0, Status : FAILED
Error: NOT_SUPPORTED_OPERATION_WHEN_PREPARED org.apache.hadoop.ozone.om.exceptions.OMException: Cannot apply write request CreateFile when OM is in prepare mode.
    at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:761)
    at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleSubmitRequestAndSCMSafeModeRetry(OzoneManagerProtocolClientSideTranslatorPB.java:2332)
    at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.createFile(OzoneManagerProtocolClientSideTranslatorPB.java:2321)
    at org.apache.hadoop.ozone.client.rpc.RpcClient.createFile(RpcClient.java:2250)
    at org.apache.hadoop.ozone.client.OzoneBucket.createFile(OzoneBucket.java:962)
    at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.createFile(BasicRootedOzoneClientAdapterImpl.java:413)
    at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.createOutputStream(BasicRootedOzoneFileSystem.java:317)
    at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.lambda$create$1(BasicRootedOzoneFileSystem.java:277)
    at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:167)
    at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:157)
    at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.create(BasicRootedOzoneFileSystem.java:276)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1233)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1210)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1091)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1078)
    at org.apache.hadoop.examples.terasort.TeraOutputFormat.getRecordWriter(TeraOutputFormat.java:141)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:660)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:780)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1964)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
{code}
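
To illustrate the behavior the summary asks for, here is a minimal, dependency-free sketch of a caller that retries only when a request is rejected with the prepared-mode result code. It is not the Ozone client code and not the proposed fix: `PreparedModeRetrySketch`, `ResultCodeException`, `callWithRetry`, `maxAttempts`, and `waitMillis` are hypothetical names, and the result code is modeled as a plain string rather than `OMException.ResultCodes`.

```java
import java.util.function.Supplier;

class PreparedModeRetrySketch {

  // Same name as the result code in the report; a plain string here so the
  // sketch has no Ozone dependency (assumption, not the real enum).
  static final String PREPARED_CODE = "NOT_SUPPORTED_OPERATION_WHEN_PREPARED";

  // Hypothetical stand-in for OMException: carries only a result code.
  static class ResultCodeException extends RuntimeException {
    final String resultCode;

    ResultCodeException(String resultCode, String message) {
      super(message);
      this.resultCode = resultCode;
    }
  }

  // Runs the call, retrying only when it fails with the prepared-mode code.
  // Any other failure, or running out of attempts, is rethrown to the caller.
  static <T> T callWithRetry(Supplier<T> call, int maxAttempts, long waitMillis) {
    for (int attempt = 1; ; attempt++) {
      try {
        return call.get();
      } catch (ResultCodeException e) {
        boolean retriable = PREPARED_CODE.equals(e.resultCode);
        if (!retriable || attempt >= maxAttempts) {
          throw e;
        }
        sleep(waitMillis);
      }
    }
  }

  // Fixed wait between attempts; an interrupt aborts the retry loop.
  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting to retry", ie);
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Simulated request: rejected twice as "prepared", then accepted.
    String result = callWithRetry(() -> {
      if (calls[0]++ < 2) {
        throw new ResultCodeException(PREPARED_CODE, "OM is in prepare mode");
      }
      return "created";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " calls");
  }
}
```

The sketch waits a fixed interval and caps the number of attempts, so a request still fails if OM stays prepared for longer than the retry budget. Where such a retry would belong in the real client (the failover proxy provider, the retry policy, or the translator) is left open here.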


