Zita Dombi created HDDS-14705:
---------------------------------
Summary: Ozone clients should retry when OM is in prepare mode
Key: HDDS-14705
URL: https://issues.apache.org/jira/browse/HDDS-14705
Project: Apache Ozone
Issue Type: Bug
Reporter: Zita Dombi
Assignee: Zita Dombi
If OM is in prepare mode, there is no failover handling:
[https://github.com/apache/ozone/blob/17a126da1775d45f8843b47985bb5deb0ea3e928/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java#L473-L483]
{code:java}
} else if (ex instanceof StateMachineException) {
  StateMachineException smEx = (StateMachineException) ex;
  Throwable cause = smEx.getCause();
  if (cause instanceof OMException) {
    OMException omEx = (OMException) cause;
    // Do not failover if the operation was blocked because the OM was
    // prepared.
    return omEx.getResult() !=
        OMException.ResultCodes.NOT_SUPPORTED_OPERATION_WHEN_PREPARED;
  }
}
{code}
This can cause job failures:
{code:java}
26/01/21 16:22:21 INFO mapreduce.Job: Task Id : attempt_1768994470888_0006_m_000007_0, Status : FAILED
Error: NOT_SUPPORTED_OPERATION_WHEN_PREPARED
org.apache.hadoop.ozone.om.exceptions.OMException: Cannot apply write request CreateFile when OM is in prepare mode.
    at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:761)
    at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleSubmitRequestAndSCMSafeModeRetry(OzoneManagerProtocolClientSideTranslatorPB.java:2332)
    at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.createFile(OzoneManagerProtocolClientSideTranslatorPB.java:2321)
    at org.apache.hadoop.ozone.client.rpc.RpcClient.createFile(RpcClient.java:2250)
    at org.apache.hadoop.ozone.client.OzoneBucket.createFile(OzoneBucket.java:962)
    at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.createFile(BasicRootedOzoneClientAdapterImpl.java:413)
    at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.createOutputStream(BasicRootedOzoneFileSystem.java:317)
    at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.lambda$create$1(BasicRootedOzoneFileSystem.java:277)
    at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:167)
    at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:157)
    at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.create(BasicRootedOzoneFileSystem.java:276)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1233)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1210)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1091)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1078)
    at org.apache.hadoop.examples.terasort.TeraOutputFormat.getRecordWriter(TeraOutputFormat.java:141)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:660)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:780)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1964)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
{code}
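One possible shape for the requested behaviour is a bounded retry that waits and re-submits the request while the OM reports NOT_SUPPORTED_OPERATION_WHEN_PREPARED. The sketch below is only an illustration and is not the Ozone implementation: ResultCode and OmError are hypothetical stand-ins for OMException.ResultCodes and OMException, and callWithPrepareRetry, maxRetries and waitMillis are assumed names that do not exist in the client today.

```java
import java.util.concurrent.Callable;

final class PrepareModeRetrySketch {

  /** Hypothetical stand-in for OMException.ResultCodes (only two values modeled). */
  public enum ResultCode { NOT_SUPPORTED_OPERATION_WHEN_PREPARED, INTERNAL_ERROR }

  /** Hypothetical stand-in for OMException. */
  public static class OmError extends Exception {
    private final ResultCode result;

    public OmError(String message, ResultCode result) {
      super(message);
      this.result = result;
    }

    public ResultCode getResult() {
      return result;
    }
  }

  /**
   * Runs the call, retrying up to maxRetries times when the failure is
   * NOT_SUPPORTED_OPERATION_WHEN_PREPARED. Any other error, or a prepared
   * error after the retry budget is spent, is rethrown to the caller.
   */
  public static <T> T callWithPrepareRetry(Callable<T> call, int maxRetries,
      long waitMillis) throws Exception {
    int retries = 0;
    while (true) {
      try {
        return call.call();
      } catch (OmError e) {
        boolean prepared =
            e.getResult() == ResultCode.NOT_SUPPORTED_OPERATION_WHEN_PREPARED;
        if (!prepared || retries >= maxRetries) {
          throw e;
        }
        retries++;
        // Fixed wait between attempts; a real policy might back off instead.
        Thread.sleep(waitMillis);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated OM that rejects the first two writes because it is prepared.
    String result = callWithPrepareRetry(() -> {
      if (calls[0]++ < 2) {
        throw new OmError("OM is in prepare mode",
            ResultCode.NOT_SUPPORTED_OPERATION_WHEN_PREPARED);
      }
      return "created";
    }, 3, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

With a retry budget sized to cover the expected prepare window, a task like the one in the trace above would wait instead of failing on the first rejected CreateFile. Whether the retry belongs in OMFailoverProxyProvider or higher up in the client is left open here.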
--
This message was sent by Atlassian Jira
(v8.20.10#820010)