[ 
https://issues.apache.org/jira/browse/HDDS-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-11558:
-----------------------------------
    Description: 
We found that HBase RegionServers crash after a few days. I was able to reproduce 
the issue and confirm that Ozone client failover handling can cause unexpected 
behavior, which crashed the HBase RegionServer.

The RS crashes because a rename failed on the client side with an exception, even 
though it actually succeeded on the OM side. The RS retried, but because the rename 
had already happened, the operation returned -1. This is unexpected, so the RS crashed.

RS code 
[https://github.com/apache/hbase/blob/52e9c0fb9c4fc0fdd42801359171356d77c74a90/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionFileSystem.java#L1093]
{code:java}
boolean rename(Path srcpath, Path dstPath) throws IOException {
    IOException lastIOE = null;
    int i = 0;
    do {
      try {
        return fs.rename(srcpath, dstPath);
      } catch (IOException ioe) {
        lastIOE = ioe;
        if (!fs.exists(srcpath) && fs.exists(dstPath)) return true; // successful move
        // dir is not there, retry after some time.
        try {
          sleepBeforeRetry("Rename Directory", i + 1);
        } catch (InterruptedException e) {
          throw (InterruptedIOException) new InterruptedIOException().initCause(e);
        }
      }
    } while (++i <= hdfsClientRetriesNumber);

    throw new IOException("Exception in rename", lastIOE);
  }
{code}
Reproduction steps:

1. Suppose OM1 is the leader, and OM2 and OM3 are followers.
2. Pause follower OM2 and follower OM3.
3. Issue a rename command (OM1 receives it and appends the Ratis log entry locally):
hdfs dfs -touchz ofs://ozone1728456768/test1/buck1/src
hdfs dfs -mv ofs://ozone1728456768/test1/buck1/src ofs://ozone1728456768/test1/buck1/dst

4. Pause leader OM1.
5. Wait 5 seconds.
6. Resume followers OM2 and OM3 (they will determine that OM1 has become 
unresponsive and elect a new leader).
7. Wait 5 seconds.
8. Resume OM1 (OM1 comes back and sends the Ratis log entry to OM2 and OM3, which 
append and apply it. OM1 then finds that it has lost leadership but does not know 
who the new leader is, so the client receives an exception that bubbles up to the 
application):
{noformat}
24/10/09 23:30:32 INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException): OM:om1546335780 is not the leader. Could not determine the leader node.
at org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException.convertToOMNotLeaderException(OMNotLeaderException.java:93)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.createOmResponseImpl(OzoneManagerRatisServer.java:497)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.lambda$2(OzoneManagerRatisServer.java:287)
at org.apache.hadoop.ozone.util.MetricUtil.captureLatencyNs(MetricUtil.java:46)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.createOmResponse(OzoneManagerRatisServer.java:285)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequest(OzoneManagerRatisServer.java:265)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestToRatis(OzoneManagerProtocolServerSideTranslatorPB.java:254)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.internalProcessRequest(OzoneManagerProtocolServerSideTranslatorPB.java:228)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:162)
at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:153)
at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:995)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:923)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2901)
, while invoking $Proxy11.submitRequest over nodeId=om1,nodeAddress=ccycloud-1.nightly7310-hi.root.comops.site:9862. Trying to failover immediately.
24/10/09 23:30:32 ERROR ozone.BasicRootedOzoneFileSystem: rename key failed: Unable to get file status: volume: test1 bucket: buck1 key: src. source:test1/buck1/src, destin:test1/buck1/dst
mv: `ofs://ozone1728456768/test1/buck1/src': Input/output error{noformat}
So from the client's perspective, the rename failed. But the rename actually 
succeeded at the OM, so a second rename attempt would fail.

 
{noformat}
hdfs dfs -ls ofs://ozone1728456768/test1/buck1/src
ls: `ofs://ozone1728456768/test1/buck1/src': No such file or directory
hdfs dfs -ls ofs://ozone1728456768/test1/buck1/dst
-rw-rw-rw- 3 hive hive 0 2024-10-09 23:29 ofs://ozone1728456768/test1/buck1/dst
{noformat}
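
For illustration, below is a minimal standalone sketch of the client-visible 
sequence described above (the class name and the simple retry flow are hypothetical; 
this is not the HBase code path, and the paths are the ones from the reproduction 
steps): the first rename throws because of the OM failover even though it was 
committed on the OM, so a naive retry of the same rename finds the source gone and 
does not succeed.
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical standalone reproduction of the client-visible behavior.
public class RenameRetryRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path src = new Path("ofs://ozone1728456768/test1/buck1/src");
    Path dst = new Path("ofs://ozone1728456768/test1/buck1/dst");
    FileSystem fs = src.getFileSystem(conf);

    try {
      // Throws an IOException while the OMs fail over (OMNotLeaderException),
      // even though the rename has already been applied on the OM side.
      fs.rename(src, dst);
    } catch (IOException ioe) {
      // Naive retry of the same rename: the source key no longer exists,
      // so the retry does not succeed and the caller sees a failure for an
      // operation that actually completed.
      boolean retried = fs.rename(src, dst);
      System.out.println("retry returned " + retried
          + ", src exists = " + fs.exists(src)
          + ", dst exists = " + fs.exists(dst));
    }
  }
}
{code}
The exact outcome of the retry depends on the filesystem's rename semantics for a 
missing source (return false vs. throw), which is part of what makes this failure 
mode surprising to callers that retry on exceptions.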
 


> HBase RegionServer crashes due to inconsistency caused by Ozone client 
> failover handling
> ----------------------------------------------------------------------------------------
>
>                 Key: HDDS-11558
>                 URL: https://issues.apache.org/jira/browse/HDDS-11558
>             Project: Apache Ozone
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Wei-Chiu Chuang
>            Priority: Major
>


