[ 
https://issues.apache.org/jira/browse/HDDS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-6703:
-----------------------------------
    Description: 
After HDDS-6685 testPrepareDownedOM fails almost consistently (intermittently 
passes).  Upgrade to Ratis 2.3.0 (not yet released) fixes it.  Disable the test 
until Ratis upgrade.

 * 
https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14835/it-flaky/hadoop-ozone/integration-test/
 * 
https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14832/it-flaky/hadoop-ozone/integration-test/
 * 
https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14826/it-flaky/hadoop-ozone/integration-test/

{code}
2022-05-05 13:41:05,653 [pool-2368-thread-1] ERROR om.OzoneManager 
(ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to 
start RPC Server.
java.net.BindException: Problem binding to [localhost:39581] 
java.net.BindException: Address already in use; For more details see:  
http://wiki.apache.org/hadoop/BindException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:913)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:809)
        at org.apache.hadoop.ipc.Server.bind(Server.java:640)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1225)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:3117)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1062)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.<init>(ProtobufRpcEngine2.java:464)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:434)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:361)
        at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:853)
        at 
org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:1084)
        at 
org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:1054)
        at 
org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3366)
        at 
org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3254)
        at 
org.apache.hadoop.ozone.om.OzoneManager.installSnapshotFromLeader(OzoneManager.java:3231)
{code}

  was:
HDDS-6685 changed install snapshot logic in OM, restarting various OM 
components during the process.  However, it does not wait for OM RPC server to 
stop, leading to intermittent BindException a bit later during start.

 * 
https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14835/it-flaky/hadoop-ozone/integration-test/
 * 
https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14832/it-flaky/hadoop-ozone/integration-test/
 * 
https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14826/it-flaky/hadoop-ozone/integration-test/

{code}
2022-05-05 13:41:05,653 [pool-2368-thread-1] ERROR om.OzoneManager 
(ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to 
start RPC Server.
java.net.BindException: Problem binding to [localhost:39581] 
java.net.BindException: Address already in use; For more details see:  
http://wiki.apache.org/hadoop/BindException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:913)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:809)
        at org.apache.hadoop.ipc.Server.bind(Server.java:640)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1225)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:3117)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1062)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.<init>(ProtobufRpcEngine2.java:464)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:434)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:361)
        at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:853)
        at 
org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:1084)
        at 
org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:1054)
        at 
org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3366)
        at 
org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3254)
        at 
org.apache.hadoop.ozone.om.OzoneManager.installSnapshotFromLeader(OzoneManager.java:3231)
{code}


> Disable failing testPrepareDownedOM until Ratis upgrade
> -------------------------------------------------------
>
>                 Key: HDDS-6703
>                 URL: https://issues.apache.org/jira/browse/HDDS-6703
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Manager
>    Affects Versions: 1.3.0
>            Reporter: Attila Doroszlai
>            Assignee: Attila Doroszlai
>            Priority: Major
>              Labels: pull-request-available
>
> After HDDS-6685 testPrepareDownedOM fails almost consistently (intermittently 
> passes).  Upgrade to Ratis 2.3.0 (not yet released) fixes it.  Disable the 
> test until Ratis upgrade.
>  * 
> https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14835/it-flaky/hadoop-ozone/integration-test/
>  * 
> https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14832/it-flaky/hadoop-ozone/integration-test/
>  * 
> https://github.com/adoroszlai/ozone-build-results/tree/master/2022/05/05/14826/it-flaky/hadoop-ozone/integration-test/
> {code}
> 2022-05-05 13:41:05,653 [pool-2368-thread-1] ERROR om.OzoneManager 
> (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to 
> start RPC Server.
> java.net.BindException: Problem binding to [localhost:39581] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:913)
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:809)
>       at org.apache.hadoop.ipc.Server.bind(Server.java:640)
>       at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1225)
>       at org.apache.hadoop.ipc.Server.<init>(Server.java:3117)
>       at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1062)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.<init>(ProtobufRpcEngine2.java:464)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:434)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:361)
>       at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:853)
>       at 
> org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:1084)
>       at 
> org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:1054)
>       at 
> org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3366)
>       at 
> org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3254)
>       at 
> org.apache.hadoop.ozone.om.OzoneManager.installSnapshotFromLeader(OzoneManager.java:3231)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to