[ 
https://issues.apache.org/jira/browse/HDDS-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-9814:
-----------------------------------
    Description: 
{{TestOzoneFileSystemWithStreaming}} logs lots of {{"connection refused"}} and 
{{"schedule reconnecting"}} messages during shutdown.

{code:title=sample}
2023-11-30 18:39:00,039 [main] INFO  ozone.MiniOzoneClusterImpl 
(MiniOzoneClusterImpl.java:shutdown(449)) - Shutting down the Mini Ozone Cluster

...

2023-11-30 18:39:01,126 [Thread-1140] DEBUG client.NettyClientStreamRpc 
(NettyClientStreamRpc.java:close(452)) - flush? true
2023-11-30 18:39:01,122 [Thread-1133] WARN  util.PeerProxyMap 
(PeerProxyMap.java:closeProxy(170)) - 
2ad30381-d2ef-4801-b4d0-3965e2d4accc-NettyServerStreamRpc: Failed to close 
proxy for peer 3c309c61-fe70-460a-bb5b-539134755074|127.0.0.1:15014, proxy 
class: class org.apache.ratis.client.impl.DataStreamClientImpl
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 finishConnect(..) failed: Connection refused: /127.0.0.1:15015
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection 
refused
        at 
org.apache.ratis.thirdparty.io.netty.channel.unix.Errors.newConnectException0(Errors.java:166)
        at 
org.apache.ratis.thirdparty.io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:131)
        at 
org.apache.ratis.thirdparty.io.netty.channel.unix.Socket.finishConnect(Socket.java:359)
        at 
org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:710)
        at 
org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:687)
        at 
org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567)
        at 
org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
        at 
org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407)
        at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at 
org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.lang.Thread.run(Thread.java:750)
{code}

This is during a normal, successful test run (logs: 
[^TestOzoneFileSystemWithStreaming-output-pass.txt]).

On a related note, test run occasionally times out during shutdown:

{code}
[INFO] Running org.apache.hadoop.fs.ozone.TestOzoneFileSystemWithStreaming
...
[WARNING] Tests run: 1030, Failures: 0, Errors: 0, Skipped: 24
...
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) on 
project ozone-integration-test: There was a timeout in the fork -> [Help 1]
{code}

(logs: 
[^org.apache.hadoop.fs.ozone.TestOzoneFileSystemWithStreaming-output.txt],  
[^2023-05-05T05-46-34_028-jvmRun1.dump])

More recently I noticed job runner timeout.  Unfortunately we don't have logs 
for that:

{code}
Thu, 30 Nov 2023 13:38:47 GMT
[INFO] Running org.apache.hadoop.fs.ozone.TestOzoneFileSystemWithStreaming
Thu, 30 Nov 2023 15:58:17 GMT
Error: The operation was canceled.
{code}

CC [~szetszwo]

  was:
{{TestOzoneFileSystemWithStreaming}} logs lots of {{"connection refused"}} and 
{{"schedule reconnecting"}} messages during shutdown.

{code:title=sample}
2023-11-30 18:39:00,039 [main] INFO  ozone.MiniOzoneClusterImpl 
(MiniOzoneClusterImpl.java:shutdown(449)) - Shutting down the Mini Ozone Cluster

...

2023-11-30 18:39:01,126 [Thread-1140] DEBUG client.NettyClientStreamRpc 
(NettyClientStreamRpc.java:close(452)) - flush? true
2023-11-30 18:39:01,122 [Thread-1133] WARN  util.PeerProxyMap 
(PeerProxyMap.java:closeProxy(170)) - 
2ad30381-d2ef-4801-b4d0-3965e2d4accc-NettyServerStreamRpc: Failed to close 
proxy for peer 3c309c61-fe70-460a-bb5b-539134755074|127.0.0.1:15014, proxy 
class: class org.apache.ratis.client.impl.DataStreamClientImpl
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 finishConnect(..) failed: Connection refused: /127.0.0.1:15015
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection 
refused
        at 
org.apache.ratis.thirdparty.io.netty.channel.unix.Errors.newConnectException0(Errors.java:166)
        at 
org.apache.ratis.thirdparty.io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:131)
        at 
org.apache.ratis.thirdparty.io.netty.channel.unix.Socket.finishConnect(Socket.java:359)
        at 
org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:710)
        at 
org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:687)
        at 
org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567)
        at 
org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
        at 
org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407)
        at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at 
org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.lang.Thread.run(Thread.java:750)
{code}

This is during a normal, successful test run (logs: 
[^TestOzoneFileSystemWithStreaming-output-pass.txt]).

On a related note, test run occasionally times out during shutdown:

{code}
[INFO] Running org.apache.hadoop.fs.ozone.TestOzoneFileSystemWithStreaming
...
[WARNING] Tests run: 1030, Failures: 0, Errors: 0, Skipped: 24
...
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) on 
project ozone-integration-test: There was a timeout in the fork -> [Help 1]
{code}

(logs: 
[^org.apache.hadoop.fs.ozone.TestOzoneFileSystemWithStreaming-output.txt],  
[^2023-05-05T05-46-34_028-jvmRun1.dump])

More recently I noticed job runner timeout.  Unfortunately we don't have logs 
for that:

{code}
Thu, 30 Nov 2023 13:38:47 GMT
[INFO] Running org.apache.hadoop.fs.ozone.TestOzoneFileSystemWithStreaming
Thu, 30 Nov 2023 15:58:17 GMT
Error: The operation was canceled.
{code}


> NettyClientStreamRpc trying to establish connection during shutdown
> -------------------------------------------------------------------
>
>                 Key: HDDS-9814
>                 URL: https://issues.apache.org/jira/browse/HDDS-9814
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Attila Doroszlai
>            Priority: Major
>         Attachments: 2023-05-05T05-46-34_028-jvmRun1.dump, 
> TestOzoneFileSystemWithStreaming-output-pass.txt, 
> org.apache.hadoop.fs.ozone.TestOzoneFileSystemWithStreaming-output.txt
>
>
> {{TestOzoneFileSystemWithStreaming}} logs lots of {{"connection refused"}} 
> and {{"schedule reconnecting"}} messages during shutdown.
> {code:title=sample}
> 2023-11-30 18:39:00,039 [main] INFO  ozone.MiniOzoneClusterImpl 
> (MiniOzoneClusterImpl.java:shutdown(449)) - Shutting down the Mini Ozone 
> Cluster
> ...
> 2023-11-30 18:39:01,126 [Thread-1140] DEBUG client.NettyClientStreamRpc 
> (NettyClientStreamRpc.java:close(452)) - flush? true
> 2023-11-30 18:39:01,122 [Thread-1133] WARN  util.PeerProxyMap 
> (PeerProxyMap.java:closeProxy(170)) - 
> 2ad30381-d2ef-4801-b4d0-3965e2d4accc-NettyServerStreamRpc: Failed to close 
> proxy for peer 3c309c61-fe70-460a-bb5b-539134755074|127.0.0.1:15014, proxy 
> class: class org.apache.ratis.client.impl.DataStreamClientImpl
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  finishConnect(..) failed: Connection refused: /127.0.0.1:15015
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection 
> refused
>       at 
> org.apache.ratis.thirdparty.io.netty.channel.unix.Errors.newConnectException0(Errors.java:166)
>       at 
> org.apache.ratis.thirdparty.io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:131)
>       at 
> org.apache.ratis.thirdparty.io.netty.channel.unix.Socket.finishConnect(Socket.java:359)
>       at 
> org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:710)
>       at 
> org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:687)
>       at 
> org.apache.ratis.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567)
>       at 
> org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
>       at 
> org.apache.ratis.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407)
>       at 
> org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
>       at 
> org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>       at java.lang.Thread.run(Thread.java:750)
> {code}
> This is during a normal, successful test run (logs: 
> [^TestOzoneFileSystemWithStreaming-output-pass.txt]).
> On a related note, test run occasionally times out during shutdown:
> {code}
> [INFO] Running org.apache.hadoop.fs.ozone.TestOzoneFileSystemWithStreaming
> ...
> [WARNING] Tests run: 1030, Failures: 0, Errors: 0, Skipped: 24
> ...
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) 
> on project ozone-integration-test: There was a timeout in the fork -> [Help 1]
> {code}
> (logs: 
> [^org.apache.hadoop.fs.ozone.TestOzoneFileSystemWithStreaming-output.txt],  
> [^2023-05-05T05-46-34_028-jvmRun1.dump])
> More recently I noticed job runner timeout.  Unfortunately we don't have logs 
> for that:
> {code}
> Thu, 30 Nov 2023 13:38:47 GMT
> [INFO] Running org.apache.hadoop.fs.ozone.TestOzoneFileSystemWithStreaming
> Thu, 30 Nov 2023 15:58:17 GMT
> Error: The operation was canceled.
> {code}
> CC [~szetszwo]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to