[ 
https://issues.apache.org/jira/browse/ARTEMIS-5861?focusedWorklogId=1003000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-1003000
 ]

ASF GitHub Bot logged work on ARTEMIS-5861:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Feb/26 11:40
            Start Date: 02/Feb/26 11:40
    Worklog Time Spent: 10m 
      Work Description: gemmellr commented on PR #6202:
URL: https://github.com/apache/artemis/pull/6202#issuecomment-3834615134

   > > ...I also don't actually know that we really want this set to 0 all the 
time in the test suite?
   > 
   > I don't know if we do or not. There's not much detail on 
https://issues.apache.org/jira/browse/ARTEMIS-2428 where this change originated.
   > 
   
   I meant the specific change from this PR rather than the change in 
ARTEMIS-2428 ~7 years ago. FWIW the PR that added that change shows it to be a 
related follow-up along with changes around ARTEMIS-2408. It suggests that the 
timeout=0 change was made after the initial 2428 changes had first caused the 
test suite to hang on one specific test and then, once they were updated and 
committed again, caused the test suite to take considerably longer. So the 
timeout was set to 0, to never wait, to get back to the previous run times.
   
   > > Until yesterday the related bit previously waited for as long as needed 
during the entire test suite...
   > 
   > The problem, as outlined on the Jira, is that the call to 
`awaitUninterruptibly()` can apparently hang forever so a timeout is needed for 
these calls. Rather than create and document a new parameter I simply re-used 
the existing, but undocumented, `shutdownTimeout` parameter.
   > 
   
   Yep, I realise. Though from the PR it isn't clear you knew that change also 
meant it would then never wait for channel [group] shutdown at all during most 
of the tests, given it wasn't mentioned.
   
   > I certainly could create a new parameter specifically for closing the 
Netty `ChannelGroup` instances. I could name it something like 
`channelGroupShutdownTimeout`, but then that would introduce a naming asymmetry 
with `shutdownTimeout` which is specifically aimed at the Netty 
`EventLoopGroup` instance. Since `shutdownTimeout` was undocumented I could 
potentially just rename it to `eventLoopGroupShutdownTimeout` and then document 
both new parameters, hoping that nobody was actually using `shutdownTimeout`, 
or I could deprecate `shutdownTimeout` and let it live alongside the new 
parameter. I'd probably need to do the same with `quietPeriod` as well.
   > 
   > Ultimately we just need a timeout here so these calls can't hang 
indefinitely. Using `shutdownTimeout` seems the simplest path forward to me.
   
   It's certainly simple, yes, though I do wonder what the full implications 
may be of it not waiting during the tests.




Issue Time Tracking
-------------------

    Worklog Id:     (was: 1003000)
    Time Spent: 2h 20m  (was: 2h 10m)

> Netty acceptor not shutting down
> --------------------------------
>
>                 Key: ARTEMIS-5861
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-5861
>             Project: Artemis
>          Issue Type: Bug
>    Affects Versions: 2.44.0
>            Reporter: Justin Bertram
>            Assignee: Justin Bertram
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Thread dump analysis reveals that the broker hangs indefinitely when trying 
> to close Netty channel groups in a Netty acceptor, e.g.:
> {noformat}
>   State: WAITING (on object monitor)
>   Stack trace:
>     at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:290)
>       - locked <0x00000000dbd095a8> (a io.netty.channel.group.DefaultChannelGroupFuture)
>     at io.netty.channel.group.DefaultChannelGroupFuture.awaitUninterruptibly(DefaultChannelGroupFuture.java:178)
>     at org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor.asyncStop(NettyAcceptor.java:793)
> {noformat}
>   
> The code at {{NettyAcceptor.java:793}} calls 
> {{channelGroup.close().awaitUninterruptibly()}} without a timeout parameter, 
> causing an indefinite hang when channels fail to close properly. This problem 
> is very rare and there is no good reproducer.
> The broker should complete shutdown within a reasonable timeout period, 
> forcefully closing any remaining connections if necessary.
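
The bounded wait requested in the description can be illustrated with a minimal, JDK-only sketch. It is not the Artemis or Netty code: the class `BoundedShutdownWait`, its helper method, and the `CountDownLatch` standing in for the channel group's close future are all hypothetical. Netty's own `Future` offers an `awaitUninterruptibly(long timeout, TimeUnit unit)` overload that returns `false` when the timeout elapses, and the sketch mimics that contract. It also shows the point raised in the PR discussion: with a timeout of 0 the caller never waits at all.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BoundedShutdownWait {

    // Hypothetical helper, not Artemis or Netty code: waits for `closed` to reach
    // zero for at most timeoutMillis, deferring interrupts until the wait ends.
    // Returns true if the latch reached zero in time, false if the timeout elapsed.
    public static boolean awaitUninterruptibly(CountDownLatch closed, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        boolean interrupted = false;
        try {
            while (true) {
                long remainingNanos = deadline - System.nanoTime();
                try {
                    // A non-positive wait time makes await() return without blocking.
                    return closed.await(remainingNanos, TimeUnit.NANOSECONDS);
                } catch (InterruptedException e) {
                    interrupted = true; // remember the interrupt and keep waiting
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt(); // restore the flag for the caller
            }
        }
    }

    public static void main(String[] args) {
        CountDownLatch stuck = new CountDownLatch(1); // stands in for a close that never completes
        System.out.println(awaitUninterruptibly(stuck, 0));   // timeout 0: no waiting at all
        System.out.println(awaitUninterruptibly(stuck, 100)); // gives up after roughly 100 ms
    }
}
```

Both calls in `main` return `false` because the latch is never counted down. A caller could treat that result as the signal to close the remaining connections forcefully rather than keep blocking.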



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
