[
https://issues.apache.org/jira/browse/ARTEMIS-5861?focusedWorklogId=1003000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-1003000
]
ASF GitHub Bot logged work on ARTEMIS-5861:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 02/Feb/26 11:40
Start Date: 02/Feb/26 11:40
Worklog Time Spent: 10m
Work Description: gemmellr commented on PR #6202:
URL: https://github.com/apache/artemis/pull/6202#issuecomment-3834615134
> > ...I also don't actually know that we really want this set to 0 all the
time in the test suite?
>
> I don't know if we do or not. There's not much detail on
https://issues.apache.org/jira/browse/ARTEMIS-2428 where this change originated.
>
I meant the specific change from this PR rather than the change in
ARTEMIS-2428 ~7 years ago. FWIW the PR that added that change shows it to be a
related follow-up along with changes around ARTEMIS-2408. It suggests the
timeout=0 change was made after the initial 2428 changes had first caused the
test suite to hang on one specific test and then, once those changes were
updated and committed again, caused the test suite to take considerably
longer. So it was set to 0, to never wait, in order to get back to previous
run times.
> > Until yesterday the related bit previously waited for as long as needed
during the entire test suite...
>
> The problem, as outlined on the Jira, is that the call to
`awaitUninterruptibly()` can apparently hang forever so a timeout is needed for
these calls. Rather than create and document a new parameter I simply re-used
the existing, but undocumented, `shutdownTimeout` parameter.
>
Yep, I realise. Though from the PR it isn't clear you knew that change also
meant it would then never wait for channel [group] shutdown at all during most
of the tests, given it wasn't mentioned.
> I certainly could create a new parameter specifically for closing the
Netty `ChannelGroup` instances. I could name it something like
`channelGroupShutdownTimeout`, but then that would introduce a naming asymmetry
with `shutdownTimeout` which is specifically aimed at the Netty
`EventLoopGroup` instance. Since `shutdownTimeout` was undocumented I could
potentially just rename it to `eventLoopGroupShutdownTimeout` and then document
both new parameters, hoping that nobody was actually using `shutdownTimeout`,
or I could deprecate `shutdownTimeout` and let it live alongside the new
parameter. I'd probably need to do the same with `quietPeriod` as well.
>
> Ultimately we just need a timeout here so these calls can't hang
indefinitely. Using `shutdownTimeout` seems the simplest path forward to me.
It's certainly simple, yes, though I do wonder what the full implications
may be of it not waiting during the tests.
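To make the timeout semantics under discussion concrete, here is a minimal sketch using only the JDK. `BoundedCloseSketch` and `awaitCloseUninterruptibly` are hypothetical names, and a `CountDownLatch` stands in for the future returned by `channelGroup.close()`; this is not the code from the PR. It illustrates two points: a bounded wait returns `false` when the close does not finish in time, and a timeout of 0 means no waiting at all.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not Artemis or Netty code.
final class BoundedCloseSketch {

    /**
     * Waits up to timeoutMillis for closeDone to reach zero, ignoring
     * interrupts while waiting. Returns true if the close completed in time.
     * The latch is an assumed stand-in for a channel group close future.
     */
    static boolean awaitCloseUninterruptibly(CountDownLatch closeDone, long timeoutMillis) {
        boolean interrupted = false;
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        final long deadline = System.nanoTime() + remainingNanos;
        try {
            while (true) {
                try {
                    // A non-positive timeout does not block: it only reports
                    // whether the latch has already reached zero.
                    return closeDone.await(remainingNanos, TimeUnit.NANOSECONDS);
                } catch (InterruptedException e) {
                    // Remember the interrupt and keep waiting for the remaining time.
                    interrupted = true;
                    remainingNanos = deadline - System.nanoTime();
                }
            }
        } finally {
            if (interrupted) {
                // Restore the interrupt flag for the caller.
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        CountDownLatch neverCloses = new CountDownLatch(1);   // close that hangs
        CountDownLatch alreadyClosed = new CountDownLatch(0); // close that is done

        // Bounded wait: gives up after 100 ms instead of hanging forever.
        System.out.println(awaitCloseUninterruptibly(neverCloses, 100));  // false
        // Timeout of 0: returns immediately without waiting.
        System.out.println(awaitCloseUninterruptibly(neverCloses, 0));    // false
        // Already complete: true even with a timeout of 0.
        System.out.println(awaitCloseUninterruptibly(alreadyClosed, 0));  // true
    }
}
```

With a timeout of 0 the caller only learns whether the close had already finished, which is the behaviour in question for the test suite. Netty's `Future` offers `awaitUninterruptibly(long, TimeUnit)` with the same boolean-returning contract.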
Issue Time Tracking
-------------------
Worklog Id: (was: 1003000)
Time Spent: 2h 20m (was: 2h 10m)
> Netty acceptor not shutting down
> --------------------------------
>
> Key: ARTEMIS-5861
> URL: https://issues.apache.org/jira/browse/ARTEMIS-5861
> Project: Artemis
> Issue Type: Bug
> Affects Versions: 2.44.0
> Reporter: Justin Bertram
> Assignee: Justin Bertram
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Thread dump analysis reveals that the broker hangs indefinitely when trying
> to close Netty channel groups in a Netty acceptor, e.g.:
> {noformat}
> State: WAITING (on object monitor)
> Stack trace:
> at
> io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:290)
> - locked <0x00000000dbd095a8> (a
> io.netty.channel.group.DefaultChannelGroupFuture)
> at
> io.netty.channel.group.DefaultChannelGroupFuture.awaitUninterruptibly(DefaultChannelGroupFuture.java:178)
> at
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor.asyncStop(NettyAcceptor.java:793){noformat}
>
> The code at {{NettyAcceptor.java:793}} calls
> {{channelGroup.close().awaitUninterruptibly()}} without a timeout parameter,
> causing an indefinite hang when channels fail to close properly. This problem
> is very rare and there is no good reproducer.
> The broker should complete shutdown within a reasonable timeout period,
> forcefully closing any remaining connections if necessary.