Luca Martella created CXF-9173:
----------------------------------
Summary: Default SO_LINGER induces a TCP RST on each connection
closure
Key: CXF-9173
URL: https://issues.apache.org/jira/browse/CXF-9173
Project: CXF
Issue Type: Bug
Components: Transports
Affects Versions: 4.1.2
Reporter: Luca Martella
After migrating from CXF 3.5.11 to 4.1.2, we noticed that TCP connections
managed by the async HTTP conduit are abruptly closed with a TCP reset when the
connection time-to-live (CONNECTION_TTL) expires, resulting in Connection Reset
errors on the remote side.
Both CXF versions allow customisation of key parameters via the CXF bus:
- CONNECTION_TTL (default: 60000 ms): Duration a connection remains open.
- SO_LINGER (default: -1): Controls socket linger time, affecting how
connections are closed.
The main change in CXF 4.x is the unit of the SO_LINGER option, which is now
expected to be in milliseconds, whereas it was expressed in seconds in CXF 3.x:
1. The value from the CXF bus is interpreted as milliseconds when creating the
IOReactorConfig (see
[code|https://github.com/apache/cxf/blob/cxf-4.1.2/rt/transports/http-hc5/src/main/java/org/apache/cxf/transport/http/asyncclient/hc5/AsyncHTTPConduitFactory.java#L341])
{code:java}
final IOReactorConfig config = IOReactorConfig.custom()
    .setSoLinger(TimeValue.ofMilliseconds(soLinger)){code}
2. It is then converted back to seconds when the reactor consumes the config
(see
[code|https://github.com/apache/httpcomponents-core/blob/rel/v5.4-alpha1/httpcore5/src/main/java/org/apache/hc/core5/reactor/SingleCoreIOReactor.java#L296])
{code:java}
final int linger = this.reactorConfig.getSoLinger().toSecondsIntBound();
if (linger >= 0) {
    socket.setSoLinger(true, linger);
} {code}
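The effect of that guard can be reproduced on a plain java.net.Socket. The
sketch below is only an illustration: the class LingerGuardDemo and the helper
applyLinger are hypothetical names, not part of CXF or HttpCore. It applies the
same `linger >= 0` check to an unconnected socket and reads the option back.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

public class LingerGuardDemo {

    /**
     * Hypothetical helper: applies the same guard as the quoted reactor code
     * and returns the SO_LINGER value the socket ends up with.
     */
    static int applyLinger(int lingerSeconds) {
        // An unconnected socket is enough to set and read socket options.
        try (Socket socket = new Socket()) {
            if (lingerSeconds >= 0) {
                socket.setSoLinger(true, lingerSeconds);
            }
            // getSoLinger() returns -1 when the option is disabled.
            return socket.getSoLinger();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("linger=-1 -> SO_LINGER " + applyLinger(-1));
        System.out.println("linger=0  -> SO_LINGER " + applyLinger(0));
    }
}
```

A negative value leaves SO_LINGER disabled, while 0 enables it with a zero
timeout, which on common TCP stacks typically makes close() send an RST instead
of performing the normal FIN handshake.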
The problem occurs when the default value (-1) for the SO_LINGER option is
used. In CXF 4.x, this value is first interpreted as -1 milliseconds, then
converted to 0 seconds (calling _toSecondsIntBound()_ on a TimeValue of
-1 milliseconds returns 0).
As a result, the linger option is enabled with a timeout of 0, causing sockets
to close immediately and trigger a TCP reset.
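The unit round trip can be sketched with the JDK alone. This is a minimal
illustration that assumes TimeValue converts milliseconds to seconds the same
way java.util.concurrent.TimeUnit does (integer division truncating toward
zero); the class SoLingerConversion and its helpers are hypothetical names and
not part of CXF or HttpCore.

```java
import java.util.concurrent.TimeUnit;

public class SoLingerConversion {

    /**
     * Hypothetical helper: converts a millisecond setting to whole seconds,
     * clamped to the int range. Assumed to behave like
     * TimeValue.ofMilliseconds(ms).toSecondsIntBound().
     */
    static int toLingerSeconds(long soLingerMillis) {
        // TimeUnit conversion truncates toward zero, so -1 ms becomes 0 s.
        long seconds = TimeUnit.MILLISECONDS.toSeconds(soLingerMillis);
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, seconds));
    }

    /** Mirrors the `linger >= 0` guard quoted from SingleCoreIOReactor. */
    static boolean lingerEnabled(long soLingerMillis) {
        return toLingerSeconds(soLingerMillis) >= 0;
    }

    public static void main(String[] args) {
        for (long millis : new long[] {-1L, -1000L, 5000L}) {
            System.out.println(millis + " ms -> " + toLingerSeconds(millis)
                    + " s, linger enabled: " + lingerEnabled(millis));
        }
    }
}
```

Under that assumption, -1 ms maps to 0 s and passes the guard, whereas -1000 ms
maps to -1 s and leaves linger disabled.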
That is definitely a difference in behaviour compared to CXF 3.x, where the
default SO_LINGER value of -1 was meant to disable socket linger.
1. Can you please clarify whether this change was made on purpose or whether
it is a bug resulting from the various unit conversions?
2. We see that setting _org.apache.cxf.transport.http.async.SO_LINGER_ to -1000
effectively disables the linger option, which aligns with the default behavior
in CXF 3.x. Is this a valid workaround to prevent abrupt socket closures and
TCP resets until the issue is clarified or resolved?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)