Kai Zander created CXF-9115:
-------------------------------
Summary: Race Condition in HttpClientHTTPConduit Causes Writing
Thread to Hang Forever
Key: CXF-9115
URL: https://issues.apache.org/jira/browse/CXF-9115
Project: CXF
Issue Type: Bug
Components: Transports
Affects Versions: 4.0.6, 4.1.0
Reporter: Kai Zander
Attachments: image-2025-03-05-16-43-34-726.png
It is possible for {{HttpClientHTTPConduit.HttpClientBodyPublisher#subscribe}}
to be called _after_ the underlying subscription has already been cancelled,
for example, if a connect timeout happens _before_
{{HttpClientHTTPConduit.HttpClientBodyPublisher#subscribe}} is called.
In this case, the writing thread will be stuck in
{{HttpClientHTTPConduit.HttpClientPipedOutputStream#write}}, waiting forever
for space in the write buffer.
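The blocking behaviour itself can be shown without CXF. The following is a minimal sketch using only {{java.io}} piped streams; the class name {{PipedWriteHangDemo}}, the 8-byte pipe and the 64-byte payload are illustrative assumptions, not CXF code. A writer that overfills a connected pipe that nobody ever reads from stays blocked in {{write}}:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipedWriteHangDemo {

    /**
     * Starts a writer that pushes more bytes than the pipe can hold while no
     * thread ever reads from the pipe. Returns true if the writer is still
     * blocked after waitMillis.
     */
    static boolean writerStillBlocked(long waitMillis) throws Exception {
        PipedInputStream in = new PipedInputStream(8);      // tiny buffer (illustrative)
        PipedOutputStream out = new PipedOutputStream(in);  // connected, but never read

        Thread writer = new Thread(() -> {
            try {
                out.write(new byte[64]);                    // more than the buffer holds
            } catch (IOException e) {
                // Reached only after the cleanup below closes the read side.
            }
        }, "writer");
        writer.setDaemon(true);
        writer.start();

        writer.join(waitMillis);
        boolean blocked = writer.isAlive();

        // Demo cleanup only: closing the read side makes the writer fail with
        // an IOException on one of its next state checks.
        in.close();
        writer.join(5000);
        return blocked;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("writer still blocked: " + writerStillBlocked(2000));
    }
}
```

In the scenario reported here nothing closes the read side, so the writer never leaves the wait loop.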
This happens every once in a while on our production system, causing it to
hang. The threads are stuck here:
{code}
"main@1" tid=0x1 nid=NA waiting
  java.lang.Thread.State: WAITING
    at java.lang.Object.wait0(Object.java:-1)
    at java.lang.Object.wait(Object.java:366)
    at java.io.PipedInputStream.awaitSpace(PipedInputStream.java:279)
    at java.io.PipedInputStream.receive(PipedInputStream.java:237)
    at java.io.PipedOutputStream.write(PipedOutputStream.java:154)
    at org.apache.cxf.transport.http.HttpClientHTTPConduit$HttpClientPipedOutputStream.write(HttpClientHTTPConduit.java:554)
    at org.apache.cxf.io.AbstractWrappedOutputStream.write(AbstractWrappedOutputStream.java:51)
    at org.apache.cxf.io.AbstractThresholdOutputStream.write(AbstractThresholdOutputStream.java:69)
    at org.apache.cxf.metrics.interceptors.CountingOutputStream.write(CountingOutputStream.java:37)
    at org.apache.cxf.io.CacheAndWriteOutputStream.write(CacheAndWriteOutputStream.java:81)
    at org.apache.cxf.io.AbstractWrappedOutputStream.write(AbstractWrappedOutputStream.java:51)
    at com.ctc.wstx.io.UTF8Writer.write(UTF8Writer.java:143)
    at com.ctc.wstx.sw.BufferingXmlWriter.flushBuffer(BufferingXmlWriter.java:1417)
    at com.ctc.wstx.sw.BufferingXmlWriter.writeAttrValue(BufferingXmlWriter.java:1155)
    at com.ctc.wstx.sw.BufferingXmlWriter.writeAttribute(BufferingXmlWriter.java:1051)
    at com.ctc.wstx.sw.BaseNsStreamWriter.doWriteNamespace(BaseNsStreamWriter.java:572)
    at com.ctc.wstx.sw.SimpleNsStreamWriter.writeNamespace(SimpleNsStreamWriter.java:141)
    at org.apache.cxf.staxutils.StaxUtils.writeStartElement(StaxUtils.java:835)
    at org.apache.cxf.staxutils.StaxUtils.copy(StaxUtils.java:741)
    at org.apache.cxf.staxutils.StaxUtils.copy(StaxUtils.java:707)
    at org.apache.cxf.binding.soap.saaj.SAAJOutInterceptor$SAAJOutEndingInterceptor.handleMessage(SAAJOutInterceptor.java:213)
{code}
The {{PipedInputStream}} is in the state shown below: it is connected, but no
thread has been registered as its {{readSide}} yet, and none ever will be. It
therefore does not consider the read end to be gone/dead and keeps looping
forever in {{awaitSpace()}}:
!image-2025-03-05-16-43-34-726.png!
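For contrast, here is a minimal sketch of the case in which the pipe does notice a missing reader; the class name {{ReadSideDemo}} and the buffer sizes are illustrative assumptions. Once some thread has read from the pipe, it is recorded as the read side, and after that thread terminates a further write fails with an {{IOException}} instead of waiting:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class ReadSideDemo {

    /**
     * Lets a reader thread read one byte and terminate, then writes again.
     * Returns the message of the IOException raised by the second write,
     * or null if no exception was raised.
     */
    static String writeAfterReaderDied() throws Exception {
        PipedInputStream in = new PipedInputStream(8);
        PipedOutputStream out = new PipedOutputStream(in);

        out.write(1);                        // fits into the buffer, returns immediately

        Thread reader = new Thread(() -> {
            try {
                in.read();                   // registers this thread as the read side
            } catch (IOException ignored) {
                // not expected in this demo
            }
        }, "reader");
        reader.start();
        reader.join();                       // the read side is now a dead thread

        try {
            out.write(new byte[64]);         // the pipe detects the dead reader
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("second write failed with: " + writeAfterReaderDied());
    }
}
```

In the hang described in this issue no thread ever reads from the pipe, so this check never fires.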
I can reproduce this issue every time by
* placing a breakpoint on this line:
https://github.com/apache/cxf/blob/7fb95ad266e4a5ced561a0dc56c038db43967ca4/rt/transports/http/src/main/java/org/apache/cxf/transport/http/HttpClientHTTPConduit.java#L637
* sending a request with a body that is larger than the chunking threshold
(4096 bytes by default), and larger than the chunk length,
* waiting for the breakpoint to be hit,
* then waiting for the connect timeout to be exceeded (30s by default),
* then resuming the program.
I recommend running with
{{-Djdk.httpclient.HttpClient.log=errors,requests,headers,frames:control:data:window,ssl,trace,channel}}.
That way we can see debug logs printed by the {{HttpClient}} that tell us when
timeouts happen and subscriptions are being cancelled.
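If passing a {{-D}} flag is inconvenient, the same property can be set programmatically. This is a minimal sketch, assuming the JDK reads the property once when its HTTP client logging is first initialised, so it has to be set before the first {{HttpClient}} is created; the class name {{HttpClientLogConfig}} is hypothetical:

```java
import java.net.http.HttpClient;

public class HttpClientLogConfig {

    static final String LOG_PROPERTY = "jdk.httpclient.HttpClient.log";
    static final String LOG_VALUE =
            "errors,requests,headers,frames:control:data:window,ssl,trace,channel";

    /** Sets the logging property; call this before any HttpClient is created. */
    static void enableHttpClientLogging() {
        System.setProperty(LOG_PROPERTY, LOG_VALUE);
    }

    public static void main(String[] args) {
        enableHttpClientLogging();
        // Assumption: clients created after this point log with the settings above.
        HttpClient client = HttpClient.newHttpClient();
        System.out.println(LOG_PROPERTY + "=" + System.getProperty(LOG_PROPERTY));
    }
}
```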
As a reproducer project, you can use the [wsdl_first_dynamic_client
sample|https://github.com/apache/cxf/tree/cxf-4.1.0/distribution/src/main/release/samples/wsdl_first_dynamic_client],
with the following modification in the client to cause chunking to happen:
{code}
Index: distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java b/distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java
--- a/distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java	(revision 7fb95ad266e4a5ced561a0dc56c038db43967ca4)
+++ b/distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java	(date 1741191623466)
@@ -35,6 +35,7 @@
 import org.apache.cxf.service.model.BindingOperationInfo;
 import org.apache.cxf.service.model.MessagePartInfo;
 import org.apache.cxf.service.model.ServiceInfo;
+import org.apache.cxf.transport.http.HTTPConduit;
 
 /**
  *
@@ -70,6 +71,12 @@
         JaxWsDynamicClientFactory factory = JaxWsDynamicClientFactory.newInstance();
         Client client = factory.createClient(wsdlURL.toExternalForm(), SERVICE_NAME);
         ClientImpl clientImpl = (ClientImpl) client;
+        ((HTTPConduit) clientImpl.getConduit()).getClient().setChunkingThreshold(8);
+        ((HTTPConduit) clientImpl.getConduit()).getClient().setChunkLength(8);
+        ((HTTPConduit) clientImpl.getConduit()).getClient().setConnectionTimeout(5000);
+        ((HTTPConduit) clientImpl.getConduit()).getClient().setReceiveTimeout(5000);
+
+
         Endpoint endpoint = clientImpl.getEndpoint();
         ServiceInfo serviceInfo = endpoint.getService().getServiceInfos().get(0);
         QName bindingName = new QName("http://Company.com/Application",
{code}
Start the server with {{mvn -Pserver}}, set the breakpoint as described above,
and start {{mvn -Pclient}} in the debugger. Once the breakpoint is hit, wait
at least 5 seconds (the connection timeout set in the patch above) and resume.
The process will now hang forever.