Kai Zander created CXF-9115:
-------------------------------

             Summary: Race Condition in HttpClientHTTPConduit Causes Writing Thread to Hang Forever
                 Key: CXF-9115
                 URL: https://issues.apache.org/jira/browse/CXF-9115
             Project: CXF
          Issue Type: Bug
          Components: Transports
    Affects Versions: 4.0.6, 4.1.0
            Reporter: Kai Zander
         Attachments: image-2025-03-05-16-43-34-726.png

It is possible for {{HttpClientHTTPConduit.HttpClientBodyPublisher#subscribe}} 
to be called _after_ the underlying subscription has already been cancelled, 
for example, if a connect timeout happens _before_ 
{{HttpClientHTTPConduit.HttpClientBodyPublisher#subscribe}} is called.
In this case, the writing thread will be stuck in 
{{HttpClientHTTPConduit.HttpClientPipedOutputStream#write}}, waiting forever 
for space in the write buffer.

This happens every once in a while on our production system, causing it to 
hang. The threads are stuck here:
{code}
"main@1" tid=0x1 nid=NA waiting
  java.lang.Thread.State: WAITING
          at java.lang.Object.wait0(Object.java:-1)
          at java.lang.Object.wait(Object.java:366)
          at java.io.PipedInputStream.awaitSpace(PipedInputStream.java:279)
          at java.io.PipedInputStream.receive(PipedInputStream.java:237)
          at java.io.PipedOutputStream.write(PipedOutputStream.java:154)
          at org.apache.cxf.transport.http.HttpClientHTTPConduit$HttpClientPipedOutputStream.write(HttpClientHTTPConduit.java:554)
          at org.apache.cxf.io.AbstractWrappedOutputStream.write(AbstractWrappedOutputStream.java:51)
          at org.apache.cxf.io.AbstractThresholdOutputStream.write(AbstractThresholdOutputStream.java:69)
          at org.apache.cxf.metrics.interceptors.CountingOutputStream.write(CountingOutputStream.java:37)
          at org.apache.cxf.io.CacheAndWriteOutputStream.write(CacheAndWriteOutputStream.java:81)
          at org.apache.cxf.io.AbstractWrappedOutputStream.write(AbstractWrappedOutputStream.java:51)
          at com.ctc.wstx.io.UTF8Writer.write(UTF8Writer.java:143)
          at com.ctc.wstx.sw.BufferingXmlWriter.flushBuffer(BufferingXmlWriter.java:1417)
          at com.ctc.wstx.sw.BufferingXmlWriter.writeAttrValue(BufferingXmlWriter.java:1155)
          at com.ctc.wstx.sw.BufferingXmlWriter.writeAttribute(BufferingXmlWriter.java:1051)
          at com.ctc.wstx.sw.BaseNsStreamWriter.doWriteNamespace(BaseNsStreamWriter.java:572)
          at com.ctc.wstx.sw.SimpleNsStreamWriter.writeNamespace(SimpleNsStreamWriter.java:141)
          at org.apache.cxf.staxutils.StaxUtils.writeStartElement(StaxUtils.java:835)
          at org.apache.cxf.staxutils.StaxUtils.copy(StaxUtils.java:741)
          at org.apache.cxf.staxutils.StaxUtils.copy(StaxUtils.java:707)
          at org.apache.cxf.binding.soap.saaj.SAAJOutInterceptor$SAAJOutEndingInterceptor.handleMessage(SAAJOutInterceptor.java:213)
{code}
The {{PipedInputStream}} is in the state shown below: it is connected, but no 
thread is registered as its {{readSide}} yet, and none ever will be. It 
therefore never considers the read end to be gone or dead, and keeps looping 
forever in {{awaitSpace()}}:
!image-2025-03-05-16-43-34-726.png!
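
The pipe behaviour described above can be reproduced outside of CXF. The 
following is a minimal sketch, not CXF code: the class {{PipeHangDemo}} and its 
method {{writerStillBlocked}} are hypothetical names chosen for illustration. 
It assumes only the standard {{java.io}} pipe classes and shows that a writer 
blocks once the buffer is full if no thread ever reads from the connected 
{{PipedInputStream}}.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical demo class; it only exercises the JDK pipe classes, not CXF.
class PipeHangDemo {

    /**
     * Writes payloadSize bytes into a pipe with a bufferSize-byte buffer that
     * no thread ever reads from, and reports whether the writer thread is
     * still blocked after waitMillis.
     */
    public static boolean writerStillBlocked(int bufferSize, int payloadSize, long waitMillis)
            throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream(bufferSize);
        // Connected, but no thread will ever call in.read(), so no read side
        // is ever registered on the PipedInputStream.
        PipedOutputStream out = new PipedOutputStream(in);

        Thread writer = new Thread(() -> {
            try {
                out.write(new byte[payloadSize]);
            } catch (IOException e) {
                // Reached when the blocked write is interrupted during cleanup.
            }
        }, "writer");
        writer.setDaemon(true);
        writer.start();

        // Give the writer time to finish; if it is still alive, it is blocked.
        writer.join(waitMillis);
        boolean blocked = writer.isAlive();

        // Cleanup: interrupting the thread makes the blocked write fail with
        // an InterruptedIOException, so the demo itself does not hang.
        writer.interrupt();
        writer.join();
        return blocked;
    }

    public static void main(String[] args) throws Exception {
        // Payload larger than the buffer: the writer blocks waiting for space.
        System.out.println("payload > buffer, blocked: " + writerStillBlocked(16, 64, 1500));
        // Payload fits into the buffer: the write returns immediately.
        System.out.println("payload < buffer, blocked: " + writerStillBlocked(64, 16, 1500));
    }
}
```

In this sketch the writer is released by an explicit interrupt. In the 
scenario reported here nothing interrupts the writing thread, which is why it 
waits indefinitely.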

I can reproduce this issue every time by
* placing a breakpoint on this line: 
https://github.com/apache/cxf/blob/7fb95ad266e4a5ced561a0dc56c038db43967ca4/rt/transports/http/src/main/java/org/apache/cxf/transport/http/HttpClientHTTPConduit.java#L637
* sending a request with a body that is larger than both the chunking 
threshold (4096 bytes by default) and the chunk length,
* waiting for the breakpoint to be hit,
* then waiting for the connect timeout to be exceeded (30s by default),
* then resuming the program.

I recommend running with 
{{-Djdk.httpclient.HttpClient.log=errors,requests,headers,frames:control:data:window,ssl,trace,channel}}.
That way we can see the debug logs printed by the {{HttpClient}}, which tell us 
when timeouts happen and when subscriptions are cancelled.

As a reproducer project, you can use the [wsdl_first_dynamic_client 
sample|https://github.com/apache/cxf/tree/cxf-4.1.0/distribution/src/main/release/samples/wsdl_first_dynamic_client],
 with the following modification in the client to cause chunking to happen:
{code}
Index: distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java b/distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java
--- a/distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java	(revision 7fb95ad266e4a5ced561a0dc56c038db43967ca4)
+++ b/distribution/src/main/release/samples/wsdl_first_dynamic_client/src/main/java/demo/hw/client/ComplexClient.java	(date 1741191623466)
@@ -35,6 +35,7 @@
 import org.apache.cxf.service.model.BindingOperationInfo;
 import org.apache.cxf.service.model.MessagePartInfo;
 import org.apache.cxf.service.model.ServiceInfo;
+import org.apache.cxf.transport.http.HTTPConduit;
 
 /**
  *
@@ -70,6 +71,12 @@
         JaxWsDynamicClientFactory factory = JaxWsDynamicClientFactory.newInstance();
         Client client = factory.createClient(wsdlURL.toExternalForm(), SERVICE_NAME);
         ClientImpl clientImpl = (ClientImpl) client;
+        ((HTTPConduit) clientImpl.getConduit()).getClient().setChunkingThreshold(8);
+        ((HTTPConduit) clientImpl.getConduit()).getClient().setChunkLength(8);
+        ((HTTPConduit) clientImpl.getConduit()).getClient().setConnectionTimeout(5000);
+        ((HTTPConduit) clientImpl.getConduit()).getClient().setReceiveTimeout(5000);
+
+
         Endpoint endpoint = clientImpl.getEndpoint();
         ServiceInfo serviceInfo = endpoint.getService().getServiceInfos().get(0);
         QName bindingName = new QName("http://Company.com/Application",
{code}
Start the server with {{mvn -Pserver}}, set the breakpoint as described above, 
and start {{mvn -Pclient}} in the debugger. Once the breakpoint is hit, wait at 
least 5 seconds (the connection timeout set in the patch above) and resume. 
The process will now hang forever.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
