masaori335 commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-706027170
The patch reproduced the issue. I saw `VC_EVENT_READ_READY` on the producer,
`VC_EVENT_WRITE_COMPLETE` on the consumer, and HttpSM got stalled.
1. Why inactive timeout is not fired?
It looks like the stream/transaction is "inactive", I expect inactive
timeout is fired and kill HttpSM.
2. I got a crash with this change and the h2spec patch (debug build on macOS)
```
[Oct 9 16:29:30.429] [ET_NET 0] DEBUG: <Http2ConnectionState.cc:176
(rcv_data_frame)> (http2_con) [0] [1] Received DATA frame: rwnd con=61535/65535
stream=61535/65535
[Oct 9 16:29:30.429] [ET_NET 0] DEBUG: <HttpTunnel.cc:1101
(producer_handler)> (http_tunnel) [0] producer_handler [user agent post
VC_EVENT_READ_READY/TS_EVENT_VCONN_READ_READY]
[Oct 9 16:29:30.429] [ET_NET 0] DEBUG: <HttpTunnel.cc:1145
(producer_handler)> (http_redirect) [HttpTunnel::producer_handler]
enable_redirection: [1 0 0] event: 100, state: 0
[Oct 9 16:29:30.429] [ET_NET 0] DEBUG: <HttpTunnel.cc:1307
(consumer_handler)> (http_tunnel) [0] consumer_handler [http server post
VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
[Oct 9 16:29:30.429] [ET_NET 0] DEBUG: <HttpSM.cc:3611
(tunnel_handler_post_server)> (http) [0] [&HttpSM::tunnel_handler_post_server,
VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
[Oct 9 16:29:30.429] [ET_NET 0] DEBUG: <HttpSM.cc:2636 (main_handler)>
(http) [0] [HttpSM::main_handler, HTTP_TUNNEL_EVENT_DONE]
[Oct 9 16:29:30.429] [ET_NET 0] DEBUG: <HttpSM.cc:2737
(tunnel_handler_post)> (http) [0] [&HttpSM::tunnel_handler_post,
HTTP_TUNNEL_EVENT_DONE]
Fatal: HttpSM.cc:2714: failed assertion `p->read_success == true`
Process 58006 stopped
* thread #2, name = '[ET_NET 0]', stop reason = signal SIGABRT
frame #0: 0x00007fff6a9a633a libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
-> 0x7fff6a9a633a <+10>: jae 0x7fff6a9a6344 ; <+20>
0x7fff6a9a633c <+12>: movq %rax, %rdi
0x7fff6a9a633f <+15>: jmp 0x7fff6a9a0629 ; cerror_nocancel
0x7fff6a9a6344 <+20>: retq
(lldb) bt
* thread #2, name = '[ET_NET 0]', stop reason = signal SIGABRT
* frame #0: 0x00007fff6a9a633a libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff6aa62e60 libsystem_pthread.dylib`pthread_kill + 430
frame #2: 0x00007fff6a92d808 libsystem_c.dylib`abort + 120
frame #3: 0x000000000006a04e
libtscore.10.dylib`ink_abort(message_format="%s:%d: failed assertion `%s`") at
ink_error.cc:99:3
frame #4: 0x000000000006716c
libtscore.10.dylib`::_ink_assert(expression="p->read_success == true",
file="HttpSM.cc", line=2714) at ink_assert.cc:37:3
frame #5: 0x00000001000e544e
traffic_server`HttpSM::tunnel_handler_post_or_put(this=0x0000000007e9d3d0,
p=0x0000000007e9e540) at HttpSM.cc:2714:5
frame #6: 0x00000001000d2a3f
traffic_server`HttpSM::tunnel_handler_post(this=0x0000000007e9d3d0, event=2301,
data=0x0000000007e9e330) at HttpSM.cc:2791:5
frame #7: 0x00000001000cd94a
traffic_server`HttpSM::main_handler(this=0x0000000007e9d3d0, event=2301,
data=0x0000000007e9e330) at HttpSM.cc:2656:5
frame #8: 0x00000001000054bf
traffic_server`Continuation::handleEvent(this=0x0000000007e9d3d0, event=2301,
data=0x0000000007e9e330) at I_Continuation.h:219:12
frame #9: 0x00000001001478af
traffic_server`HttpTunnel::main_handler(this=0x0000000007e9e330, event=103,
data=0x0000000007f5ce50) at HttpTunnel.cc:1629:11
frame #10: 0x00000001000054bf
traffic_server`Continuation::handleEvent(this=0x0000000007e9e330, event=103,
data=0x0000000007f5ce50) at I_Continuation.h:219:12
frame #11: 0x00000001003c5003
traffic_server`write_signal_and_update(event=103, vc=0x0000000007f5cbf0) at
UnixNetVConnection.cc:115:25
frame #12: 0x00000001003c527d
traffic_server`write_signal_done(event=103, nh=0x0000000002ee0080,
vc=0x0000000007f5cbf0) at UnixNetVConnection.cc:158:7
frame #13: 0x00000001003c4a05
traffic_server`write_to_net_io(nh=0x0000000002ee0080, vc=0x0000000007f5cbf0,
thread=0x0000000002edc000) at UnixNetVConnection.cc:500:7
frame #14: 0x00000001003c3d73
traffic_server`write_to_net(nh=0x0000000002ee0080, vc=0x0000000007f5cbf0,
thread=0x0000000002edc000) at UnixNetVConnection.cc:344:3
frame #15: 0x00000001003c8385
traffic_server`UnixNetVConnection::net_write_io(this=0x0000000007f5cbf0,
nh=0x0000000002ee0080, lthread=0x0000000002edc000) at
UnixNetVConnection.cc:899:3
frame #16: 0x00000001003b77ea
traffic_server`NetHandler::process_ready_list(this=0x0000000002ee0080) at
UnixNet.cc:429:11
frame #17: 0x00000001003b8156
traffic_server`NetHandler::waitForActivity(this=0x0000000002ee0080,
timeout=19997000) at UnixNet.cc:546:3
frame #18: 0x00000001003f8b21
traffic_server`EThread::execute_regular(this=0x0000000002edc000) at
UnixEThread.cc:266:14
frame #19: 0x00000001003f8f9d
traffic_server`EThread::execute(this=0x0000000002edc000) at
UnixEThread.cc:327:11
frame #20: 0x00000001003f7327
traffic_server`spawn_thread_internal(a=0x0000000000a20060) at Thread.cc:92:12
frame #21: 0x00007fff6aa63109 libsystem_pthread.dylib`_pthread_start +
148
frame #22: 0x00007fff6aa5eb8b libsystem_pthread.dylib`thread_start + 15
```
3. This looks assuming that clients send the "Content-Length" header.
When clients don't send the "Content-Length" header nor "END_STREAM" flag,
the consumer can't get `WRITE_COMPLETE`.
I saw HttpSM is stalled in this case too.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]