masaori335 commented on pull request #7237:
URL: https://github.com/apache/trafficserver/pull/7237#issuecomment-706027170


   The patch reproduced the issue. I saw `VC_EVENT_READ_READY` on the producer, 
`VC_EVENT_WRITE_COMPLETE` on the consumer, and HttpSM got stalled.
   
   1. Why inactive timeout is not fired? 
   It looks like the stream/transaction is "inactive", I expect inactive 
timeout is fired and kill HttpSM.
   
   2. I got a crash with this change and the h2spec patch (debug build on macOS)
   
   ```
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <Http2ConnectionState.cc:176 
(rcv_data_frame)> (http2_con) [0] [1] Received DATA frame: rwnd con=61535/65535 
stream=61535/65535
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <HttpTunnel.cc:1101 
(producer_handler)> (http_tunnel) [0] producer_handler [user agent post 
VC_EVENT_READ_READY/TS_EVENT_VCONN_READ_READY]
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <HttpTunnel.cc:1145 
(producer_handler)> (http_redirect) [HttpTunnel::producer_handler] 
enable_redirection: [1 0 0] event: 100, state: 0
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <HttpTunnel.cc:1307 
(consumer_handler)> (http_tunnel) [0] consumer_handler [http server post 
VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <HttpSM.cc:3611 
(tunnel_handler_post_server)> (http) [0] [&HttpSM::tunnel_handler_post_server, 
VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <HttpSM.cc:2636 (main_handler)> 
(http) [0] [HttpSM::main_handler, HTTP_TUNNEL_EVENT_DONE]
   [Oct  9 16:29:30.429] [ET_NET 0] DEBUG: <HttpSM.cc:2737 
(tunnel_handler_post)> (http) [0] [&HttpSM::tunnel_handler_post, 
HTTP_TUNNEL_EVENT_DONE]
   Fatal: HttpSM.cc:2714: failed assertion `p->read_success == true`
   Process 58006 stopped
   * thread #2, name = '[ET_NET 0]', stop reason = signal SIGABRT
       frame #0: 0x00007fff6a9a633a libsystem_kernel.dylib`__pthread_kill + 10
   libsystem_kernel.dylib`__pthread_kill:
   ->  0x7fff6a9a633a <+10>: jae    0x7fff6a9a6344            ; <+20>
       0x7fff6a9a633c <+12>: movq   %rax, %rdi
       0x7fff6a9a633f <+15>: jmp    0x7fff6a9a0629            ; cerror_nocancel
       0x7fff6a9a6344 <+20>: retq
   (lldb) bt
   * thread #2, name = '[ET_NET 0]', stop reason = signal SIGABRT
     * frame #0: 0x00007fff6a9a633a libsystem_kernel.dylib`__pthread_kill + 10
       frame #1: 0x00007fff6aa62e60 libsystem_pthread.dylib`pthread_kill + 430
       frame #2: 0x00007fff6a92d808 libsystem_c.dylib`abort + 120
       frame #3: 0x000000000006a04e 
libtscore.10.dylib`ink_abort(message_format="%s:%d: failed assertion `%s`") at 
ink_error.cc:99:3
       frame #4: 0x000000000006716c 
libtscore.10.dylib`::_ink_assert(expression="p->read_success == true", 
file="HttpSM.cc", line=2714) at ink_assert.cc:37:3
       frame #5: 0x00000001000e544e 
traffic_server`HttpSM::tunnel_handler_post_or_put(this=0x0000000007e9d3d0, 
p=0x0000000007e9e540) at HttpSM.cc:2714:5
       frame #6: 0x00000001000d2a3f 
traffic_server`HttpSM::tunnel_handler_post(this=0x0000000007e9d3d0, event=2301, 
data=0x0000000007e9e330) at HttpSM.cc:2791:5
       frame #7: 0x00000001000cd94a 
traffic_server`HttpSM::main_handler(this=0x0000000007e9d3d0, event=2301, 
data=0x0000000007e9e330) at HttpSM.cc:2656:5
       frame #8: 0x00000001000054bf 
traffic_server`Continuation::handleEvent(this=0x0000000007e9d3d0, event=2301, 
data=0x0000000007e9e330) at I_Continuation.h:219:12
       frame #9: 0x00000001001478af 
traffic_server`HttpTunnel::main_handler(this=0x0000000007e9e330, event=103, 
data=0x0000000007f5ce50) at HttpTunnel.cc:1629:11
       frame #10: 0x00000001000054bf 
traffic_server`Continuation::handleEvent(this=0x0000000007e9e330, event=103, 
data=0x0000000007f5ce50) at I_Continuation.h:219:12
       frame #11: 0x00000001003c5003 
traffic_server`write_signal_and_update(event=103, vc=0x0000000007f5cbf0) at 
UnixNetVConnection.cc:115:25
       frame #12: 0x00000001003c527d 
traffic_server`write_signal_done(event=103, nh=0x0000000002ee0080, 
vc=0x0000000007f5cbf0) at UnixNetVConnection.cc:158:7
       frame #13: 0x00000001003c4a05 
traffic_server`write_to_net_io(nh=0x0000000002ee0080, vc=0x0000000007f5cbf0, 
thread=0x0000000002edc000) at UnixNetVConnection.cc:500:7
       frame #14: 0x00000001003c3d73 
traffic_server`write_to_net(nh=0x0000000002ee0080, vc=0x0000000007f5cbf0, 
thread=0x0000000002edc000) at UnixNetVConnection.cc:344:3
       frame #15: 0x00000001003c8385 
traffic_server`UnixNetVConnection::net_write_io(this=0x0000000007f5cbf0, 
nh=0x0000000002ee0080, lthread=0x0000000002edc000) at 
UnixNetVConnection.cc:899:3
       frame #16: 0x00000001003b77ea 
traffic_server`NetHandler::process_ready_list(this=0x0000000002ee0080) at 
UnixNet.cc:429:11
       frame #17: 0x00000001003b8156 
traffic_server`NetHandler::waitForActivity(this=0x0000000002ee0080, 
timeout=19997000) at UnixNet.cc:546:3
       frame #18: 0x00000001003f8b21 
traffic_server`EThread::execute_regular(this=0x0000000002edc000) at 
UnixEThread.cc:266:14
       frame #19: 0x00000001003f8f9d 
traffic_server`EThread::execute(this=0x0000000002edc000) at 
UnixEThread.cc:327:11
       frame #20: 0x00000001003f7327 
traffic_server`spawn_thread_internal(a=0x0000000000a20060) at Thread.cc:92:12
       frame #21: 0x00007fff6aa63109 libsystem_pthread.dylib`_pthread_start + 
148
       frame #22: 0x00007fff6aa5eb8b libsystem_pthread.dylib`thread_start + 15
   ```
   
   3. This looks assuming that clients send the "Content-Length" header.
   When clients don't send the "Content-Length" header nor "END_STREAM" flag, 
the consumer can't get `WRITE_COMPLETE`.
   I saw HttpSM is stalled in this case too.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to