Rajat Goel created KNOX-2003:
--------------------------------
Summary: HTTP connections between Knox gateway and Service backend
in CLOSE_WAIT, not getting closed and hang around seemingly forever
Key: KNOX-2003
URL: https://issues.apache.org/jira/browse/KNOX-2003
Project: Apache Knox
Issue Type: Bug
Components: Server
Affects Versions: 1.0.0
Environment: HDP 3.1.0
Reporter: Rajat Goel
Attachments: Screenshot 2019-08-27 at 10.49.05 PM.png,
gateway.out_close_wait_thread_dump
When the UI sends multiple HTTP requests for different UI pages, the TCP
connections (between Knox and the Service backend) corresponding to some of
these requests move to the CLOSE_WAIT state a few seconds after the request
completes. These connections stay stuck in this state indefinitely, until the
Knox server is restarted. ‘lsof’ output for the Knox server process shows that
Knox is still holding the ‘fd’ for these connections. I took a raw packet dump
using Tshark/Wireshark, and these are the observations:
* The UI sends a GET request to Knox, which is proxied correctly to the backend.
* The backend sends a 200 OK response with data to Knox, which is proxied to
the frontend.
* After a few seconds, the backend server sends a TCP [FIN] [ACK] packet to Knox
to close the connection, possibly due to an idle timeout expiring at the
backend service.
* The Knox server sends a TCP [ACK] => the connection moves to the CLOSE_WAIT
state, and there is no communication thereafter.
* In the normal scenario, for connections which are closed properly, Knox also
sends a [FIN] [ACK] TCP packet to the backend, the backend responds with [ACK],
and the connection is closed properly.
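
The packet sequence above can be reproduced on loopback with plain sockets. The
following is only a minimal sketch, not Knox or Jetty code: the function name
demo_close_wait and the two sockets standing in for the backend and for the
proxy's outbound connection are assumptions made for illustration. It shows
that after the peer's FIN arrives, the local side only sees end-of-stream, and
that the socket stays half-closed (CLOSE_WAIT) until the application closes the
descriptor itself.

```python
import socket


def demo_close_wait():
    """Reproduce the half-closed state on loopback, then close it properly."""
    # Stand-in for the service backend: listen on an ephemeral loopback port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    # Stand-in for the proxy's outbound connection to the backend.
    client = socket.create_connection(listener.getsockname())
    client.settimeout(5)
    backend, _ = listener.accept()

    # One request/response exchange.
    client.sendall(b"GET / HTTP/1.1\r\n\r\n")
    request = b""
    while not request.endswith(b"\r\n\r\n"):
        request += backend.recv(1024)
    backend.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
    response = client.recv(1024)

    # Simulated idle timeout at the backend: close() sends FIN to the client.
    backend.close()

    # The client sees end-of-stream (recv returns b""). From here until
    # client.close() is called, the client's socket sits in CLOSE_WAIT.
    eof = client.recv(1024)
    still_open = client.fileno() != -1

    # Closing the descriptor sends the client's own FIN and ends CLOSE_WAIT.
    client.close()
    closed = client.fileno() == -1

    listener.close()
    return response, eof, still_open, closed


if __name__ == "__main__":
    print(demo_close_wait())
```

If the client.close() call is skipped, the descriptor stays open and the
connection remains in CLOSE_WAIT for the life of the process, which matches
the behaviour described in this report.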
The issue here is that Knox is not sending [FIN] [ACK] to the backend. This
looks like an issue with the Jetty server and might be related to
[https://github.com/eclipse/jetty.project/issues/2169].
Attaching a Wireshark UI screenshot and a thread dump of the Knox gateway server.
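
To track whether such connections keep accumulating, the ‘lsof’ output can be
summarized with a small script. This is a rough sketch that assumes the usual
column layout printed by `lsof -nP -iTCP`, where the TCP state appears in
parentheses at the end of each line. The function name count_close_wait and the
sample lines (PID, addresses, ports) are hypothetical and not taken from the
affected system.

```python
def count_close_wait(lsof_output: str) -> int:
    """Count TCP entries whose trailing state column reads (CLOSE_WAIT)."""
    return sum(
        1
        for line in lsof_output.splitlines()
        if " TCP " in line and line.rstrip().endswith("(CLOSE_WAIT)")
    )


# Hypothetical sample in the lsof layout; values are made up for illustration.
SAMPLE = """\
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    4242 knox  310u  IPv4 111111      0t0  TCP 10.0.0.5:40112->10.0.0.9:8080 (CLOSE_WAIT)
java    4242 knox  311u  IPv4 111112      0t0  TCP 10.0.0.5:40114->10.0.0.9:8080 (ESTABLISHED)
java    4242 knox  312u  IPv4 111113      0t0  TCP 10.0.0.5:40120->10.0.0.9:8080 (CLOSE_WAIT)
"""

if __name__ == "__main__":
    print(count_close_wait(SAMPLE))
```

Running this periodically against the Knox process and watching the count grow
between restarts would be one way to confirm the leak.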
--
This message was sent by Atlassian Jira
(v8.3.2#803003)