Nagy Attila Bálint created FLINK-39761:
------------------------------------------
Summary: Missing 'Connection: close' header on '304 Not Modified' responses causes proxy connection pool poisoning
Key: FLINK-39761
URL: https://issues.apache.org/jira/browse/FLINK-39761
Project: Flink
Issue Type: Bug
Components: Runtime / REST, Runtime / Web Frontend
Affects Versions: 2.2.1, 1.20.4
Reporter: Nagy Attila Bálint
*Overview:*
When the Flink Web UI / History Server serves static files (e.g., .css, .js)
and receives a request whose {{If-Modified-Since}} header matches the file's
modification time, it correctly responds with {{304 Not Modified}}.
However, the server then closes the TCP connection immediately, without
including a {{Connection: close}} header in the response.
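This breaks HTTP/1.1 keep-alive
[expectations|https://www.rfc-editor.org/info/rfc2068/#section-19.7.1]: a client
that receives a response without {{Connection: close}} assumes the connection
stays open for further requests. The result is *connection pool poisoning* in
downstream reverse proxies such as Apache Knox or Nginx.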
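For context, the check that decides whether to answer with a 304 can be modeled in a few lines. The sketch below is not Flink's code: {{ConditionalGetSketch}} and {{isNotModified}} are hypothetical names, and it only assumes the "matching" rule described above, i.e. the {{If-Modified-Since}} date equals the file's last-modified time truncated to whole seconds (HTTP dates carry no milliseconds).

{code:java}
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class ConditionalGetSketch {

    /**
     * Hypothetical helper: returns true when a 304 Not Modified should be sent,
     * i.e. the client's If-Modified-Since date (RFC 1123 format, one-second
     * precision) equals the file's last-modified time truncated to seconds.
     */
    static boolean isNotModified(String ifModifiedSince, long lastModifiedMillis) {
        if (ifModifiedSince == null || ifModifiedSince.isEmpty()) {
            return false; // unconditional request: serve the full file
        }
        try {
            long clientSeconds = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toEpochSecond();
            long fileSeconds = lastModifiedMillis / 1000; // drop sub-second precision
            return clientSeconds == fileSeconds;
        } catch (DateTimeParseException e) {
            return false; // malformed header: fall back to a full response
        }
    }
}
{code}

For example, {{If-Modified-Since: Sun, 06 Nov 1994 08:49:37 GMT}} against a file last modified at 784111777250 ms since the epoch yields {{true}}, i.e. a 304; it is this 304 path that then closes the connection.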
*Impact:*
Because HTTP/1.1 assumes persistent connections by default, reverse proxies
receive the 304 response and place the connection back into their reusable
connection pool.
When the proxy attempts to reuse this connection for the very next request, it
hits an unexpected end of stream because Flink has already severed the TCP
connection.
In Apache Knox, this manifests as a {{NoHttpResponseException}} and results in
intermittent HTTP 500 (Internal Server Error) responses being returned to the
end user.
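The proxy-side decision can be sketched as follows. This is a simplified model of the HTTP/1.1 persistence rule (RFC 9112, section 9.3), not code taken from Knox, Nginx or Apache HttpClient; {{KeepAliveRule}} and {{isReusable}} are hypothetical names. It shows why the current 304 gets pooled: an HTTP/1.1 response with no {{Connection}} header at all is treated as persistent.

{code:java}
import java.util.List;
import java.util.Locale;

final class KeepAliveRule {

    /**
     * Hypothetical helper modelling the persistence decision a pooling client
     * or proxy makes after reading a response (simplified from RFC 9112, 9.3).
     *
     * @param httpVersion      protocol from the status line, e.g. "HTTP/1.1"
     * @param connectionValues values of every Connection header field in the response
     * @return true if the connection may be returned to the pool for reuse
     */
    static boolean isReusable(String httpVersion, List<String> connectionValues) {
        boolean close = false;
        boolean keepAlive = false;
        for (String value : connectionValues) {
            // Connection is a comma-separated list of case-insensitive tokens
            for (String token : value.split(",")) {
                String t = token.trim().toLowerCase(Locale.ROOT);
                if (t.equals("close")) {
                    close = true;
                } else if (t.equals("keep-alive")) {
                    keepAlive = true;
                }
            }
        }
        if (close) {
            return false; // the server announced it will close the connection
        }
        // HTTP/1.1 connections persist by default; HTTP/1.0 only with "keep-alive"
        return "HTTP/1.1".equals(httpVersion) || keepAlive;
    }
}
{code}

With the header proposed below, the 304 carries {{Connection: close}}, so {{isReusable("HTTP/1.1", List.of("close"))}} returns {{false}} and the connection is discarded instead of pooled.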
*Root Cause:*
The issue originates in
{{org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler}}.
* Current master branch (2.2.1+): [link|https://github.com/apache/flink/blob/45295cf62608ca172b83ac42d9128d027a91d06a/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/legacy/files/StaticFileServerHandler.java#L312]
* 1.20.4 branch: [link|https://github.com/apache/flink/blob/release-1.20.4/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/legacy/files/StaticFileServerHandler.java#L312]
In the {{sendNotModified}} method, the code creates a
{{DefaultFullHttpResponse}}, writes and flushes it, and attaches
{{ChannelFutureListener.CLOSE}} to the write future, so the channel is closed as
soon as the write completes.
However, it never sets the {{Connection: close}} header on the response, so the
client has no way of knowing that the connection is about to be closed.
{code:java}
public static void sendNotModified(ChannelHandlerContext ctx) {
    FullHttpResponse response = new DefaultFullHttpResponse(HTTP_1_1, NOT_MODIFIED);
    setDateHeader(response);

    // BUG: no explicit "Connection: close" header is set before the channel is closed,
    // so proxies assume the connection is kept alive.

    // close the connection as soon as the error message is sent.
    ctx.writeAndFlush(response).addListener(ChannelFutureListener.CLOSE);
}
{code}
*Proposed Solution:*
To comply with the HTTP/1.1 specification and prevent proxy connection pool
poisoning, the Flink server must explicitly signal that the connection is being
closed.
The fix is to set the {{Connection: close}} header before flushing the
response:
{code:java}
public static void sendNotModified(ChannelHandlerContext ctx) {
    FullHttpResponse response = new DefaultFullHttpResponse(HTTP_1_1, NOT_MODIFIED);
    setDateHeader(response);

    // Explicitly tell the client that the connection will be closed
    response.headers().set(HttpHeaderNames.CONNECTION, HttpHeaderValues.CLOSE);

    // Close the connection as soon as the response has been sent.
    ctx.writeAndFlush(response).addListener(ChannelFutureListener.CLOSE);
}
{code}
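To observe the behavior end to end, or to check a build with the fix, a pooling proxy can be imitated with a plain socket: send a conditional GET, then reuse the same connection for a second request. The sketch below uses only the JDK; {{NotModifiedProbe}} and its members are hypothetical names, and it assumes plain HTTP (no TLS), an existing path, and an {{If-Modified-Since}} value that matches the file (for example the {{Last-Modified}} value of an earlier full response, if the server sends one), so that the first reply is a body-less 304.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class NotModifiedProbe {

    /** What the probe observed on one TCP connection. */
    static final class Result {
        final String firstHead;        // status line and headers of the first (304) reply
        final boolean advertisedClose; // did that reply carry "Connection: close"?
        final boolean reuseAnswered;   // did a second request on the same socket get a reply?

        Result(String firstHead, boolean advertisedClose, boolean reuseAnswered) {
            this.firstHead = firstHead;
            this.advertisedClose = advertisedClose;
            this.reuseAnswered = reuseAnswered;
        }
    }

    /**
     * Sends a conditional GET, then reuses the same socket for a second GET,
     * the way a pooling proxy would. Assumes the first reply is a body-less 304.
     */
    static Result probe(String host, int port, String path, String ifModifiedSince)
            throws IOException {
        try (Socket socket = new Socket(host, port)) {
            socket.setSoTimeout(5_000); // do not hang forever on a silent server
            OutputStream out = socket.getOutputStream();
            InputStream in = socket.getInputStream();

            out.write(request(host, path, ifModifiedSince));
            out.flush();
            String head = readHead(in);

            boolean close = false;
            for (String line : head.split("\r\n")) {
                String lower = line.toLowerCase(Locale.ROOT);
                if (lower.startsWith("connection:") && lower.contains("close")) {
                    close = true;
                }
            }

            boolean answered;
            try {
                out.write(request(host, path, null));
                out.flush();
                answered = !readHead(in).isEmpty(); // "" means EOF: the peer already closed
            } catch (IOException e) {
                answered = false; // reset, broken pipe or timeout: the connection is unusable
            }
            return new Result(head, close, answered);
        }
    }

    private static byte[] request(String host, String path, String ifModifiedSince) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.1\r\n");
        sb.append("Host: ").append(host).append("\r\n");
        if (ifModifiedSince != null) {
            sb.append("If-Modified-Since: ").append(ifModifiedSince).append("\r\n");
        }
        sb.append("\r\n"); // blank line terminates the request head
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    /** Reads one header block through its terminating blank line; "" on immediate EOF. */
    static String readHead(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int state = 0; // how much of "\r\n\r\n" has been matched so far
        while (state < 4) {
            int b = in.read();
            if (b == -1) {
                break; // EOF before the head was complete
            }
            buf.write(b);
            if (b == '\r') {
                state = (state == 2) ? 3 : 1;
            } else if (b == '\n') {
                state = (state == 1 || state == 3) ? state + 1 : 0;
            } else {
                state = 0;
            }
        }
        return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
{code}

Against an affected server one would expect {{advertisedClose == false}} together with {{reuseAnswered == false}}, which is exactly the combination that poisons a proxy pool. With the fix applied, {{advertisedClose}} should become {{true}}: the connection is still closed, but a proxy now knows not to reuse it.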