Mark Mc Keown created HDFS-16825:
------------------------------------
Summary: hadoop-azure flush timing out and triggering retry
Key: HDFS-16825
URL: https://issues.apache.org/jira/browse/HDFS-16825
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Mark Mc Keown
From AbfsHttpOperation, the code that creates an HTTP connection to Azure is:
{code}
public AbfsHttpOperation(final URL url, final String method,
    final List<AbfsHttpHeader> requestHeaders)
    throws IOException {
  this.isTraceEnabled = LOG.isTraceEnabled();
  this.url = url;
  this.method = method;
  this.clientRequestId = UUID.randomUUID().toString();

  this.connection = openConnection();
  if (this.connection instanceof HttpsURLConnection) {
    HttpsURLConnection secureConn = (HttpsURLConnection) this.connection;
    SSLSocketFactory sslSocketFactory = SSLSocketFactoryEx.getDefaultFactory();
    if (sslSocketFactory != null) {
      secureConn.setSSLSocketFactory(sslSocketFactory);
    }
  }

  // CONNECT_TIMEOUT and READ_TIMEOUT are compile-time constants;
  // READ_TIMEOUT is fixed at 30 seconds.
  this.connection.setConnectTimeout(CONNECT_TIMEOUT);
  this.connection.setReadTimeout(READ_TIMEOUT);

  this.connection.setRequestMethod(method);

  for (AbfsHttpHeader header : requestHeaders) {
    this.connection.setRequestProperty(header.getName(), header.getValue());
  }

  this.connection.setRequestProperty(
      HttpHeaderConfigurations.X_MS_CLIENT_REQUEST_ID, clientRequestId);
}
{code}
READ_TIMEOUT is hard-coded to 30 seconds. When a file uploaded to Azure is closed, a flush operation is triggered; Azure sometimes takes longer than 30 seconds to respond to that flush, and the timeout then triggers a retry within the hadoop-azure library.

(This can cause issues with Databricks Autoloader, which monitors Event Grid for triggers to ingest data: multiple flush/close operations can confuse it. Strictly speaking that is an Autoloader bug, since retries are normal behaviour, but the short timeout makes it far more likely.)

Can READ_TIMEOUT be increased or made configurable?
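
If the timeout were driven by configuration rather than a constant, long flushes could be accommodated without code changes. Below is a minimal sketch of one possible approach, using Hadoop's standard Configuration API; the key name fs.azure.http.read.timeout, its default, and the openConnection signature here are illustrative assumptions, not existing hadoop-azure API:

{code}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;

public class ConfigurableReadTimeoutSketch {

  // The current hard-coded value in AbfsHttpOperation.
  private static final int DEFAULT_READ_TIMEOUT_MS = 30_000;

  // Hypothetical key; not an existing hadoop-azure configuration property.
  private static final String READ_TIMEOUT_KEY = "fs.azure.http.read.timeout";

  static HttpURLConnection openConnection(URL url, Configuration conf)
      throws IOException {
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    // Fall back to the existing 30 second behaviour when the key is unset.
    connection.setReadTimeout(
        conf.getInt(READ_TIMEOUT_KEY, DEFAULT_READ_TIMEOUT_MS));
    return connection;
  }
}
{code}

Keeping 30 seconds as the default would preserve current behaviour, while letting deployments that see slow flush responses raise the limit instead of relying on the retry path.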