[
https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804148#comment-17804148
]
ASF GitHub Bot commented on HADOOP-18883:
-----------------------------------------
saxenapranav commented on code in PR #6022:
URL: https://github.com/apache/hadoop/pull/6022#discussion_r1444240890
##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsHttpOperation.java:
##########
@@ -340,8 +344,11 @@ public void sendRequest(byte[] buffer, int offset, int length) throws IOException
If expect header is not enabled, we throw back the exception.
*/
String expectHeader = getConnProperty(EXPECT);
- if (expectHeader != null && expectHeader.equals(HUNDRED_CONTINUE)) {
+ if (expectHeader != null && expectHeader.equals(HUNDRED_CONTINUE)
+ && e instanceof ProtocolException
+ && EXPECT_100_JDK_ERROR.equals(e.getMessage())) {
Review Comment:
At `httpUrlConnection.getOutputStream()`, the error can either be an
IOException (including ConnectionTimeout and ReadTimeout) or an expect-100
failure (which raises a ProtocolException, a subclass of IOException). Server
errors, if any, would be caught in `processResponse` and the treatment would be
the same as for all other APIs (analyse whether a retry is needed, and then
RestOperation would retry it).
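For context, a minimal sketch of the try/catch this hunk fits into (EXPECT,
HUNDRED_CONTINUE, EXPECT_100_JDK_ERROR and getConnProperty come from the diff;
LOG and the exact stream handling are assumptions, not the actual
AbfsHttpOperation code):
```java
try (OutputStream outputStream = httpUrlConnection.getOutputStream()) {
  outputStream.write(buffer, offset, length);
} catch (IOException e) {
  String expectHeader = getConnProperty(EXPECT);
  if (expectHeader != null && expectHeader.equals(HUNDRED_CONTINUE)
      && e instanceof ProtocolException
      && EXPECT_100_JDK_ERROR.equals(e.getMessage())) {
    // Expect-100 rejection: swallow here; processResponse will read the
    // failing status line, and the rest operation decides about retries.
    LOG.debug("Expect-100 rejected by server", e);
  } else {
    // ConnectionTimeout / ReadTimeout / any other IOException: the JDK has
    // killed the connection, so surface the error to the retry loop.
    throw e;
  }
}
```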
In the JDK's implementation of `getOutputStream`, the connection is killed on
an IOException. So, if further API calls were allowed to go ahead, they would
fire an entirely new server call, and APIs such as getHeaderField() would then
return data from that new call, which is undesirable.
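A minimal sketch of the guard this implies (the flag name
connectionDisconnectedOnError is hypothetical, not necessarily what the PR
uses):
```java
private boolean connectionDisconnectedOnError = false;

public void sendRequest(byte[] buffer, int offset, int length) throws IOException {
  try (OutputStream outputStream = httpUrlConnection.getOutputStream()) {
    outputStream.write(buffer, offset, length);
  } catch (IOException e) {
    // The JDK has torn down the connection; remember that so no later API
    // call can silently re-fire the request on a fresh connection.
    connectionDisconnectedOnError = true;
    throw e; // (the expect-100 rejection case would be swallowed instead,
             //  as in the sketch above)
  }
}

private String safeGetHeaderField(String name) {
  // Never touch a dead connection: getHeaderField() would internally call
  // getInputStream() -> getOutputStream() and fire a brand-new server call.
  if (connectionDisconnectedOnError) {
    return null;
  }
  return httpUrlConnection.getHeaderField(name);
}
```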
Also, the implementation of `httpUrlConnection` is such that the other APIs
(like getHeaderField()) internally call getInputStream(), which would first
call getOutputStream() (if the sendData flag is true and it doesn't hold a
strOutputStream object). At this point, two things can happen:
1. Expect-100 failure: no data is captured, and again any next API on the
httpUrlConnection would fire a new call.
2. Status 100: it is now not in the block where data can be put into the
outputStream, so the stream will be closed, which raises an IOException, and
from there it will go back to the retry loop. Ref:
https://github.com/openjdk/jdk8/blob/master/jdk/src/share/classes/sun/net/www/protocol/http/HttpURLConnection.java#L1463-L1471
Hence, any further API call is prevented on the HttpUrlConnection object that
has hit an IOException in getOutputStream.
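As a standalone illustration of the JDK behaviour being worked around
(JDK-8314978), a hedged repro sketch; the endpoint URL is hypothetical and
assumed to reject Expect: 100-continue:
```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URL;

public class Expect100Repro {
  public static void main(String[] args) throws IOException {
    byte[] payload = {1, 2, 3};
    HttpURLConnection conn =
        (HttpURLConnection) new URL("http://localhost:8080/put").openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setFixedLengthStreamingMode(payload.length);
    conn.setRequestProperty("Expect", "100-continue");
    try (OutputStream os = conn.getOutputStream()) {
      os.write(payload);
    } catch (ProtocolException e) {
      // Server rejected "Expect: 100-continue"; the connection is now dead.
      System.out.println("expect-100 failed: " + e.getMessage());
    }
    // On affected JDKs this getHeaderField() internally calls
    // getInputStream() -> getOutputStream() -> writeRequests(), which
    // silently fires a *second* request at the server.
    System.out.println(conn.getHeaderField("Content-Type"));
  }
}
```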
> Expect-100 JDK bug resolution: prevent multiple server calls
> ------------------------------------------------------------
>
> Key: HADOOP-18883
> URL: https://issues.apache.org/jira/browse/HADOOP-18883
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
>
> This is inline to JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978].
>
> With the current implementation of HttpURLConnection, if the server rejects
> the “Expect 100-continue” header, a 'java.net.ProtocolException' is thrown
> from the 'expect100Continue()' method.
> After the exception is thrown, if we call any other method on the same
> instance (e.g. getHeaderField() or getHeaderFields()), it will internally
> call getOutputStream(), which invokes writeRequests(), which makes the actual
> server call.
> In AbfsHttpOperation, after sendRequest() we call the processResponse()
> method from AbfsRestOperation. Even if conn.getOutputStream() fails due
> to an expect-100 error, we consume the exception and let the code go ahead.
> So getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be
> triggered after getOutputStream has failed. These invocations will lead
> to server calls.