[
https://issues.apache.org/jira/browse/HADOOP-17890?focusedWorklogId=650004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-650004
]
ASF GitHub Bot logged work on HADOOP-17890:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 13/Sep/21 13:11
Start Date: 13/Sep/21 13:11
Worklog Time Spent: 10m
Work Description: snvijaya commented on a change in pull request #3381:
URL: https://github.com/apache/hadoop/pull/3381#discussion_r707319027
##########
File path:
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsStatistic.java
##########
@@ -75,6 +75,8 @@
"Total bytes uploaded."),
BYTES_RECEIVED("bytes_received",
"Total bytes received."),
+ BYTES_DISCARDED_AT_SOCKET_READ("bytes_discarded_at_socket_read",
Review comment:
The bytesDiscarded counter is incremented when the server returns any bytes
that the client was not expecting to receive.

As of today, there are only two APIs for which the server returns a response
body: List and Read. For List, the inputStream is handed to the ObjectMapper
for JSON conversion. That leaves only the Read API, where the amount of data
the server sends should match the space available in the buffer that stores
the received data.

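The Read case above can be sketched as follows: fill the caller's buffer up to the requested length, then drain whatever is still on the stream and count it as discarded. This is a minimal illustration under stated assumptions, not the ABFS driver code: the class `ResponseDrainSketch`, its `readIntoBuffer` method, the `Result` holder, and the 8192-byte scratch size are all hypothetical, and the response body is assumed to be available as a plain `java.io.InputStream`.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch (not an ABFS API): copy a response body into a
 * caller-supplied buffer, then drain and count any bytes the server sent
 * beyond the space the caller asked for.
 */
final class ResponseDrainSketch {

  /** Outcome of one read: bytes stored in the buffer and bytes discarded. */
  static final class Result {
    final int bytesRead;
    final long bytesDiscarded;

    Result(int bytesRead, long bytesDiscarded) {
      this.bytesRead = bytesRead;
      this.bytesDiscarded = bytesDiscarded;
    }
  }

  private ResponseDrainSketch() {
  }

  static Result readIntoBuffer(InputStream body, byte[] buffer,
      int offset, int length) throws IOException {
    // Fill buffer[offset, offset + length) until it is full or EOF is hit.
    int total = 0;
    while (total < length) {
      int n = body.read(buffer, offset + total, length - total);
      if (n < 0) {
        break; // server sent less than requested
      }
      total += n;
    }
    // Anything still on the stream was not expected by the client. Read it
    // into a scratch array (assumed size) so nothing is left unread, and
    // count it so it can be reported as a "bytes discarded" statistic.
    long discarded = 0;
    byte[] scratch = new byte[8192];
    int n;
    while ((n = body.read(scratch)) >= 0) {
      discarded += n;
    }
    return new Result(total, discarded);
  }
}
```

With a 10-byte body and a 4-byte read request, the sketch returns bytesRead = 4 and bytesDiscarded = 6; when the body is exactly the requested size, bytesDiscarded is 0.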
Ideally, no scenario in driver-server communication should hit this. I
couldn't find any clue that led to the code that drains the socket either, but
I saw a few forums mention the side effects of a client disconnecting while
the server is still transmitting: a TCP reset is triggered, which signals a
connection error, which in turn triggers error handling and a reset of the
network-layer buffers.

In the read flow, the AbfsHttpOperation layer has no access to the
AbfsInputStream instance and hence cannot reach the stream statistics it
holds. Although Read is logically the only API that can hit this case, this
code sits in the general HTTP response handling path, so I kept the new
statistic outside of StreamStatistics to track it.

I looked at StoreStatisticNames, and it didn't seem right to add a new
statistic there, so I am adding this alongside the other network statistics,
such as BYTES_SENT and BYTES_RECEIVED, defined in the AbfsStatistic enum.
Please let me know if this looks OK.
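The placement argument above (the HTTP layer cannot reach AbfsInputStream, so the counter lives with the other network statistics) can be illustrated with an enum-keyed counter table that any layer holding a reference can update. This is a hypothetical, simplified stand-in: `NetworkStatisticSketch`, `NetworkCountersSketch`, `incrementCounter`, `get`, and `toMap` are illustrative names, not the real AbfsStatistic or AbfsCounters API, and the description string for the discarded-bytes entry is made up because the PR's actual text is not shown in the diff.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical enum of network statistics: a key name plus a description. */
enum NetworkStatisticSketch {
  BYTES_RECEIVED("bytes_received", "Total bytes received."),
  // Illustrative description only; the PR's real text is not shown here.
  BYTES_DISCARDED_AT_SOCKET_READ("bytes_discarded_at_socket_read",
      "Bytes drained from the socket that the client did not request.");

  private final String statName;
  private final String description;

  NetworkStatisticSketch(String statName, String description) {
    this.statName = statName;
    this.description = description;
  }

  String getStatName() {
    return statName;
  }

  String getDescription() {
    return description;
  }
}

/**
 * Hypothetical per-filesystem counters. The HTTP layer only needs this
 * object, not the input stream, to record a network statistic.
 */
final class NetworkCountersSketch {
  // Fully populated in the constructor; afterwards only the AtomicLong
  // values change, so concurrent increments are safe.
  private final Map<NetworkStatisticSketch, AtomicLong> counters =
      new EnumMap<>(NetworkStatisticSketch.class);

  NetworkCountersSketch() {
    for (NetworkStatisticSketch s : NetworkStatisticSketch.values()) {
      counters.put(s, new AtomicLong());
    }
  }

  void incrementCounter(NetworkStatisticSketch stat, long value) {
    counters.get(stat).addAndGet(value);
  }

  long get(NetworkStatisticSketch stat) {
    return counters.get(stat).get();
  }

  /** Snapshot keyed by the string names, e.g. for logging or export. */
  Map<String, Long> toMap() {
    Map<String, Long> out = new TreeMap<>();
    counters.forEach((k, v) -> out.put(k.getStatName(), v.get()));
    return out;
  }
}
```

In this sketch, an HTTP-layer read that drained six unexpected bytes would call `incrementCounter(NetworkStatisticSketch.BYTES_DISCARDED_AT_SOCKET_READ, 6)`; keeping the enum alongside BYTES_RECEIVED means the new statistic is reported through the same path as the existing network counters.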
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
Issue Time Tracking
-------------------
Worklog Id: (was: 650004)
Time Spent: 0.5h (was: 20m)
> ABFS: Refactor HTTP request handling code
> -----------------------------------------
>
> Key: HADOOP-17890
> URL: https://issues.apache.org/jira/browse/HADOOP-17890
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.4.0
> Reporter: Sneha Vijayarajan
> Assignee: Sneha Vijayarajan
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Aims at refactoring the HTTP request handling code.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)