snvijaya commented on a change in pull request #3381:
URL: https://github.com/apache/hadoop/pull/3381#discussion_r707319027



##########
File path: 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsStatistic.java
##########
@@ -75,6 +75,8 @@
       "Total bytes uploaded."),
   BYTES_RECEIVED("bytes_received",
       "Total bytes received."),
+  BYTES_DISCARDED_AT_SOCKET_READ("bytes_discarded_at_socket_read",

Review comment:
   The bytesDiscarded counter is incremented when the server returns any 
bytes that the client wasn't expecting to receive. 
   
   As of today, there are only two APIs for which the server returns a 
response body: List and Read. In the case of List, the inputStream is handed 
to the ObjectMapper for JSON conversion. That leaves just the Read API, where 
the amount of data to be read should match the space available in the buffer 
that stores the received data.
   
   Ideally there is no scenario in driver-server communication where this is 
expected. I couldn't find any clue as to what led to the code that drains the 
socket either, but I saw a few forums mention the side effects of a client 
disconnecting while the server might still be transmitting: a TCP reset gets 
triggered and signals a connection error, which in turn triggers some error 
handling and a reset of the network-layer buffers. 
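
   To make the accounting concrete, here is a minimal, self-contained sketch 
of the idea: fill the caller's buffer, then drain whatever is left on the 
stream and count it as discarded. The names DrainSketch, ReadResult and 
readAndDrain are hypothetical and used only for illustration; this is not the 
actual AbfsHttpOperation code.

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only; class and method names are hypothetical.
final class DrainSketch {

  /** Bytes copied into the caller's buffer and bytes read but thrown away. */
  static final class ReadResult {
    final int bytesReceived;
    final long bytesDiscarded;

    ReadResult(int bytesReceived, long bytesDiscarded) {
      this.bytesReceived = bytesReceived;
      this.bytesDiscarded = bytesDiscarded;
    }
  }

  static ReadResult readAndDrain(InputStream in, byte[] buffer,
      int offset, int length) throws IOException {
    // Copy at most 'length' bytes into the space the caller provided.
    int received = 0;
    while (received < length) {
      int n = in.read(buffer, offset + received, length - received);
      if (n == -1) {
        break; // server sent fewer bytes than requested
      }
      received += n;
    }

    // Anything still on the stream was not expected by the caller:
    // read it into a scratch buffer and count it as discarded.
    long discarded = 0;
    byte[] scratch = new byte[4096];
    int n;
    while ((n = in.read(scratch)) != -1) {
      discarded += n;
    }
    return new ReadResult(received, discarded);
  }
}
```

   With a 10-byte response and room for 4 bytes in the buffer, this sketch 
reports 4 bytes received and 6 bytes discarded. When the response fits the 
buffer exactly, the discarded count stays at 0, which is the expected case 
for Read.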
   
   In the read flow, the AbfsHttpOperation layer has no access to the 
AbfsInputStream instance and hence can't access the stream statistics it 
holds. While Read is logically the only API that can hit this case, this code 
lives in the general HTTP response handling path, so I kept the new statistic 
outside of StreamStatistics to track it.
   
   I looked at StoreStatisticNames, and it didn't look right to add a new 
statistic there, hence I am adding this alongside the other network 
statistics, such as BYTES_SENT and BYTES_RECEIVED, defined in the 
AbfsStatistic enum. 
   
   Please let me know if this looks OK.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


