[
https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689437#comment-17689437
]
ASF GitHub Bot commented on HDFS-16896:
---------------------------------------
mccormickt12 commented on code in PR #5322:
URL: https://github.com/apache/hadoop/pull/5322#discussion_r1107919545
##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java:
##########
@@ -1337,7 +1352,11 @@ private void hedgedFetchBlockByteRange(LocatedBlock block, long start,
} catch (InterruptedException ie) {
// Ignore and retry
}
- if (refetch) {
+ // if refetch is true then all nodes are in deadlist or ignorelist
+ // we should loop through all futures and remove them so we do not
Review Comment:
Fixed the comments. deadlist is actually deadNodes (I fixed that comment as
well).
When connections fail (in both the hedged and non-hedged code paths), nodes
are added to the deadNodes collection so that other nodes get tried. Once
`getBestNodeDNAddrPair` returns `null`, `chooseDataNode` calls
`refetchLocations`, which clears the deadNodes via `clearLocalDeadNodes()`
and now, with my change, also clears the ignore list.
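To make that interplay concrete, here is a simplified, self-contained sketch
of the flow (hypothetical names, nodes reduced to string IDs; not the actual
DFSInputStream code):
```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified sketch of the selection flow described above: failed nodes
// accumulate in deadNodes until no candidate is left, at which point
// refetchLocations() resets the bookkeeping, including (with this change)
// the ignore list.
public class NodeSelectionSketch {
  private final Set<String> deadNodes = new HashSet<>();
  private final Set<String> ignoredNodes = new HashSet<>();

  // Stand-in for getBestNodeDNAddrPair: the first replica that is
  // neither dead nor ignored, or null when every replica is excluded.
  private String getBestNode(List<String> replicas) {
    for (String node : replicas) {
      if (!deadNodes.contains(node) && !ignoredNodes.contains(node)) {
        return node;
      }
    }
    return null;
  }

  // Stand-in for chooseDataNode: when selection returns null, fall back
  // to refetchLocations() so every replica becomes a candidate again.
  String chooseNode(List<String> replicas) {
    String best = getBestNode(replicas);
    if (best == null) {
      refetchLocations();
      best = getBestNode(replicas);
    }
    return best;
  }

  // clearLocalDeadNodes() in the real code clears deadNodes; the change
  // discussed here additionally clears the ignore list.
  private void refetchLocations() {
    deadNodes.clear();
    ignoredNodes.clear();
  }
}
```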
Note that we have added an assumption to this method, `refetchLocations`.
This is the comment I added to `refetchLocations`:
```
/**
* RefetchLocations should only be called when there are no active requests
* to datanodes. In the hedged read case this means futures should be empty
*/
```
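To illustrate that assumption, a minimal sketch (again with hypothetical
names, not the actual patch) of a hedged path honoring the precondition
before calling `refetchLocations`:
```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

// Sketch of the precondition stated in the comment above: every
// outstanding hedged request is cancelled and removed before
// refetchLocations() runs, so no in-flight future can still be talking
// to a node whose ignore-list entry is about to be cleared.
final class HedgedRefetchSketch {
  private final List<Future<ByteBuffer>> futures = new ArrayList<>();

  void refetchWithNoActiveRequests() {
    // Drain all in-flight hedged reads first.
    for (Future<ByteBuffer> future : futures) {
      future.cancel(true); // interrupt the worker if it is still running
    }
    futures.clear();

    // The precondition now holds: no active requests to datanodes.
    refetchLocations();
  }

  private void refetchLocations() {
    // In the real code this clears deadNodes and, with this change,
    // the ignore list as well.
  }
}
```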
> HDFS Client hedged read has increased failure rate than without hedged read
> ---------------------------------------------------------------------------
>
> Key: HDFS-16896
> URL: https://issues.apache.org/jira/browse/HDFS-16896
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Reporter: Tom McCormick
> Assignee: Tom McCormick
> Priority: Major
> Labels: pull-request-available
>
> When hedged read is enabled by the HDFS client, we see an increased failure rate on reads.
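>
> For reference, a minimal sketch of the client settings that turn hedged reads on (the values here are illustrative, not recommendations):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> public class EnableHedgedReads {
>   public static Configuration hedgedConf() {
>     Configuration conf = new Configuration();
>     // A nonzero thread pool size is what enables hedged reads in the client.
>     conf.setInt("dfs.client.hedged.read.threadpool.size", 5);
>     // How long to wait on the first replica before hedging to another.
>     conf.setLong("dfs.client.hedged.read.threshold.millis", 500);
>     return conf;
>   }
> }
> {code}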
> *stacktrace*
>
> {code:java}
> Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1183972111-10.197.192.88-1590025572374:blk_17114848218_16043459722 file=/data/tracking/streaming/AdImpressionEvent/daily/2022/07/18/compaction_1/part-r-1914862.1658217125623.1362294472.orc
>   at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1077)
>   at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1060)
>   at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1039)
>   at org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1365)
>   at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1572)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1535)
>   at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
>   at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112)
>   at org.apache.hadoop.fs.RetryingInputStream.lambda$readFully$3(RetryingInputStream.java:172)
>   at org.apache.hadoop.fs.RetryPolicy.lambda$run$0(RetryPolicy.java:137)
>   at org.apache.hadoop.fs.NoOpRetryPolicy.run(NoOpRetryPolicy.java:36)
>   at org.apache.hadoop.fs.RetryPolicy.run(RetryPolicy.java:136)
>   at org.apache.hadoop.fs.RetryingInputStream.readFully(RetryingInputStream.java:168)
>   at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112)
>   at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112)
>   at io.trino.plugin.hive.orc.HdfsOrcDataSource.readInternal(HdfsOrcDataSource.java:76)
>   ... 46 more
> {code}
>