[ 
https://issues.apache.org/jira/browse/HDFS-17737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930847#comment-17930847
 ] 

ASF GitHub Bot commented on HDFS-17737:
---------------------------------------

hadoop-yetus commented on PR #7427:
URL: https://github.com/apache/hadoop/pull/7427#issuecomment-2686189332

   :broken_heart: **-1 overall**
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 31s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 4 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |   7m 23s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  31m  3s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 42s |  |  trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   5m  5s |  |  trunk passed with JDK Private Build-1.8.0_442-8u442-b06~us1-0ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 24s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 14s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 57s |  |  trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   2m 25s |  |  trunk passed with JDK Private Build-1.8.0_442-8u442-b06~us1-0ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   5m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  35m 52s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 34s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 50s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 55s |  |  the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   5m 55s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 14s |  |  the patch passed with JDK Private Build-1.8.0_442-8u442-b06~us1-0ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   5m 14s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 14s |  |  hadoop-hdfs-project: The patch generated 0 new + 110 unchanged - 4 fixed = 110 total (was 114)  |
   | +1 :green_heart: |  mvnsite  |   2m  1s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 38s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-client-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7427/4/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-client-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.txt) |  hadoop-hdfs-client in the patch failed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04.  |
   | +1 :green_heart: |  javadoc  |   2m  8s |  |  the patch passed with JDK Private Build-1.8.0_442-8u442-b06~us1-0ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   5m 39s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m 58s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 30s |  |  hadoop-hdfs-client in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 31s |  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 37s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 164m 38s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7427/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/7427 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle |
   | uname | Linux dccdc7ce188b 5.15.0-131-generic #141-Ubuntu SMP Fri Jan 10 21:18:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / e8339b09243598378bcc3611d8057c0e15069214 |
   | Default Java | Private Build-1.8.0_442-8u442-b06~us1-0ubuntu1~20.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06~us1-0ubuntu1~20.04-b06 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7427/4/testReport/ |
   | Max. process+thread count | 700 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7427/4/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Implement Backoff Retry for ErasureCoding reads
> -----------------------------------------------
>
>                 Key: HDFS-17737
>                 URL: https://issues.apache.org/jira/browse/HDFS-17737
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient, ec, erasure-coding
>    Affects Versions: 3.3.4
>            Reporter: Danny Becker
>            Assignee: Danny Becker
>            Priority: Major
>              Labels: pull-request-available
>
> #Why
> Currently, EC reads are less stable than replicated reads: if 4 out of 9 
> DataNodes in a block group cannot be reached, the whole read fails. EC reads 
> need to handle connection failures from DataNodes and retry after a backoff 
> duration, which avoids overloading the DataNodes while making the read more 
> stable.
> Throttling on the server side was another proposed solution, but we prefer 
> client-side backoff for three main reasons:
> 1. Throttling on the server would use up connection threads, which have a 
> maximum limit.
> 2. Throttling was originally added only for the cohosting scenario, to reduce 
> the impact on other services.
> 3. Throttling would use up resources on the DataNode, which could already be 
> in a bad state.
> #What
> The previous implementation followed a four-phase read algorithm:
> 1. Attempt to read chunks from the data blocks.
> 2. Check for missing data chunks. Fail if more chunks are missing than there 
> are parity blocks; otherwise read parity blocks and null data blocks.
> 3. Wait for data to be read into the buffers, and handle any read errors by 
> reading from more parity blocks.
> 4. Check for missing blocks and either decode or fail.
> The new implementation merges phases 1-3 into a single loop:
> 1. Loop until we have enough blocks to read or decode, or too many blocks 
> are missing to succeed:
>    - Determine the number of chunks we need to fetch. ALLZERO chunks count 
> towards this total. null data chunks also count towards this total unless 
> there are MISSING or SLEEPING data chunks.
>    - Read chunks until enough are pending or fetched to either decode or 
> perform a normal read.
>    - Get results from the reads, and handle exceptions by preparing more 
> reads for decoding the missing data.
>    - Check whether we should sleep before retrying any reads.
> 2. Check for missing blocks and either decode or fail.
> #Tests
> Add a unit test to `TestWriteReadStripedFile`:
> - Covers RS(3,2) with 1 chunk busy, 2 chunks busy, and 3 chunks busy.
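
The single-loop flow in the description can be sketched as one decision per pass over the chunk states of a stripe. This is only an illustration and not the code in the PR: the class `StripeReadSketch`, the `ChunkState` and `Decision` enums, and the `decide` method are all hypothetical names. It also simplifies the rule about when null data chunks count towards the fetch total, and assumes that any `dataBlocks` usable chunks are enough to read or decode, as with Reed-Solomon codes.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the DFSClient implementation. */
final class StripeReadSketch {

  /** Hypothetical chunk states; names loosely follow the issue description. */
  enum ChunkState { NOT_REQUESTED, PENDING, FETCHED, ALLZERO, MISSING, SLEEPING }

  /** Hypothetical outcome of one pass through the read loop. */
  enum Decision { DONE, WAIT_FOR_RESULTS, READ_MORE, SLEEP_THEN_RETRY, FAIL }

  private static long count(List<ChunkState> chunks, ChunkState state) {
    return chunks.stream().filter(s -> s == state).count();
  }

  /** Decides what the loop should do next for one stripe. */
  static Decision decide(List<ChunkState> chunks, int dataBlocks, int parityBlocks) {
    if (chunks.size() != dataBlocks + parityBlocks) {
      throw new IllegalArgumentException("expected one state per block in the group");
    }
    // ALLZERO chunks need no I/O, so they count as already available.
    long usable = count(chunks, ChunkState.FETCHED) + count(chunks, ChunkState.ALLZERO);
    long pending = count(chunks, ChunkState.PENDING);

    if (count(chunks, ChunkState.MISSING) > parityBlocks) {
      return Decision.FAIL;              // too many missing blocks to ever decode
    }
    if (usable >= dataBlocks) {
      return Decision.DONE;              // enough chunks to read or decode
    }
    if (usable + pending >= dataBlocks) {
      return Decision.WAIT_FOR_RESULTS;  // in-flight reads may be sufficient
    }
    if (count(chunks, ChunkState.NOT_REQUESTED) > 0) {
      return Decision.READ_MORE;         // issue reads for more chunks
    }
    // With the size check above, the only chunks left are SLEEPING ones.
    return Decision.SLEEP_THEN_RETRY;
  }

  public static void main(String[] args) {
    // RS(3,2) with two busy chunks, modelled here as SLEEPING.
    List<ChunkState> chunks = Arrays.asList(
        ChunkState.FETCHED, ChunkState.SLEEPING, ChunkState.SLEEPING,
        ChunkState.FETCHED, ChunkState.FETCHED);
    System.out.println(decide(chunks, 3, 2));
  }
}
```

Ordering the checks this way means the loop only sleeps when there is nothing left to request or wait for, which matches the "check if we should sleep before retrying" step being last in the loop body.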


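The description asks for a retry "after a backoff duration" but does not say how that duration is chosen. As one possible shape, the sketch below uses a capped exponential delay. The doubling policy, the class `BackoffSketch`, the method `delayMillis`, and the example values of 100 ms and 5000 ms are all assumptions made for illustration, not details taken from the PR.

```java
/** Illustrative sketch only; the PR may compute its backoff differently. */
final class BackoffSketch {

  /**
   * Returns a delay that doubles with each attempt, capped at maxMillis.
   * Attempt 0 yields baseMillis.
   */
  static long delayMillis(int attempt, long baseMillis, long maxMillis) {
    if (attempt < 0 || baseMillis <= 0 || maxMillis < baseMillis) {
      throw new IllegalArgumentException("invalid backoff parameters");
    }
    long delay = baseMillis;
    for (int i = 0; i < attempt && delay < maxMillis; i++) {
      // Compare against maxMillis / 2 so the doubling can never overflow.
      delay = (delay > maxMillis / 2) ? maxMillis : delay * 2;
    }
    return delay;
  }

  public static void main(String[] args) {
    for (int attempt = 0; attempt < 8; attempt++) {
      System.out.println("attempt " + attempt + " -> "
          + delayMillis(attempt, 100, 5000) + " ms");
    }
  }
}
```

A production client would probably add random jitter so that many readers do not retry against the same DataNode at the same instant. That is left out here to keep the result deterministic.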

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
