[ https://issues.apache.org/jira/browse/HDFS-17737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HDFS-17737:
----------------------------------
    Labels: pull-request-available  (was: )

> Implement Backoff Retry for ErasureCoding reads
> -----------------------------------------------
>
>                 Key: HDFS-17737
>                 URL: https://issues.apache.org/jira/browse/HDFS-17737
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient, ec, erasure-coding
>    Affects Versions: 3.3.4
>            Reporter: Danny Becker
>            Assignee: Danny Becker
>            Priority: Major
>              Labels: pull-request-available
>
> #Why
> Currently, EC reads are less stable than replication reads: with a 9-node block group (e.g. RS(6,3)), if 4 out of the 9 datanodes are busy, the whole read fails. Erasure coding reads need to handle ERROR_BUSY signals from DataNodes and retry after a backoff duration, which avoids adding load to the DataNodes while increasing the stability of the read. A minimal sketch of such a backoff policy appears at the end of this message.
> Throttling on the server side was another proposed solution, but we prefer this client-side backoff for a few main reasons (see https://msasg.visualstudio.com/DefaultCollection/Multi%20Tenancy/_git/Hadoop/pullRequest/5272897#1739008224):
> 1. Throttling on the server would tie up thread connections, which have a maximum limit.
> 2. Throttling was originally added only for the cohosting scenario, to reduce impact on other services.
> 3. Throttling would consume resources on a DataNode that is already busy.
> #What
> The previous implementation followed a four-phase algorithm to perform a read:
> 1. Attempt to read chunks from the data blocks.
> 2. Check for missing data chunks. Fail if more chunks are missing than there are parity blocks; otherwise read parity blocks and null data blocks.
> 3. Wait for data to be read into the buffers and handle any read errors by reading from more parity blocks.
> 4. Check for missing blocks and either decode or fail.
> The new implementation merges phases 1-3 into a single loop (a simplified sketch of the loop appears at the end of this message):
> 1. Loop until we have enough blocks to read or decode, or too many blocks are missing to succeed:
> - Determine the number of chunks we need to fetch. ALLZERO chunks count towards this total. Null data chunks also count towards this total unless there are missing data chunks.
> - Read chunks until enough are pending or fetched to allow either a normal read or a decode; the normal read path is preferred because skipping the decode step is faster.
> - Get results from the reads and handle exceptions by preparing more reads to decode the missing data.
> - Check whether we should sleep before retrying any reads.
> 2. Check for missing blocks and either decode or fail.
> #Tests
> Add unit tests to `TestWriteReadStripedFile`:
> - Covers RS(3,2) with 1 busy chunk, 2 busy chunks, and 3 busy chunks. A hypothetical shape for these cases is sketched below.
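>
> The backoff policy itself is not spelled out above, so the following is a minimal, self-contained sketch of the kind of exponential backoff with jitter a client could apply when a DataNode replies ERROR_BUSY. The class and member names (`BusyBackoff`, `nextDelayMs`, the constants) are hypothetical illustrations, not names from the actual patch.
>
> ```java
> import java.util.concurrent.ThreadLocalRandom;
>
> /**
>  * Hypothetical exponential backoff with full jitter for ERROR_BUSY retries.
>  * Names and constants are illustrative, not taken from the HDFS patch.
>  */
> public class BusyBackoff {
>   private final long baseDelayMs;
>   private final long maxDelayMs;
>   private final int maxRetries;
>   private int attempts = 0;
>
>   public BusyBackoff(long baseDelayMs, long maxDelayMs, int maxRetries) {
>     this.baseDelayMs = baseDelayMs;
>     this.maxDelayMs = maxDelayMs;
>     this.maxRetries = maxRetries;
>   }
>
>   /** @return true if the read may retry the busy DataNode again. */
>   public boolean shouldRetry() {
>     return attempts < maxRetries;
>   }
>
>   /** Next sleep duration: doubles per attempt, capped, with full jitter. */
>   public long nextDelayMs() {
>     long capped = Math.min(maxDelayMs, baseDelayMs << Math.min(attempts, 20));
>     attempts++;
>     // Full jitter: a uniform random duration in [0, capped] keeps many
>     // clients from retrying the busy DataNode in lockstep.
>     return ThreadLocalRandom.current().nextLong(capped + 1);
>   }
> }
> ```
>
> Jitter matters here for the same reason as point 3 above: when many clients back off from the same busy DataNode, randomized delays prevent them from all retrying at the same instant.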
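>
> To make the merged loop concrete, here is a simplified, runnable model of its control flow for one RS(3,2) stripe. The types below (`State`, the retry budget, the fixed sleep) are toy stand-ins for the real `StripingChunk`/`StripeReader` machinery; this sketches the description above, not the actual patch.
>
> ```java
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.List;
>
> /** Toy model of the merged read loop for one RS(3,2) stripe. */
> public class MergedReadLoopSketch {
>   enum State { UNREAD, PENDING, FETCHED, BUSY }
>
>   static final int DATA = 3, PARITY = 2;        // RS(3,2)
>   static final int MAX_ATTEMPTS = 3;
>
>   public static void main(String[] args) throws InterruptedException {
>     // States of the 5 chunks in the block group; chunk 0 is busy at first.
>     State[] chunks = new State[DATA + PARITY];
>     Arrays.fill(chunks, State.UNREAD);
>     int busyRepliesLeft = 1;                    // chunk 0 succeeds on retry
>
>     for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
>       // Phases 1-3 merged: issue just enough reads to read or decode.
>       List<Integer> issued = new ArrayList<>();
>       for (int i = 0; i < chunks.length
>           && issued.size() + fetched(chunks) < DATA; i++) {
>         if (chunks[i] == State.UNREAD || chunks[i] == State.BUSY) {
>           chunks[i] = State.PENDING;
>           issued.add(i);
>         }
>       }
>       // Collect results; chunk 0 replies ERROR_BUSY until its budget is spent.
>       boolean sawBusy = false;
>       for (int i : issued) {
>         if (i == 0 && busyRepliesLeft-- > 0) {
>           chunks[i] = State.BUSY;
>           sawBusy = true;
>         } else {
>           chunks[i] = State.FETCHED;
>         }
>       }
>       if (fetched(chunks) >= DATA) break;       // enough to read or decode
>       if (sawBusy) Thread.sleep(50);            // backoff before retrying
>     }
>     System.out.println(fetched(chunks) >= DATA ? "read succeeds" : "read fails");
>   }
>
>   static int fetched(State[] chunks) {
>     int n = 0;
>     for (State s : chunks) if (s == State.FETCHED) n++;
>     return n;
>   }
> }
> ```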
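>
> The real tests run against a MiniDFSCluster inside `TestWriteReadStripedFile`; as a stand-in, the hypothetical JUnit 4 sketch below mimics the three RS(3,2) cases with a fake chunk that replies busy once. Note that with 3 busy chunks only 2 of the required 3 chunks are available on the first attempt, so only the backoff-retry path can make the read succeed. Everything here (`FakeChunk`, the retry loop) is illustrative.
>
> ```java
> import org.junit.Assert;
> import org.junit.Test;
>
> /** Hypothetical shape of the RS(3,2) busy-chunk cases, using a toy chunk. */
> public class BusyChunkCasesSketch {
>   /** Fake chunk read: replies busy the first {@code busyReplies} times. */
>   static final class FakeChunk {
>     int busyReplies;
>     FakeChunk(int busyReplies) { this.busyReplies = busyReplies; }
>     boolean read() { return busyReplies-- <= 0; }  // false == ERROR_BUSY
>   }
>
>   @Test
>   public void testBusyChunksEventuallySucceed() {
>     for (int busyChunks = 1; busyChunks <= 3; busyChunks++) {
>       // RS(3,2): 5 chunks, the first busyChunks of them reply busy once.
>       FakeChunk[] group = new FakeChunk[5];
>       for (int i = 0; i < group.length; i++) {
>         group[i] = new FakeChunk(i < busyChunks ? 1 : 0);
>       }
>       int fetched = 0, attempts = 0;
>       while (fetched < 3 && attempts++ < 10) {     // any 3 of 5 suffice
>         fetched = 0;
>         for (FakeChunk c : group) {
>           if (c.read()) fetched++;                 // toy: re-read everything
>         }
>         // A real client would sleep per the backoff policy before retrying.
>       }
>       Assert.assertTrue("read should survive " + busyChunks + " busy chunks",
>           fetched >= 3);
>     }
>   }
> }
> ```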