[ https://issues.apache.org/jira/browse/HDFS-17599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872478#comment-17872478 ]
ASF GitHub Bot commented on HDFS-17599:
---------------------------------------

hadoop-yetus commented on PR #6980:
URL: https://github.com/apache/hadoop/pull/6980#issuecomment-2278818081

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 5m 56s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 31m 36s | | trunk passed |
| +1 :green_heart: | compile | 1m 44s | | trunk passed |
| +1 :green_heart: | checkstyle | 1m 25s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 42s | | trunk passed |
| +1 :green_heart: | javadoc | 2m 4s | | trunk passed |
| +1 :green_heart: | spotbugs | 3m 35s | | trunk passed |
| +1 :green_heart: | shadedclient | 38m 4s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 19s | | the patch passed |
| +1 :green_heart: | compile | 1m 15s | | the patch passed |
| +1 :green_heart: | javac | 1m 15s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 9s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 28s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 50s | | the patch passed |
| +1 :green_heart: | spotbugs | 4m 8s | | the patch passed |
| +1 :green_heart: | shadedclient | 39m 5s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 273m 46s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. |
| | | 407m 49s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6980/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6980 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux b38c31bc2485 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 4bc9c717cc8927a4cd45739cb85abe4f5ed66da2 |
| Default Java | Red Hat, Inc.-1.8.0_312-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6980/4/testReport/ |
| Max. process+thread count | 3140 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6980/4/console |
| versions | git=2.27.0 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> EC: Fix the mismatch between locations and indices for mover
> ------------------------------------------------------------
>
>                 Key: HDFS-17599
>                 URL: https://issues.apache.org/jira/browse/HDFS-17599
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.3.0, 3.4.0
>            Reporter: Tao Li
>            Assignee: Tao Li
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2024-08-03-17-59-08-059.png, image-2024-08-03-18-00-01-950.png
>
> We set the EC policy to (6+3) and also have nodes that were in the ENTERING_MAINTENANCE state.
>
> When we moved the data of some directories from SSD to HDD, some blocks failed to move because the disks were full, as shown in the figure below (blk_-9223372033441574269).
> We tried the move again and got the following error: "{color:#ff0000}Replica does not exist{color}".
> Looking at the fsck output, we found that the wrong block id (blk_-9223372033441574270) was used when moving the block.
>
> {*}Mover Logs{*}:
> !image-2024-08-03-17-59-08-059.png|width=741,height=85!
>
> {*}FSCK Info{*}:
> !image-2024-08-03-18-00-01-950.png|width=738,height=120!
>
> {*}Root Cause{*}:
> Similar to HDFS-16333: when the mover is initialized, only `LIVE` nodes are processed. As a result, the datanode in the `ENTERING_MAINTENANCE` state is filtered out of the locations when `DBlockStriped` is initialized, but the indices are not adjusted to match, so the locations and indices end up with different lengths. The EC block then computes the wrong block id when resolving an internal block (see `DBlockStriped#getInternalBlock`).
>
> We added debug logs; a few key messages are shown below.
> {color:#ff0000}The result is an incorrect correspondence: xx.xx.7.31 -> -9223372033441574270{color}.
> {code:java}
> DBlock getInternalBlock(StorageGroup storage) {
>   // storage == xx.xx.7.31
>   // idxInLocs == 1 (locations == [xx.xx.85.29:DISK, xx.xx.7.31:DISK,
>   // xx.xx.207.22:DISK, xx.xx.8.25:DISK, xx.xx.79.30:DISK, xx.xx.87.21:DISK,
>   // xx.xx.8.38:DISK]; xx.xx.179.31, which is in the ENTERING_MAINTENANCE
>   // state, has been filtered out)
>   int idxInLocs = locations.indexOf(storage);
>   if (idxInLocs == -1) {
>     return null;
>   }
>   // idxInGroup == 2 (indices == [1,2,3,4,5,6,7,8])
>   byte idxInGroup = indices[idxInLocs];
>   // blkId: -9223372033441574272 + 2 = -9223372033441574270
>   long blkId = getBlock().getBlockId() + idxInGroup;
>   long numBytes = getInternalBlockLength(getNumBytes(), cellSize,
>       dataBlockNum, idxInGroup);
>   Block blk = new Block(getBlock());
>   blk.setBlockId(blkId);
>   blk.setNumBytes(numBytes);
>   DBlock dblk = new DBlock(blk);
>   dblk.addLocation(storage);
>   return dblk;
> }
> {code}
>
> {*}Solution{*}:
> When initializing DBlockStriped, if any location is filtered out, we need to remove the corresponding element of the indices array so that the two stay aligned.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
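The proposed adaptation can be sketched as follows. This is a standalone illustration, not the actual patch: the class and method names (`StripedIndexFilter`, `filterIndices`) and the `live` mask are hypothetical, and the values mirror the report above, where indices [1..8] cover 8 original locations and xx.xx.179.31 (original position 1) is filtered out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the fix: when a location is dropped (e.g. an
// ENTERING_MAINTENANCE datanode), drop the matching entry from indices
// too, so that locations[i] and indices[i] keep referring to the same
// internal block of the striped group.
public class StripedIndexFilter {

  // 'live' marks which of the original locations survive the LIVE-only
  // filter; it must have one entry per original index.
  static byte[] filterIndices(byte[] indices, boolean[] live) {
    List<Byte> kept = new ArrayList<>();
    for (int i = 0; i < indices.length; i++) {
      if (live[i]) {
        kept.add(indices[i]);
      }
    }
    byte[] out = new byte[kept.size()];
    for (int i = 0; i < out.length; i++) {
      out[i] = kept.get(i);
    }
    return out;
  }

  public static void main(String[] args) {
    // Mirrors the report: indices [1..8], the node at original position 1
    // (xx.xx.179.31) is filtered out of the locations.
    byte[] indices = {1, 2, 3, 4, 5, 6, 7, 8};
    boolean[] live = {true, false, true, true, true, true, true, true};
    byte[] adjusted = filterIndices(indices, live);
    // With the adjusted array, position 1 of the surviving locations
    // (xx.xx.7.31) now maps to internal index 3, so getInternalBlock would
    // compute blockId + 3 instead of the wrong blockId + 2.
    System.out.println(java.util.Arrays.toString(adjusted));
    // prints [1, 3, 4, 5, 6, 7, 8]
  }
}
```

With the indices trimmed in step with the locations, `indices[idxInLocs]` in `getInternalBlock` again yields the index of the replica that actually lives on that datanode.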