[
https://issues.apache.org/jira/browse/HDFS-16272?focusedWorklogId=665347&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-665347
]
ASF GitHub Bot logged work on HDFS-16272:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Oct/21 20:16
Start Date: 13/Oct/21 20:16
Worklog Time Spent: 10m
Work Description: sodonnel commented on a change in pull request #3548:
URL: https://github.com/apache/hadoop/pull/3548#discussion_r728414217
##########
File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java
##########
@@ -245,8 +245,7 @@ public static long getSafeLength(ErasureCodingPolicy ecPolicy,
Arrays.sort(cpy);
// full stripe is a stripe has at least dataBlkNum full cells.
// lastFullStripeIdx is the index of the last full stripe.
- int lastFullStripeIdx =
- (int) (cpy[cpy.length - dataBlkNum] / cellSize);
+ long lastFullStripeIdx = cpy[cpy.length - dataBlkNum] / cellSize;
Review comment:
I know this is existing code, but I'd like to understand what is
happening here in order to review this.
This method receives an array of internal block lengths, so for 3-2 it will
have 5 entries, for 6-3 it will have 9, etc.
Then it sorts the lengths smallest to largest, and selects the one at
position num_blocks - numDataUnits.
Why does it not just pick the first one, which would be the smallest, since
the smallest data block in the group indicates the last full stripe?
And why is the safe length based on the full stripe, and not a potentially
partial last stripe?
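To make the computation under discussion concrete, here is a simplified standalone sketch of the post-patch arithmetic. The class and variable names are illustrative, not the actual StripedBlockUtil code, and the surrounding logic is inferred from the hunk above:

```java
import java.util.Arrays;

public class SafeLengthSketch {
    // Simplified sketch of the getSafeLength computation in the hunk above.
    // Keeping lastFullStripeIdx as a long (the patched version) means the
    // final multiplication is done in 64-bit arithmetic and cannot wrap.
    static long getSafeLength(long[] blockLengths, int dataBlkNum, int cellSize) {
        long[] cpy = Arrays.copyOf(blockLengths, blockLengths.length);
        Arrays.sort(cpy);
        // Index of the last stripe that has at least dataBlkNum full cells:
        // taken from the entry at position (num_blocks - dataBlkNum), as the
        // review comment describes.
        long lastFullStripeIdx = cpy[cpy.length - dataBlkNum] / cellSize;
        return lastFullStripeIdx * cellSize * dataBlkNum; // long arithmetic
    }

    public static void main(String[] args) {
        int cellSize = 256 * 1024;     // 256 KiB cells, as in RS-8-2-256k
        int dataBlkNum = 8;
        long[] lengths = new long[10]; // 8 data + 2 parity internal blocks
        Arrays.fill(lengths, 1L << 30); // 1 GiB each, per the environment above

        // 4096 full stripes * 262144 * 8 = 2^33 = 8589934592
        System.out.println(getSafeLength(lengths, dataBlkNum, cellSize));
    }
}
```

With the pre-patch `int` index, the same product (2^33) would be computed in 32-bit arithmetic and wrap, which is the overflow this change fixes.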
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 665347)
Time Spent: 50m (was: 40m)
> Int overflow in computing safe length during EC block recovery
> --------------------------------------------------------------
>
> Key: HDFS-16272
> URL: https://issues.apache.org/jira/browse/HDFS-16272
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: 3.1.1
> Affects Versions: 3.3.0, 3.3.1
> Environment: Cluster settings: EC RS-8-2-256k, Block Size 1GiB.
> Reporter: daimin
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> There exists an int overflow problem in StripedBlockUtil#getSafeLength,
> which can produce a negative or zero length:
> 1. With a negative length, it fails the later >= 0 check and crashes the
> BlockRecoveryWorker thread, which makes the lease recovery operation unable
> to finish.
> 2. With a zero length, it passes the check and directly truncates the block
> size to zero, leading to data loss.
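Both failure modes can be worked through with the cluster settings above (RS-8-2-256k, 1 GiB blocks). The sketch below demonstrates only the wrapping 32-bit arithmetic; the exact pre-patch expression is inferred from the patch hunk, and the 384 MiB figure is a constructed example, not from the report:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        int cellSize = 256 * 1024; // 256 KiB, from RS-8-2-256k
        int dataBlkNum = 8;

        // Zero-length case: with 1 GiB internal blocks the pre-patch int
        // product 4096 * 262144 * 8 = 2^33 wraps to exactly 0 in 32-bit
        // arithmetic, so the safe length passes the >= 0 check and the
        // block is truncated to zero.
        int idxZero = (int) ((1L << 30) / cellSize);         // 4096
        System.out.println(idxZero * cellSize * dataBlkNum); // wraps to 0

        // Negative-length case: e.g. 384 MiB internal blocks give
        // 1536 * 262144 * 8 = 3 * 2^30, which wraps to a negative int;
        // that value fails the >= 0 check and crashes the recovery worker.
        int idxNeg = (int) ((384L << 20) / cellSize);        // 1536
        System.out.println(idxNeg * cellSize * dataBlkNum);  // negative
    }
}
```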
--
This message was sent by Atlassian Jira
(v8.3.4#803005)