len548 opened a new pull request, #10592:
URL: https://github.com/apache/ozone/pull/10592

   ## What changes were proposed in this pull request?
   `ozone debug replicas chunk-info` reported the same blockData.size for every 
EC replica instead of the per-replica size from each datanode.
   
   <p><strong>Example steps to reproduce</strong></p>
   <ol>
   <li>Create an EC key with <tt>rs-3-2-1024k</tt> of size 1,148,576 bytes 
(between 1 MiB and 2 MiB).</li>
   <li>Run:<br /><tt>ozone debug replicas chunk-info 
&lt;volume&gt;/&lt;bucket&gt;/&lt;key&gt;</tt></li>
   <li>Inspect <tt>blockData.size</tt> for each entry in 
<tt>keyLocations</tt>.</li>
   </ol>
   <p>Expected (EC 3+2, 1,148,576 bytes):</p>
   <div class="table-wrap">
   
   Replica | Expected size
   -- | --
   Data 1 | 1,048,576
   Data 2 | 100,000
   Data 3 | 0
   Parity 4, 5 | 1,048,576 each
   
   
   </div>
   <p>Actual: all replicas show <tt>1,048,576</tt>.</p>
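
The expected sizes in the table follow from striping the key across the data replicas in 1 MiB cells. The sketch below reproduces that arithmetic. `EcReplicaSizes` and `expectedSizes` are hypothetical names used only for illustration and are not part of Ozone. It assumes the key fits in a single block group and that parity cells in a partial stripe are as long as that stripe's first data cell, which is consistent with the table above.

```java
import java.util.Arrays;

/** Illustrative helper (not part of Ozone): expected per-replica sizes for one EC block group. */
final class EcReplicaSizes {

  /**
   * Returns sizes indexed by replica position: data replicas first, then parity.
   * Assumes the key fits in a single block group.
   */
  static long[] expectedSizes(long keyLength, int data, int parity, long cellSize) {
    long stripeSize = (long) data * cellSize;
    long fullStripes = keyLength / stripeSize;
    long remainder = keyLength % stripeSize; // bytes in the last, partial stripe

    long[] sizes = new long[data + parity];
    for (int i = 0; i < data; i++) {
      // Bytes of the partial stripe that land in data cell i (0 once the remainder is used up).
      long partialCell = Math.min(cellSize, Math.max(0L, remainder - (long) i * cellSize));
      sizes[i] = fullStripes * cellSize + partialCell;
    }
    // Assumption: parity cells in a partial stripe match the length of its first data cell.
    long paritySize = sizes[0];
    for (int i = data; i < data + parity; i++) {
      sizes[i] = paritySize;
    }
    return sizes;
  }

  public static void main(String[] args) {
    // rs-3-2-1024k key of 1,148,576 bytes, as in the reproduction steps.
    System.out.println(Arrays.toString(expectedSizes(1_148_576L, 3, 2, 1_048_576L)));
  }
}
```

For the key in the reproduction steps this yields 1,048,576, 100,000 and 0 for the three data replicas and 1,048,576 for each parity replica.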
   
   **Root cause**: After HDDS-13445, `ChunkKeyHandler` iterated over the 
pipeline datanodes but called `ContainerProtocolCalls.getBlock()`, which uses 
`tryEachDatanode` and always queries the closest node unless reading from that 
node fails. The loop's datanode variable was therefore ignored, so every 
iteration returned the same replica's data. The existing tests did not catch 
this bug because they do not check the size of each replica.
   
   **Fix**: Add `ContainerProtocolCalls.getBlockFromDatanode()` and use it in 
`ChunkKeyHandler` so each replica is queried on its own datanode with the 
correct EC replica index.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-15581
   
   ## How was this patch tested?
   Replace .block file-count checks in 
[ozone-debug-tests-ec3-2.robot](https://github.com/apache/ozone/blob/master/hadoop-ozone/dist/src/main/smoketest/debug/ozone-debug-tests-ec3-2.robot)
 with Robot tests that assert `blockData.size` per replica for EC(3,2), covering 
different stripe layout scenarios. Add the same tests as an EC(6,3) suite and 
wire it into 
[ec-test.sh](https://github.com/apache/ozone/blob/master/hadoop-ozone/dist/src/main/compose/common/ec-test.sh)
 after scaling to 9 datanodes.
   
   Also refactor keywords `Count Datanodes In Service` and `Has Enough 
Datanodes` from 
[awss3ecstorage.robot](https://github.com/apache/ozone/blob/master/hadoop-ozone/dist/src/main/smoketest/ec/awss3ecstorage.robot)
 to the ec library.
   
   CI: https://github.com/len548/ozone/actions/runs/28040511916/job/83005380343



