sodonnel commented on PR #7401:
URL: https://github.com/apache/ozone/pull/7401#issuecomment-2465356589

   The approach used here is to take the chunk buffer, which holds the real 
data just written to the block, and calculate the checksum on it.
   
   However, that duplicates work: writing the data through the 
`ECBlockOutputStream` already calculates that checksum and persists it in the 
block metadata as part of the put block.
   
   I have had to look at this for some time to understand the current 
flow. It's been a long time since this EC code was written, and the checksum 
code was not written by me. @aswinshakil might be a good person for a second 
look.
   
   Starting in the `ECReconstructionCoordinator`, there is code that calls 
`executePutBlock(...)` on the reconstructed streams. Here, I think, is where we 
can validate that the chunk checksums match the stripe checksum:
   
   ```
           for (ECBlockOutputStream targetStream : allStreams) {
   
             // The current chunk list and the checksums calculated while writing
             // are available here. These are what the putBlock call will persist,
             // and each chunk carries its own checksums.
             //
             // blockDataGroup holds all the blockData that existed on the
             // containers before any reconstruction. Search it for an entry that
             // contains the stripeChecksum. It lives in replicaIndex=1 or any
             // parity. You may not have index 1 (it could be getting
             // reconstructed) or all of the parities, but you must have at least
             // one of them for the block to be reconstructable. Therefore you
             // must search until it is found.
             //
             //   targetStream.getContainerBlockData().getChunksList().get(0).getChecksumData();
             //   blockDataGroup[0].getChunks().get(0).getStripeChecksum();
             //
             // With the chunk list (and hence its checksums) for the current
             // stream, and the existing stripe checksum located in
             // blockDataGroup, you can "simply" iterate the chunk list:
             //
             //   List<Chunk> chunks = targetStream.getContainerBlockData().getChunksList();
             //   List<Chunk> existingChunks = blockDataGroup[0].getChunks();
             //   for (int i = 0; i < chunks.size(); i++) {
             //     validateChecksum(chunks.get(i).getChecksumData(),
             //         existingChunks.get(i).getStripeChecksum());
             //   }
             //
   
             targetStream.executePutBlock(true, true,
                 blockLocationInfo.getLength(), blockDataGroup);
             checkFailures(targetStream,
                 targetStream.getCurrentPutBlkResponseFuture());
           }
   ```
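   
   The search described in the comments above could look roughly like the 
following. This is only a sketch: `ReplicaBlock`, `findStripeChecksumSource` 
and `dataNum` are hypothetical stand-ins, not Ozone classes or methods, and a 
`null` array entry is assumed to mean that the replica is missing.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not Ozone code: models just enough of a replica's
// block metadata to show the "search until found" step.
class StripeChecksumLocator {

  // Stand-in for one entry of blockDataGroup.
  static final class ReplicaBlock {
    final int replicaIndex;               // 1-based EC replica index
    final List<byte[]> stripeChecksums;   // one entry per chunk, may be empty

    ReplicaBlock(int replicaIndex, List<byte[]> stripeChecksums) {
      this.replicaIndex = replicaIndex;
      this.stripeChecksums = stripeChecksums;
    }
  }

  // Returns the first available replica that is index 1 or a parity
  // (index > dataNum) and actually carries stripe checksums.
  static Optional<ReplicaBlock> findStripeChecksumSource(
      ReplicaBlock[] blockDataGroup, int dataNum) {
    for (ReplicaBlock block : blockDataGroup) {
      if (block == null) {
        continue; // assumed: replica is missing, e.g. being reconstructed
      }
      boolean mayCarryStripeChecksum =
          block.replicaIndex == 1 || block.replicaIndex > dataNum;
      if (mayCarryStripeChecksum && !block.stripeChecksums.isEmpty()) {
        return Optional.of(block);
      }
    }
    return Optional.empty();
  }
}
```

   If nothing is found, the caller would have to decide whether to skip 
validation or fail the reconstruction; the sketch leaves that choice open.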
   
   Inside `validateChecksum()` you need to figure out how to index into the 
stripe checksum to find the part of it to compare against the 
chunk checksum.
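   
   As a rough illustration of that indexing, here is a minimal sketch. It 
assumes the stripe checksum is the concatenation of equally sized per-chunk 
checksums in replica-index order; that layout is an assumption and needs to be 
confirmed against the write path (partial stripes in particular may not fit 
it). `StripeChecksumSlice`, `sliceFor` and `matches` are hypothetical names, 
and plain `byte[]` stands in for the real checksum types.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Ozone. Plain byte[] stands in for the
// values returned by getChecksumData() and getStripeChecksum().
class StripeChecksumSlice {

  // Returns the part of stripeChecksum belonging to the 1-based replicaIndex,
  // ASSUMING equally sized per-chunk checksums concatenated in index order.
  static byte[] sliceFor(byte[] stripeChecksum, int replicaIndex,
      int chunkChecksumLen) {
    int start = (replicaIndex - 1) * chunkChecksumLen;
    int end = start + chunkChecksumLen;
    if (replicaIndex < 1 || chunkChecksumLen <= 0
        || end > stripeChecksum.length) {
      throw new IllegalArgumentException(
          "slice out of range for replicaIndex=" + replicaIndex);
    }
    return Arrays.copyOfRange(stripeChecksum, start, end);
  }

  // True if the checksum computed while writing the reconstructed chunk
  // equals the matching slice of the pre-existing stripe checksum.
  static boolean matches(byte[] chunkChecksum, byte[] stripeChecksum,
      int replicaIndex) {
    byte[] expected =
        sliceFor(stripeChecksum, replicaIndex, chunkChecksum.length);
    return Arrays.equals(chunkChecksum, expected);
  }

  public static void main(String[] args) {
    // Three chunks with 4 checksum bytes each.
    byte[] stripe = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    System.out.println(matches(new byte[] {5, 6, 7, 8}, stripe, 2)); // true
    System.out.println(matches(new byte[] {5, 6, 7, 9}, stripe, 2)); // false
  }
}
```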
   
   I think that approach will work, and it avoids calculating the checksum from 
the data a second time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

