sodonnel commented on PR #7401:
URL: https://github.com/apache/ozone/pull/7401#issuecomment-2465356589
The approach used here is to take the chunk buffer, which holds the real
data just written to the block, and calculate the checksum on it.
However, that duplicates work: writing the data through the
ECBlockOutputStream already computes that checksum and persists it in the
block metadata as part of the put block.
I have had to look at this for some time to understand the current flow.
It's been a long time since this EC code was written, and the checksum
code was not written by me. @aswinshakil might be a good person for a second
look.
Starting in the ECReconstructionCoordinator, there is code where it calls
`executePutBlock(...)` on the reconstructed streams. Here, I think, is where we
can validate that the chunk checksums match the stripe checksum:
```
for (ECBlockOutputStream targetStream : allStreams) {
  // You can get the current chunkList and its checksums, calculated while
  // writing. These are what will be written as part of the putBlock call.
  // If we get them here, each chunk has its checksums.
  //
  // Using blockDataGroup, which is all the blockData that existed on the
  // containers prior to any reconstruction, we can search for an entry that
  // contains the stripeChecksum. We know it lives in replicaIndex=1 or any
  // parity. However, you may not have index 1 (it could be getting
  // reconstructed) or all the parities, but you must have at least one of
  // them to make the block reconstructable. Therefore you must search until
  // it is found.
  //
  targetStream.getContainerBlockData().getChunksList().get(0).getChecksumData();
  // blockDataGroup[0].getChunks().get(0).getStripeChecksum();
  //
  // From above, if you have the chunkList and hence its checksums for the
  // current stream, and you can locate the existing stripe checksum in the
  // blockDataGroup, then you can "simply" iterate the chunkList:
  //
  // List<Chunk> chunks = targetStream.getContainerBlockData().getChunksList();
  // List<Chunk> existingChunks = blockDataGroup[0].getChunks();
  // for (int i = 0; i < chunks.size(); i++) {
  //   validateChecksum(chunks.get(i).getChecksumData(),
  //       existingChunks.get(i).getStripeChecksum());
  // }
  //
  targetStream.executePutBlock(true, true,
      blockLocationInfo.getLength(), blockDataGroup);
  checkFailures(targetStream,
      targetStream.getCurrentPutBlkResponseFuture());
}
```
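The "search until it can be found" step from the comments above could look roughly like the sketch below. It is self-contained and does not use the Ozone classes: `ReplicaBlock` and `findStripeChecksumHolder` are hypothetical stand-ins for an entry of `blockDataGroup` and for whatever helper ends up doing the search, and a `null` entry is assumed to mean "this replica is missing or being reconstructed".

```java
import java.util.Optional;

/** Hypothetical helper, not part of Ozone: locates a replica that carries the stripe checksum. */
final class StripeChecksumLocator {

  /** Hypothetical stand-in for one entry of blockDataGroup. */
  static final class ReplicaBlock {
    final int replicaIndex;
    // Null or empty when this replica does not carry the stripe checksum.
    final byte[] stripeChecksum;

    ReplicaBlock(int replicaIndex, byte[] stripeChecksum) {
      this.replicaIndex = replicaIndex;
      this.stripeChecksum = stripeChecksum;
    }
  }

  /**
   * Returns the first available replica that holds a non-empty stripe
   * checksum. Null array entries (assumed to be replicas that are missing
   * or under reconstruction) are skipped.
   */
  static Optional<ReplicaBlock> findStripeChecksumHolder(ReplicaBlock[] blockGroup) {
    for (ReplicaBlock block : blockGroup) {
      if (block != null
          && block.stripeChecksum != null
          && block.stripeChecksum.length > 0) {
        return Optional.of(block);
      }
    }
    return Optional.empty();
  }
}
```

An empty result would mean neither index 1 nor any parity was available, which should not happen for a block that is reconstructable at all, so the caller could reasonably treat it as a failure.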
Inside `validateChecksum()` you need to figure out how to index into the
stripe checksum to find the part of it to compare against the chunk
checksum.
I think that approach will work, and it avoids calculating the checksum from
the data a second time.
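
As a rough illustration of the indexing inside `validateChecksum()`, here is a self-contained sketch on plain byte arrays. It rests on an assumption that needs checking against the code that writes the stripe checksum: that the stripe checksum is the per-replica chunk checksums concatenated in replica-index order, each the same number of bytes. A partial final stripe may well break that fixed-width layout. The class and method names are hypothetical, not Ozone APIs.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Ozone: compares a chunk checksum to its slice of the stripe checksum. */
final class StripeChecksumSketch {

  /**
   * Returns the slice of the stripe checksum belonging to one replica.
   * ASSUMPTION: the stripe checksum is laid out as fixed-width slices,
   * one per replica, ordered by 1-based replica index.
   */
  static byte[] sliceForReplica(byte[] stripeChecksum, int replicaIndex,
      int bytesPerReplica) {
    if (replicaIndex < 1 || bytesPerReplica <= 0) {
      throw new IllegalArgumentException("invalid replica index or slice width");
    }
    int from = (replicaIndex - 1) * bytesPerReplica;
    int to = from + bytesPerReplica;
    if (to > stripeChecksum.length) {
      throw new IllegalArgumentException("stripe checksum too short for replica "
          + replicaIndex);
    }
    return Arrays.copyOfRange(stripeChecksum, from, to);
  }

  /** True when the chunk checksum equals its slice of the stripe checksum. */
  static boolean matches(byte[] chunkChecksum, byte[] stripeChecksum,
      int replicaIndex) {
    byte[] expected =
        sliceForReplica(stripeChecksum, replicaIndex, chunkChecksum.length);
    return Arrays.equals(chunkChecksum, expected);
  }

  public static void main(String[] args) {
    // Three replicas with 4-byte checksums each (made-up values).
    byte[] stripe = {1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3};
    System.out.println(matches(new byte[] {2, 2, 2, 2}, stripe, 2)); // prints "true"
    System.out.println(matches(new byte[] {9, 9, 9, 9}, stripe, 2)); // prints "false"
  }
}
```

The comparison only reads checksums that already exist in the metadata, so nothing is recomputed from the chunk data.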