[
https://issues.apache.org/jira/browse/HDFS-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299652#comment-15299652
]
Kai Zheng commented on HDFS-9833:
---------------------------------
1. It would be good to give the following two methods consistent names (a
sketch follows the code block):
{code}
+ static public byte[] convertBlockIndices(List<Integer> blockIndices) {
+ @SuppressWarnings("unchecked")
+ byte[] blkIndices = new byte[blockIndices.size()];
+ for (int i = 0; i < blockIndices.size(); i++) {
+ blkIndices[i] = (byte) blockIndices.get(i).intValue();
+ }
+ return blkIndices;
+ }
+
+ public static List<Integer> convert(byte[] blockIndices) {
+ List<Integer> results = new ArrayList<>(blockIndices.length);
+ for (byte bt : blockIndices) {
+ results.add(Integer.valueOf(bt));
+ }
+ return results;
+ }
+
{code}
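For example, the two could simply be overloads sharing one name. Just a sketch
(the holder class name is made up here; the {{@SuppressWarnings("unchecked")}}
also looks unnecessary on a plain byte array, by the way):
{code}
import java.util.ArrayList;
import java.util.List;

public class BlockIndicesHelper { // hypothetical holder class, name is made up
  public static byte[] convertBlockIndices(List<Integer> blockIndices) {
    byte[] blkIndices = new byte[blockIndices.size()];
    for (int i = 0; i < blockIndices.size(); i++) {
      blkIndices[i] = (byte) blockIndices.get(i).intValue();
    }
    return blkIndices;
  }

  public static List<Integer> convertBlockIndices(byte[] blockIndices) {
    List<Integer> results = new ArrayList<>(blockIndices.length);
    for (byte bt : blockIndices) {
      results.add(Integer.valueOf(bt));
    }
    return results;
  }
}
{code}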
2. The HashMap here might be a little heavyweight; a plain array should work
instead (see the sketch after the code block).
{code}
HashMap<Byte, ECBlockInfo> liveDns = new HashMap<>(datanodes.length);
{code}
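Something like this, roughly (sketch only; I'm assuming the map key is the
internal block index and that the array can be sized the same way the map was):
{code}
// Sketch only: internal block indices are small non-negative bytes, so a
// plain array indexed by block index can replace the HashMap.
ECBlockInfo[] liveDns = new ECBlockInfo[datanodes.length];
...
liveDns[blockIndex] = ecBlkInfo;      // instead of liveDns.put(blockIndex, ecBlkInfo)
...
ECBlockInfo ecBlkInfo = liveDns[idx]; // null still means the block is missing
{code}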
3. The logic below appears to recompute only one block checksum at a time. If
multiple blocks are in question, the reconstruction could be done in a single
pass instead. Sure, it can be updated in follow-on tasks; a rough sketch of the
idea follows the code block.
{code}
+ for (int idx = 0; idx < numDataUnits && idx < blkIndxLen; idx++) {
+ try {
+ ECBlockInfo ecBlkInfo = liveDns.get((byte) idx);
+ if (null == ecBlkInfo) {
+ // reconstruct block and calculate checksum for missing node
+ recalculateChecksum(idx);
+ } else {
+ try {
+ ExtendedBlock block = StripedBlockUtil.constructInternalBlock(
+ blockGroup, ecPolicy.getCellSize(), numDataUnits, idx);
+ checksumBlock(block, idx, ecBlkInfo.getToken(),
+ ecBlkInfo.getDn());
+ } catch (IOException ioe) {
+ LOG.warn("Exception while reading checksum", ioe);
+ // reconstruct block and calculate checksum for the failed node
+ recalculateChecksum(idx);
+ }
+ }
+ } catch (IOException e) {
+ LOG.warn("Failed to get the checksum", e);
+ }
{code}
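For the follow-on, the idea would be roughly the following (sketch only; a
batched {{recalculateChecksum(List<Integer>)}} is hypothetical, the current
method takes a single index):
{code}
// Sketch only: gather all the missing/failed indices first, then decode
// them together in one reconstruction pass.
List<Integer> toReconstruct = new ArrayList<>();
for (int idx = 0; idx < numDataUnits && idx < blkIndxLen; idx++) {
  ECBlockInfo ecBlkInfo = liveDns.get((byte) idx);
  if (ecBlkInfo == null) {
    toReconstruct.add(idx);        // missing block
  } else {
    try {
      ExtendedBlock block = StripedBlockUtil.constructInternalBlock(
          blockGroup, ecPolicy.getCellSize(), numDataUnits, idx);
      checksumBlock(block, idx, ecBlkInfo.getToken(), ecBlkInfo.getDn());
    } catch (IOException ioe) {
      LOG.warn("Exception while reading checksum", ioe);
      toReconstruct.add(idx);      // failed read, reconstruct instead
    }
  }
}
if (!toReconstruct.isEmpty()) {
  // one erasure decode can recover all the missed blocks together
  recalculateChecksum(toReconstruct); // hypothetical batched variant
}
{code}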
4. Could we have a wrapper like *ReconstructionInfo* to hold the relevant
parameters? I'm afraid we may need more of them in the future ... (a sketch
follows the code block)
{code}
StripedReader(StripedReconstructor reconstructor, DataNode datanode,
- Configuration conf,
- BlockECReconstructionInfo reconstructionInfo) {
+ Configuration conf, ErasureCodingPolicy ecPolicy,
+ ExtendedBlock blockGroup, byte[] liveIndices, DatanodeInfo[] sources) {
{code}
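Something along these lines, perhaps (sketch only; the field set is just an
illustration of what might go in):
{code}
// Sketch only: a simple immutable holder so the StripedReader constructor
// stays stable as the parameter list grows.
public class ReconstructionInfo {
  private final ErasureCodingPolicy ecPolicy;
  private final ExtendedBlock blockGroup;
  private final byte[] liveIndices;
  private final DatanodeInfo[] sources;

  public ReconstructionInfo(ErasureCodingPolicy ecPolicy,
      ExtendedBlock blockGroup, byte[] liveIndices, DatanodeInfo[] sources) {
    this.ecPolicy = ecPolicy;
    this.blockGroup = blockGroup;
    this.liveIndices = liveIndices;
    this.sources = sources;
  }
  // getters omitted
}

StripedReader(StripedReconstructor reconstructor, DataNode datanode,
    Configuration conf, ReconstructionInfo reconstructionInfo) {
{code}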
5. The following *TODO* should be resolved, since you have added two tests
covering datanode failures.
{code}
// TODO: allow datanode failure, HDFS-9833
{code}
> Erasure coding: recomputing block checksum on the fly by reconstructing the
> missed/corrupt block data
> -----------------------------------------------------------------------------------------------------
>
> Key: HDFS-9833
> URL: https://issues.apache.org/jira/browse/HDFS-9833
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Rakesh R
> Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-9833-00-draft.patch, HDFS-9833-01.patch,
> HDFS-9833-02.patch, HDFS-9833-03.patch, HDFS-9833-04.patch
>
>
> As discussed in HDFS-8430 and HDFS-9694, to compute a striped file checksum
> even when some of the striped blocks are missing, we need to consider
> recomputing the block checksum on the fly for the missed/corrupt blocks. To
> recompute a block checksum, the block data needs to be reconstructed by
> erasure decoding, and most of the code needed for the block reconstruction
> can be borrowed from HDFS-9719, the refactoring of the existing
> {{ErasureCodingWorker}}. In the EC worker, reconstructed blocks need to be
> written out to target datanodes, but in this case the remote write isn't
> necessary, as the reconstructed block data is only used to recompute the
> checksum.