[
https://issues.apache.org/jira/browse/HDFS-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249133#comment-15249133
]
Kai Zheng commented on HDFS-8449:
---------------------------------
Some comments. Could you help check? Thanks Bo.
1. In {{ErasureCodingWorker}}, see the following change. The counters should
not be placed here, because at this point they only reflect whether the task
was submitted successfully, not whether it was actually executed
successfully. The right place would be the {{run()}} method of the
{{StripedReconstructor}} {{Runnable}} task. We need not worry too much about
tasks with invalid targets, because such tasks should eventually be avoided
on the NN side.
{code}
  public void processErasureCodingTasks(
      Collection<BlockECReconstructionInfo> ecTasks) {
    for (BlockECReconstructionInfo reconstructionInfo : ecTasks) {
      try {
        final StripedReconstructor task =
            new StripedReconstructor(this, reconstructionInfo);
        if (task.hasValidTargets()) {
          stripedReconstructionPool.submit(task);
+         datanode.getMetrics().incrECReconstructionTasks();
        } else {
          LOG.warn("No missing internal block. Skip reconstruction for task:{}",
              reconstructionInfo);
        }
      } catch (Throwable e) {
        LOG.warn("Failed to reconstruct striped block {}",
            reconstructionInfo.getExtendedBlock().getLocalBlock(), e);
+       datanode.getMetrics().incrECFailedReconstructionTasks();
      }
    }
  }
{code}
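To illustrate the suggestion in point 1, here is a minimal, self-contained sketch of counting inside {{run()}} rather than at submission time. It is not the actual patch: {{SketchMetrics}}, {{SketchReconstructor}}, {{EcMetricsSketch}} and the simulated failure flag are hypothetical stand-ins for the real Hadoop classes, and counting the task at the start of {{run()}} is an assumption about where the total should be incremented.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the datanode metrics object (not the Hadoop class).
class SketchMetrics {
  final AtomicLong tasks = new AtomicLong();
  final AtomicLong failedTasks = new AtomicLong();

  void incrECReconstructionTasks() { tasks.incrementAndGet(); }
  void incrECFailedReconstructionTasks() { failedTasks.incrementAndGet(); }
}

// Hypothetical stand-in for StripedReconstructor: the counters live in run().
class SketchReconstructor implements Runnable {
  private final SketchMetrics metrics;
  private final boolean simulateFailure;

  SketchReconstructor(SketchMetrics metrics, boolean simulateFailure) {
    this.metrics = metrics;
    this.simulateFailure = simulateFailure;
  }

  @Override
  public void run() {
    // Counted when the task actually executes, not when it is submitted.
    metrics.incrECReconstructionTasks();
    try {
      reconstruct();
    } catch (Throwable e) {
      // A failure during execution is recorded here, inside the task itself.
      metrics.incrECFailedReconstructionTasks();
    }
  }

  // Placeholder for the real reconstruction work.
  private void reconstruct() throws Exception {
    if (simulateFailure) {
      throw new Exception("simulated reconstruction failure");
    }
  }
}

class EcMetricsSketch {
  // Submits one task per flag and returns {total tasks, failed tasks}.
  static long[] runTasks(boolean... failures) {
    SketchMetrics metrics = new SketchMetrics();
    ExecutorService pool = Executors.newFixedThreadPool(2);
    for (boolean fail : failures) {
      pool.submit(new SketchReconstructor(metrics, fail));
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new long[] {metrics.tasks.get(), metrics.failedTasks.get()};
  }

  public static void main(String[] args) {
    long[] counts = runTasks(false, true, false);
    System.out.println("tasks=" + counts[0] + " failed=" + counts[1]);
  }
}
```

With this arrangement a task that is submitted but fails while running is still counted as failed, which the submission-time counters above cannot capture.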
2. It's good to see new tests for this. Since {{TestReconstructStripedFile}}
already covers the various cases in which reconstruction tasks can happen,
could we extend it and add the metrics-related checks there?
> Add tasks count metrics to datanode for ECWorker
> ------------------------------------------------
>
> Key: HDFS-8449
> URL: https://issues.apache.org/jira/browse/HDFS-8449
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Li Bo
> Assignee: Li Bo
> Attachments: HDFS-8449-000.patch, HDFS-8449-001.patch,
> HDFS-8449-002.patch, HDFS-8449-003.patch, HDFS-8449-004.patch
>
>
> This sub-task tries to record the EC recovery tasks that a datanode has done,
> including total tasks, failed tasks and successful tasks.