[
https://issues.apache.org/jira/browse/HDDS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560153#comment-16560153
]
Anu Engineer commented on HDDS-245:
-----------------------------------
The changes look excellent. I am a +1 on this patch. Some small comments, most
of them are non-actionable in this patch. As soon you take a look, we can
probably commit this patch.
* *ContainerReportHandler.java#ContainerReportHandler*
Add a Precondition.NotNull check for \{containerMapping, node2ContainerMap,
replicationActivityStatus}
* *ContainerReportHandler.java#onMessage*
{code:java}
//update state in container db and trigger close container events
containerMapping.processContainerReports(datanodeOrigin, containerReport);
{code}
I don't think this is what we should do. This function follows the old state
logic. IMHO, we should replace this completely. However, we can commit this
patch and go rewrite {{containerMapping.processContainerReports}} function
later. The current implementation makes it the code kind of kludge. cc:
[~nandakumar131].
* *ContainerReportHandler.java#onMessage*
{code:java}
Set<ContainerID> containerIds = containerReport.getReportsList().stream()
.map(containerProto -> containerProto.getContainerID())
.map(ContainerID::new)
.collect(Collectors.toSet());
{code}
A nit: Technically a container report should *only* contain unique container
IDs. But I am not sure if that is enforced on the datanode side. This change
can loose some IDs. But no action is needed here. Just pointing out something
that I thought about.
* ContainerReportHandler.java#onMessage
I am being generous today with my comments :). So I am going to share some
history behind this code fragment.
{code:java}
ReportResult reportResult = node2ContainerMap
.processReport(datanodeOrigin.getUuid(), containerIds);
//we have the report, so we can update the states for the next iteration.
node2ContainerMap
.updateDatanodeMap(datanodeOrigin.getUuid(), containerIds);
{code}
My original proposal (exactly as the code in this patch) was that we will
update the node report and return the resultset. [~nandakumar131] while
reviewing the original design pointed out a strange problem, he said it is
possible that you might update the Node2Container State, but fail to update the
state in ContainerStateManager.
If that happens, we will never detect the inconsistency because the
Node2Container mapping is used to evaluate if we need to update the state of
ContainerManager. So he proposed that we first compute the difference, then
update ContainerStateManager and then come back and update the Node2Container
map.
Bottom line: He proposed that we do:
# processReport
# The Loop you have for processing the Missing Containers and new containers
# The update the updateDatanodeMap – So the change is trivial and it is just
defensive coding. The actual probability of something like this happening is
very low. So move line 94 to Line 107 is the change.
* A future feature request: If we get an IOException during the report
processing, can we have a metric for that ? not only log it, but a metric that
we can be used to monitor and flag this issue.
* Node2ContainerMapping.java – Not sure why we are allocating new HashSets;
isn't the calling function giving us a set already do we need to make a copy ?
(Just scrolled up and saw [~xyao] comments) I am pretty sure they cannot be
null, but it is not a big deal.
* ReplicationActivityStatus: Great idea, and thanks for supporting this in
this patch.
> Handle ContainerReports in the SCM
> ----------------------------------
>
> Key: HDDS-245
> URL: https://issues.apache.org/jira/browse/HDDS-245
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: SCM
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-245.001.patch, HDDS-245.002.patch,
> HDDS-245.003.patch, HDDS-245.004.patch
>
>
> HDDS-242 provides a new class ContainerReportHandler which could handle the
> ContainerReports from the SCMHeartbeatDispatchere.
> HDDS-228 introduces a new map to store the container -> datanode[] mapping
> HDDS-199 implements the ReplicationManager which could send commands to the
> datanodes to copy the datanode.
> To wire all these components, we need to put implementation to the
> ContainerReportHandler (created in HDDS-242).
> The ContainerReportHandler should process the new ContainerReportForDatanode
> events, update the containerStateMap and node2ContainerMap and calculate the
> missing/duplicate containers and send the ReplicateCommand to the
> ReplicateManager.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]