[
https://issues.apache.org/jira/browse/HDDS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559403#comment-16559403
]
Elek, Marton commented on HDDS-245:
-----------------------------------
Thank you very much [~xyao] for the review:
{quote}
ContainerReportHandler.java
Line 93-95: should we consolidate this inside node2ContainerMap.processReport()?
On second thought, I think a better place might be updating the
DN2ContainerMap after we emit the replication request at line 108.
{quote}
emitReplicationRequestEvent is an async operation. I would like to be sure that
the node2ContainerMap is up-to-date during the event processing (as the
ReplicationManager may use information from the node2ContainerMap). That's why
I moved this line before the event is emitted.
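To illustrate the ordering concern, here is a minimal sketch with hypothetical, simplified types (the class, field, and method names are mine, not the actual HDDS API): the map is updated synchronously before the replication event is queued, so even an async consumer sees fresh state.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Sketch of the ordering: update node2ContainerMap first, emit the event
// second. When the async handler runs later, the map is already current.
public class ReportOrderingSketch {
    final Map<String, Set<Long>> node2ContainerMap = new HashMap<>();
    final Queue<Runnable> eventQueue = new ArrayDeque<>(); // stands in for the async EventQueue

    void onContainerReport(String datanode, Set<Long> containers) {
        // 1. Update the map first...
        node2ContainerMap.put(datanode, new HashSet<>(containers));
        // 2. ...then emit the event; its handler may read node2ContainerMap.
        eventQueue.add(() -> System.out.println(
            "replication check for " + datanode + ": " + node2ContainerMap.get(datanode)));
    }

    public static void main(String[] args) {
        ReportOrderingSketch sketch = new ReportOrderingSketch();
        sketch.onContainerReport("dn-1", new HashSet<>(Set.of(1L, 2L)));
        // Simulated async processing happens later, but still sees the fresh map.
        sketch.eventQueue.forEach(Runnable::run);
    }
}
```

With the reversed order (emit first, update second), a handler that runs in between would observe stale mappings; that is the race the comment is guarding against.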
updateDatanodeMap and processReport are two different methods with different
responsibilities, and they can be tested at this level (processReport has a
lot of low-level unit tests). If you think we need one method, I can create a
third one which calls both of them.
{quote}
Node2ContainerMap.java
Line 134/140: we need to check if currentSet is null before invoke removeAll to
avoid NPE or use the computeIfPresent?
{quote}
Correct me if I am wrong, but newContainers and missingContainers can never be
null, as they are initialized here. currentSet cannot be null because the
existence of the key is checked at L124 (isKnownDatanode), and containers
cannot be null because we have a precondition check at L122.
I would be happy to add more checks, but I don't know which one is reasonable...
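For illustration, a hedged sketch of the two styles under discussion: an explicit precondition plus a containsKey-style guard versus computeIfPresent. The class, field, and method names here are simplified stand-ins, not the Node2ContainerMap API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: with a precondition check and a containsKey-style
// guard (like isKnownDatanode), the retrieved set can never be null, so
// removeAll is NPE-safe. computeIfPresent is an alternative that skips
// absent keys implicitly.
public class NullGuardSketch {
    final Map<String, Set<Long>> dn2Containers = new HashMap<>();

    Set<Long> missingContainers(String dn, Set<Long> reported) {
        Objects.requireNonNull(reported, "reported containers"); // like the precondition at L122
        if (!dn2Containers.containsKey(dn)) {                    // like isKnownDatanode at L124
            return Collections.emptySet();
        }
        Set<Long> missing = new HashSet<>(dn2Containers.get(dn)); // guaranteed non-null here
        missing.removeAll(reported);
        return missing;
    }

    // Alternative style: computeIfPresent never invokes the function for absent keys.
    void retainReported(String dn, Set<Long> reported) {
        dn2Containers.computeIfPresent(dn, (key, current) -> {
            current.retainAll(reported);
            return current;
        });
    }
}
```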
{quote}
Node2ContainerMapTest.java should be renamed to TestNode2ContainerMap.java
{quote}
It is included in the current patch. (I renamed the tests and realized that
some of them were failing; those are fixed in the patch.) But I had patches in
the wrong format earlier, so let me know if it can't be applied with the rename.
{code}
diff --git a/hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/Node2ContainerMapTest.java b/hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/TestNode2ContainerMap.java
similarity index 91%
rename from hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/Node2ContainerMapTest.java
rename to hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/TestNode2ContainerMap.java
index 79f1b40db03..4796f4db85a 100644
--- a/hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/Node2ContainerMapTest.java
+++ b/hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/TestNode2ContainerMap.java
@@ -38,7 +38,7 @@
{code}
{quote}
TestContainerReportHandler.java
Line 127:130: NIT: unused can be removed.
{quote}
Sure, removed.
{quote}
StorageContainerManager.java
Line 234: this can be removed, the handler has already been added on line 228.
{quote}
Good catch, removed.
{quote}
ReplicationActivityStatus.java
Line 61: I did not follow this. Can you elaborate on this or add some TODO to
fix it later?
{quote}
The idea was to enable replication with an async event. But you are right,
it's strange that we could turn it on by event but couldn't turn it off. I
changed the EventHandler<Void> to EventHandler<Boolean>, so now it can be
turned on and off with an event whose payload is just a boolean.
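Roughly what the change looks like, as a sketch with a simplified stand-in EventHandler interface (not the actual hdds server framework types):

```java
// Sketch of the Void -> Boolean payload change: one event type can now both
// enable and disable replication. Interface and class names are simplified
// stand-ins for illustration only.
public class ReplicationToggleSketch {
    interface EventHandler<P> { void onMessage(P payload); }

    static class ReplicationActivityStatus implements EventHandler<Boolean> {
        private volatile boolean replicationEnabled = false;

        @Override
        public void onMessage(Boolean enable) {
            replicationEnabled = enable; // true turns replication on, false off
        }

        boolean isReplicationEnabled() { return replicationEnabled; }
    }

    public static void main(String[] args) {
        ReplicationActivityStatus status = new ReplicationActivityStatus();
        status.onMessage(Boolean.TRUE);
        System.out.println("enabled: " + status.isReplicationEnabled());
        status.onMessage(Boolean.FALSE);
        System.out.println("enabled: " + status.isReplicationEnabled());
    }
}
```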
[~ljain]: thank you for your review as well.
{quote}
ReportResult:58,59 - we can keep the missingContainers and newContainers as
null.
{quote}
Sometimes (e.g. Node2ContainerMap L152) only the newContainers are added. I
would like to be sure that the sets are not null. Do you think it's a problem
because of GC pressure? I think (hope?) it's negligible, as we create a lot of
hash sets in Node2ContainerMap anyway, and very short-lived objects are
handled very well (AFAIK).
{quote}
2. ContainerMapping#getContainerWithPipeline needs to be updated for closed
container case. For closed containers we need to fetch the datanodes from
ContainerStateMap and return the appropriate pipeline information.
{quote}
Correct me if I am wrong; let me try to rephrase: you are saying that
SCMClientProtocolServer.getContainerWithPipeline won't work until we return
the pipeline information based on ContainerStateMap.contReplicaMap, and (I
guess) without that we can't read closed containers.
[~anu]: Do we have a separate issue for that?
{quote}
3. START_REPLICATION is currently not fired by any publisher. I guess it will
be part of another jira?
{quote}
Yes, the event should be sent when we are leaving chill mode. [~anu]: Do I
need to create a separate jira, or will it be handled by HDDS-2?
{quote}
4. We are currently processing the report as soon it is received. Are we
handling the case when a container is added in one DN and has been removed from
another DN? In such a case we might be sending out a false replicate event as
replication count would still match the replication factor.
{quote}
Could you please give more info about this example? If a container is added
_and_ removed (and both operations are finished), the replication factor
should be ok. In the case of in-progress replications, the ReplicationManager
(which listens to these events) will recalculate the replica count with the
in-progress copies also considered. So the calculation here is just a draft to
sort the replication requests by priority; the ReplicationManager will
recalculate the required replicas.
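A toy illustration of that recalculation (my own simplified arithmetic, not the ReplicationManager code): counting in-flight copies as effective replicas prevents a stale report from triggering a duplicate replication request.

```java
// Illustrative only: effective replicas = reported replicas + in-progress
// copies, so the needed count never double-requests an ongoing replication.
public class ReplicaRecalcSketch {
    static int neededReplicas(int reported, int inFlight, int replicationFactor) {
        return Math.max(0, replicationFactor - reported - inFlight);
    }

    public static void main(String[] args) {
        // The report sees 2 of 3 replicas, but one copy is already in progress:
        System.out.println(neededReplicas(2, 1, 3)); // prints 0: no new request needed
        // No in-flight copy: one more replica really is required:
        System.out.println(neededReplicas(2, 0, 3)); // prints 1
    }
}
```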
> Handle ContainerReports in the SCM
> ----------------------------------
>
> Key: HDDS-245
> URL: https://issues.apache.org/jira/browse/HDDS-245
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: SCM
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-245.001.patch, HDDS-245.002.patch,
> HDDS-245.003.patch
>
>
> HDDS-242 provides a new class ContainerReportHandler which could handle the
> ContainerReports from the SCMHeartbeatDispatcher.
> HDDS-228 introduces a new map to store the container -> datanode[] mapping.
> HDDS-199 implements the ReplicationManager which could send commands to the
> datanodes to copy containers.
> To wire all these components together, we need to put implementation into the
> ContainerReportHandler (created in HDDS-242).
> The ContainerReportHandler should process the new ContainerReportForDatanode
> events, update the containerStateMap and node2ContainerMap, calculate the
> missing/duplicate containers, and send the ReplicateCommand to the
> ReplicationManager.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]