[
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563895#comment-14563895
]
Andrew Wang commented on HDFS-7923:
-----------------------------------
Neato patch Colin, this is a high-ish level review, I probably need to do
another pass.
Small stuff:
* Missing config key documentation in hdfs-defaults.xml
* requestBlockReportLeaseId: empty catch for unregistered node, we could add
some more informative logging rather than relying on the warn below
BlockReportLeaseManager
* I discussed the NodeData structure with Colin offline, wondering why we
didn't use a standard Collection. Colin brought up the reason of reducing
garbage, which seems valid. I think we should consider implementing
IntrusiveCollection though rather than writing another.
* I also asked about putting NodeData into DatanodeDescriptor. Not sure what
the conclusion was on this, it might reduce garbage since we don't need a
separate NodeData object.
* I prefer Precondition checks for invalid configuration values at startup, so
there aren't any surprises for the user. Not everyone reads the messages on
startup.
* requestLease has a check for isTraceEnabled, then logs at debug level
BPServiceActor:
* In offerService, we ignore the new leaseID if we already have one. On the NN
though, a new request wipes out the old leaseID, and processReport checks based
on leaseID rather than node. This kind of bug makes me wonder why we really
need the leaseID at all, why not just attach a boolean to the node? Or if it's
in the deferred vs. pending list?
* Can we fix the javadoc for scheduleBlockReport to mention randomness, and not
"send...at the next heartbeat?" Incorrect right now.
* Have you thought about moving the BR scheduler to the NN side? We still rely
on the DNs to jitter themselves and do the initial delay, but we could have the
NN handle all this. This would also let the NN trigger FBRs whenever it wants.
We could also do better than random scheduling, i.e. stride it rather than
jitter. Incompatible, so we probably won't, but fun to think about :)
* scheduleBlockReport(long) do we want to add a checkArgument that delayMs is
geq 0? You nixed the else case.
DatanodeManager:
* Could we do the BRLManager register/unregister in addDatanode and
removeDatanode? I think this is safe, since on a DN restart it'll provide a
lease ID of 0 and FBR, even without a reg/unreg.
> The DataNodes should rate-limit their full block reports by asking the NN on
> heartbeat messages
> -----------------------------------------------------------------------------------------------
>
> Key: HDFS-7923
> URL: https://issues.apache.org/jira/browse/HDFS-7923
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch,
> HDFS-7923.002.patch, HDFS-7923.003.patch
>
>
> The DataNodes should rate-limit their full block reports. They can do this
> by first sending a heartbeat message to the NN with an optional boolean set
> which requests permission to send a full block report. If the NN responds
> with another optional boolean set, the DN will send an FBR... if not, it will
> wait until later. This can be done compatibly with optional fields.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)