[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

Andrew Wang (JIRA) Thu, 28 May 2015 16:34:37 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563895#comment-14563895
 ]


Andrew Wang commented on HDFS-7923:
-----------------------------------

Neato patch Colin, this is a high-ish level review, I probably need to do 
another pass.

Small stuff:
* Missing config key documentation in hdfs-defaults.xml
* requestBlockReportLeaseId: empty catch for unregistered node, we could add 
some more informative logging rather than relying on the warn below

BlockReportLeaseManager
* I discussed the NodeData structure with Colin offline, wondering why we 
didn't use a standard Collection. Colin brought up the reason of reducing 
garbage, which seems valid. I think we should consider implementing 
IntrusiveCollection though rather than writing another.
* I also asked about putting NodeData into DatanodeDescriptor. Not sure what 
the conclusion was on this, it might reduce garbage since we don't need a 
separate NodeData object.
* I prefer Precondition checks for invalid configuration values at startup, so 
there aren't any surprises for the user. Not everyone reads the messages on 
startup.
* requestLease has a check for isTraceEnabled, then logs at debug level

BPServiceActor:
* In offerService, we ignore the new leaseID if we already have one. On the NN 
though, a new request wipes out the old leaseID, and processReport checks based 
on leaseID rather than node. This kind of bug makes me wonder why we really 
need the leaseID at all, why not just attach a boolean to the node? Or if it's 
in the deferred vs. pending list?
* Can we fix the javadoc for scheduleBlockReport to mention randomness, and not 
"send...at the next heartbeat?" Incorrect right now.
* Have you thought about moving the BR scheduler to the NN side? We still rely 
on the DNs to jitter themselves and do the initial delay, but we could have the 
NN handle all this. This would also let the NN trigger FBRs whenever it wants. 
We could also do better than random scheduling, i.e. stride it rather than 
jitter. Incompatible, so we probably won't, but fun to think about :)
* scheduleBlockReport(long) do we want to add a checkArgument that delayMs is 
geq 0? You nixed the else case.

DatanodeManager:
* Could we do the BRLManager register/unregister in addDatanode and 
removeDatanode? I think this is safe, since on a DN restart it'll provide a 
lease ID of 0 and FBR, even without a reg/unreg.

> The DataNodes should rate-limit their full block reports by asking the NN on 
> heartbeat messages
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7923
>                 URL: https://issues.apache.org/jira/browse/HDFS-7923
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, 
> HDFS-7923.002.patch, HDFS-7923.003.patch
>
>
> The DataNodes should rate-limit their full block reports.  They can do this 
> by first sending a heartbeat message to the NN with an optional boolean set 
> which requests permission to send a full block report.  If the NN responds 
> with another optional boolean set, the DN will send an FBR... if not, it will 
> wait until later.  This can be done compatibly with optional fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages

Reply via email to