[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes.

Erik Krogen (JIRA) Tue, 13 Aug 2019 11:26:36 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906493#comment-16906493
 ]


Erik Krogen commented on HDFS-14090:
------------------------------------

[~crh] thanks for keeping me honest here :) Finally got around to taking a look 
at this.

I read through the design again and had two additional concerns that came up:
 # I was considering the scenario where there are two routers R1 and R2, and 
two NameNodes N1 and N2. Assume most clients need to access both N1 and N2. 
What happens in the situation when all of R1's N1-handlers are full (but 
N2-handlers mostly empty), and all of R2's N2-handlers are full (but 
N1-handlers mostly empty)? I'm not sure if this is a situation that is likely 
to arise, or if the system will easily self-heal based on the backoff behavior. 
Maybe worth thinking about a little--not a blocking concern for me, more of a 
thought experiment.
 # The configuration for this seems like it will be really tricky to get right, 
particularly knowing how many fan-out handlers to allocate. I imagine as an 
administrator, my thought process would be like:
 ** I want 35% allocated to NN1 and 65% allocated to NN2, since NN2 is about 2x 
as loaded as NN1. This part is fairly intuitive.
 ** Then I encounter the fan-out configuration... What am I supposed to do with 
it?
 ** Are there perhaps any heuristics we can provide for reasonable values?
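
To make the arithmetic behind that thought process concrete, here is a rough sketch of how percentage shares might translate into handler counts. It assumes that the dedicated pools and the fan-out pool are all carved out of one fixed handler count; that is an assumption for illustration, not something taken from the patch. {{HandlerSplitSketch}}, {{split}}, and the "unassigned" key are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the patch or of Hadoop. */
final class HandlerSplitSketch {

  /**
   * Converts per-pool percentages into whole handler counts, rounding down.
   * Handlers that no pool received are reported under the key "unassigned".
   */
  static Map<String, Integer> split(int totalHandlers,
      Map<String, Integer> percentByPool) {
    int percentSum = 0;
    for (int percent : percentByPool.values()) {
      if (percent < 0) {
        throw new IllegalArgumentException("Negative share: " + percent);
      }
      percentSum += percent;
    }
    if (percentSum > 100) {
      throw new IllegalArgumentException(
          "Shares add up to " + percentSum + "%, which is more than 100%");
    }

    Map<String, Integer> handlers = new LinkedHashMap<>();
    int assigned = 0;
    for (Map.Entry<String, Integer> e : percentByPool.entrySet()) {
      // Integer division rounds down, so a pool never gets more than its share.
      int count = totalHandlers * e.getValue() / 100;
      handlers.put(e.getKey(), count);
      assigned += count;
    }
    handlers.put("unassigned", totalHandlers - assigned);
    return handlers;
  }
}
```

Under that assumption, a 35/65 split of 100 handlers leaves zero handlers unassigned, so any fan-out share has to be taken away from NN1 or NN2. That trade-off is the part where a documented heuristic would help.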

Regarding the terminology, "permit" conveys the concept better to me 
personally, but "quota" more closely matches similar terminology used 
throughout Hadoop. My one concern with "permit" is that it might imply that 
permits are requested dynamically (e.g. a job requests permits at startup) 
rather than being a constant allocation. I don't have a strong preference 
here. I do agree that {{NoPermitAvailableException}} could use a better name 
that more readily indicates an overload situation; 
{{PermitLimitExceededException}} might be better.

 
{quote} * Instead of having two configs, one for enabling and one for the 
implementation, we could have just the implementation and by default provide a 
dummy implementation that doesn't do fairness. Then we would rename the current 
DefaultFairnessPolicyController to something more descriptive (to reflect equal 
or linear or similar).
 * Coming back to PermitAllocationException, right now we are kind of logging 
and swallowing; what about failing the whole startup?{quote}
+1 on these two ideas from [~elgoiri]
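
As a rough illustration of both ideas, the sketch below uses a single pluggable interface with a no-op default, plus a static implementation that fails at construction instead of logging and continuing. All type and method names here ({{PermitController}}, {{NoOpPermitController}}, {{StaticPermitController}}) are hypothetical and are not the names used in the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

/** Hypothetical interface; names are illustrative, not the patch's. */
interface PermitController {
  /** Returns true if a call to the given nameservice may proceed. */
  boolean acquire(String nameservice);

  /** Returns a previously acquired permit. */
  void release(String nameservice);
}

/** Default when no fairness is wanted: every request is admitted. */
final class NoOpPermitController implements PermitController {
  @Override
  public boolean acquire(String nameservice) {
    return true;
  }

  @Override
  public void release(String nameservice) {
    // Nothing to give back.
  }
}

/** Fixed number of permits per nameservice, validated at construction. */
final class StaticPermitController implements PermitController {
  private final Map<String, Semaphore> permits = new HashMap<>();

  StaticPermitController(Map<String, Integer> permitsByNameservice) {
    for (Map.Entry<String, Integer> e : permitsByNameservice.entrySet()) {
      if (e.getValue() <= 0) {
        // Fail startup rather than log and continue with a broken allocation.
        throw new IllegalStateException(
            "No permits allocated for nameservice " + e.getKey());
      }
      permits.put(e.getKey(), new Semaphore(e.getValue()));
    }
  }

  @Override
  public boolean acquire(String nameservice) {
    Semaphore s = permits.get(nameservice);
    // Unknown nameservices are rejected; tryAcquire() never blocks.
    return s != null && s.tryAcquire();
  }

  @Override
  public void release(String nameservice) {
    Semaphore s = permits.get(nameservice);
    if (s != null) {
      s.release();
    }
  }
}
```

With this shape, a single configuration key naming the implementation class would be enough, and leaving it unset would select the no-op controller.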

I didn't do a thorough review, but from an initial look through the code, I am 
also impressed by how well isolated the changes are. It's great to see. 
I have some more specific comments below.
 # This code seems to assume that the set of nameservices controlled by a 
router will never change. I haven't been following the router closely, but I 
thought that you could dynamically change the set of mount points. I see we're 
loading the nameservices from {{DFS_ROUTER_MONITOR_NAMENODE}} – is it actually 
accurate that the set of monitored NameNodes matches the set of mount points?
 # If someone actually has a nameservice called "concurrent" (used for the 
{{concurrentNS}}), this is going to cause problems. Given that this name will 
appear in user configurations, maybe it's nice for it to be an easily 
human-readable name, but we should add some logic to detect this collision and 
complain about it.
 # I think a {{WARN}} log on a permit allocation failure is a bit strong. This 
could really flood the logs when things get busy. I would suggest downgrading 
it to a {{DEBUG}}, or using the {{LogThrottlingHelper}} to limit how frequently 
this will be logged.
 # I think we need to document the nameservice-specific configurations in 
{{hdfs-rbf-default.xml}}, including the presence of the special "concurrent" 
nameservice.
 # Nits:
 ## You've sometimes used {{this.}} as a prefix for fields and sometimes not, 
can we make it more consistent?
 ## You should use the diamond operator ( {{<>}} ) in your {{HashMap}} / 
{{HashSet}} instantiations.
 ## Can we combine the log statements on L113 and L115 of 
{{DefaultFairnessPolicyController}}?
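
Two of the points above (the reserved-name collision and the noisy log) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch's code: {{NameserviceNameCheck}} and {{LogGate}} are hypothetical names, and {{LogGate}} is a hand-rolled stand-in that does not reproduce the API of {{LogThrottlingHelper}}.

```java
import java.util.Collection;

/** Hypothetical startup check for the reserved fan-out pool name. */
final class NameserviceNameCheck {
  /** Name assumed to be reserved for the fan-out pool. */
  static final String RESERVED = "concurrent";

  /** Throws if a configured nameservice uses the reserved name. */
  static void validate(Collection<String> nameservices) {
    for (String ns : nameservices) {
      if (RESERVED.equals(ns)) {
        throw new IllegalArgumentException("Nameservice name '" + RESERVED
            + "' collides with the reserved fan-out pool name");
      }
    }
  }
}

/** Hypothetical gate that lets a message through at most once per interval. */
final class LogGate {
  private final long minIntervalMillis;
  private long lastLoggedMillis;
  private boolean loggedOnce;

  LogGate(long minIntervalMillis) {
    this.minIntervalMillis = minIntervalMillis;
  }

  /** Returns true if the caller should emit its log line at this time. */
  synchronized boolean shouldLog(long nowMillis) {
    if (!loggedOnce || nowMillis - lastLoggedMillis >= minIntervalMillis) {
      loggedOnce = true;
      lastLoggedMillis = nowMillis;
      return true;
    }
    return false;
  }
}
```

A caller would run {{NameserviceNameCheck.validate}} once while loading the configuration, and wrap the permit-failure log statement in {{if (gate.shouldLog(System.currentTimeMillis()))}}.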

> RBF: Improved isolation for downstream name nodes.
> --------------------------------------------------
>
>                 Key: HDFS-14090
>                 URL: https://issues.apache.org/jira/browse/HDFS-14090
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: CR Hota
>            Assignee: CR Hota
>            Priority: Major
>         Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, RBF_ Isolation design.pdf
>
>
> Router is a gateway to underlying name nodes. Gateway architectures should 
> help isolate clients connecting to healthy clusters from the impact of 
> unhealthy clusters.
> For example, if there are 2 name nodes downstream and one of them is 
> heavily loaded, with calls spiking RPC queue times, back pressure will cause 
> the same to start reflecting on the router. As a result, clients 
> connecting to healthy/faster name nodes will also slow down, since the same 
> RPC queue is maintained for all calls at the router layer. Essentially the 
> same IPC thread pool is used by the router to connect to all name nodes.
> Currently the router uses one single RPC queue for all calls. Let's discuss 
> how we can change the architecture and add some throttling logic for 
> unhealthy/slow/overloaded name nodes.
> One way could be to read from the current call queue, immediately identify 
> the downstream name node, and maintain a separate queue for each underlying 
> name node. Another, simpler way is to maintain some sort of rate limiter 
> configured for each name node and let routers drop or reject requests, or 
> return an error, after a certain threshold.
> This won’t be a simple change, as the router’s ‘Server’ layer would need 
> redesign and implementation. Currently this layer is the same as the name 
> node's.
> Opening this ticket to discuss, design and implement this feature.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

