[ 
https://issues.apache.org/jira/browse/HADOOP-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549249#comment-13549249
 ] 

Luke Lu commented on HADOOP-9160:
---------------------------------

Thanks for the feedback folks. I hope you all enjoyed the holidays!

Anyway, let's back up a bit and take a more systematic look at the pros and 
cons of the proposed management protocols (JMX, custom REST, custom RPC). 
Anecdotal arguments about limited JMX deployment are not convincing: one can 
argue that for every Hadoop cluster that doesn't use JMX, there are more than 
10 WebSphere/JBoss/WebLogic/Tomcat clusters that use JMX securely just fine.

Let me clarify the scope of the admin/management functions the proposed 
protocol targets: dfsadmin, mradmin (and the YARN equivalent) and resource 
management functions, e.g., commission/decommission/suspend/resume of data and 
compute (TT/NM) nodes, slot/resource configuration for compute nodes, etc.  
Nothing in the user-facing interfaces (FileSystem and MR) is included. The 
users of the protocol are sysadmins and management daemons. IMO, some of the 
most important aspects of a management protocol are: interoperability, 
compatibility, isolation, security and ease of implementation.

*Interoperability* is critical for management protocols. It's unfriendly, 
tedious and architecturally unsound to require common management tools to use 
application specific protocols. Just imagine managing clusters with many 
workloads where Hadoop is only one of them and each component requires a 
different management protocol. This is one of the main reasons that we have 
standards in management protocols. JMX is _the_ standard for managing JVM 
services. If JMX were based on HTTP instead of RMI, we probably wouldn't have 
this debate, as it's reasonable to argue that JMX on RMI is unfriendly to 
non-JVM languages. Fortunately, because JMX _is_ the standard, people outside 
the Hadoop community have already addressed the concern with REST APIs for JMX. 
When I filed HADOOP-7144, the implementations were not mature enough to adopt. 
Now a promising open source implementation has emerged: 
[Jolokia|http://www.jolokia.org/], which evolved from Jmx4Perl and now has 
good support for [Javascript|http://www.jolokia.org/client/javascript.html], 
[Perl|http://www.jolokia.org/client/perl.html] and 
[Python|https://github.com/cwood/pyjolokia] with (relatively) extensive 
[documentation|http://www.jolokia.org/documentation.html]. The software is 
Apache 2.0 licensed, with artifacts published in Maven Central. It's trivial to 
enable Jolokia as a servlet, so that the hadoop-auth SPNEGO plugin can be used 
(see demo patch). JMX is not just a protocol. It's a management infrastructure 
that enables clients to browse, discover and use new management services 
without client modification/recompilation. Jolokia supports 
list/search/read/write/execute of management services and more (bulk requests). 
It even has a readline-enabled shell (j4psh) to interact with JMX with no 
client JVM startup overhead. By contrast, significant work would be required to 
build a custom REST API and client libraries for different languages. Even if 
some of the Hadoop daemons are rewritten in other languages, these daemons can 
implement a subset of the Jolokia REST/JSON API to take advantage of the rich 
client support for admin-friendly languages (Python and especially Perl).

Interoperability scores (0: poor, 1: fair, 2: good): JMX (with Jolokia): 2; 
custom REST: 1; custom RPC: 0.

*Compatibility* is important for management protocols. It's 
inconvenient/tedious for management tools to manage services whose management 
protocols are incompatible across different versions of the managed services. 
JMX has been a stable industry standard for more than a decade. Jolokia has 
evolved over 3 years with relatively stable REST APIs. A new custom REST API 
would need a similar time frame to mature, though it's doable, albeit tedious, 
to implement compatibility on the client side for REST APIs if the protocol 
supports version requests (which Jolokia does). Hadoop RPC compatibility is not 
supported for stable branches of Hadoop. It's also hard to maintain 
compatibility of Hadoop RPC even for branch-2 and up, because the protocol is 
not mature enough (see discussions in HADOOP-9151). The major concerns for 
Hadoop RPC are performance and quality of service, and we have just started to 
address the latter (HADOOP-9194). It's highly probable that the next version of 
Hadoop RPC will be incompatible with the 2.0.x releases. Separation of concerns 
is a hallmark of good software design. While the performance/compatibility 
trade-off needs to be made for application protocols, it's unnecessary for 
management protocols.
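
Client-side compatibility via version requests could look roughly like the 
sketch below. It assumes a version response that carries a dotted protocol 
version under value.protocol; the sample response is hand-written rather than 
captured from a real agent, and the handler names are hypothetical.

```python
def protocol_version(version_response):
    """Parse the dotted protocol version into a tuple of ints."""
    protocol = version_response["value"]["protocol"]
    return tuple(int(part) for part in protocol.split("."))


def pick_handler(version_response, handlers):
    """Pick the newest handler whose major version the server supports.

    `handlers` maps a protocol major version to a client implementation.
    """
    major = protocol_version(version_response)[0]
    usable = [m for m in handlers if m <= major]
    if not usable:
        raise ValueError("no handler for protocol major version %d" % major)
    return handlers[max(usable)]


# Hand-written sample response (an assumption, not real agent output).
sample = {"status": 200, "value": {"agent": "1.0.6", "protocol": "6.1"}}

handler = pick_handler(sample, {4: "legacy-client", 6: "current-client"})
```

The management tool asks each managed service for its version once and then 
routes requests through the matching code path, which is the tedious but 
doable part mentioned above.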

Compatibility scores: JMX: 2; custom REST: 1; custom RPC: 0.


*Isolation* is a somewhat unique requirement for management protocols. Seasoned 
ops will not set up a datacenter without serial console (and/or IPMI) access to 
every machine, as we cannot trust managed software (even OSes) to be perfect.  
It's common to see high RPC latency (up to minutes) on busy Hadoop daemons 
(especially on JT and occasionally NN). It's very desirable for the managed 
services to respond to resource management requests quickly (<=1s latency) in 
spite of the user load. On daemons with an HTTP server already serving user 
load, we need to start a separate HTTP server for the custom REST API for good 
isolation, due to potential thread pool exhaustion from user load (shuffle, 
webhdfs etc.). The JMX MBeanServer is built into every Java process and can 
respond quickly to admin requests even if user load is otherwise high, as long 
as the node is not swapping. We can also have a special Hadoop RPC server 
dedicated to the custom RPC, which is better than sharing the RPC server with 
user load, though it still offers weaker isolation, because bugs in the RPC 
code can affect the dedicated server as well.

Isolation scores: JMX: 2 if on RMI or separate http server with Jolokia, 1 
otherwise; custom REST: 2 if separate server, 1 otherwise; custom RPC: 1 if 
separate server, 0 otherwise.
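
The thread pool exhaustion argument can be illustrated with a small, 
self-contained simulation. This is only a sketch of the general effect, not 
Hadoop code: the pool sizes, the 0.2s timeout and the function name are 
arbitrary assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def probe_isolation():
    """Return (shared_answered, dedicated_answered) under simulated load."""
    release = threading.Event()
    user_pool = ThreadPoolExecutor(max_workers=2)   # serves user load
    admin_pool = ThreadPoolExecutor(max_workers=1)  # dedicated to admin
    try:
        # Saturate the user pool with requests that block until released.
        for _ in range(2):
            user_pool.submit(release.wait)

        # The same trivial admin request, sent to each pool.
        shared = user_pool.submit(lambda: "ok")
        dedicated = admin_pool.submit(lambda: "ok")

        try:
            shared.result(timeout=0.2)
            shared_answered = True
        except FutureTimeout:
            # Queued behind the blocked user requests.
            shared_answered = False
        dedicated_answered = dedicated.result(timeout=1) == "ok"
        return shared_answered, dedicated_answered
    finally:
        release.set()  # unblock the simulated user requests
        user_pool.shutdown()
        admin_pool.shutdown()


shared_answered, dedicated_answered = probe_isolation()
```

With both user workers blocked, the admin request on the shared pool waits in 
the queue, while the one on the dedicated pool is answered immediately.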

*Security* is important for obvious reasons. JMX security setup (SSL with 
password or client cert) is well documented and trivial if the client is inside 
the firewall (otherwise RMI complicates the setup), which is the common case.  
The setup is uniform across all components supporting JMX. For most enterprise 
clusters, JMX security is a sunk cost. Furthermore, Jolokia allows the Hadoop 
servlet auth filter plugin to be used to support various authentication schemes, 
including SPNEGO/Kerberos. People can enable Jolokia and disable JMX remote for 
convenience (on Hadoop-only clusters) or enable both for even better isolation 
guarantees. The choice should be up to our users (sysadmins). JMX supports 
arbitrary role-based authentication and authorization. Jolokia also supports 
fine-grained authorization configuration to restrict access at MBean 
granularity. This functionality doesn't exist yet for either the custom REST or 
custom RPC approaches.

Security scores: JMX: 2; custom REST: 1; custom RPC: 1.

*Ease of implementation* is directly linked to the maintenance cost. I've 
already demonstrated the ease of implementing JMX services, which have much 
better tooling (including the ubiquitous jconsole for dev/debug) than custom 
REST or custom RPC, with the latter having the most verbose/tedious 
implementation because of code generation. A custom REST API or RPC solution 
would require major effort to even begin to match the functionality offered by 
JMX/Jolokia _now_, especially considering the client support and documentation. 
Less code is required to enable both JMX and a REST/JSON API via Jolokia, with 
much better documentation and support beyond the Hadoop community, than either 
the custom REST or custom RPC approaches.

Ease of implementation scores: JMX: 2; custom REST: 1; custom RPC: 0.
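
For reference, the per-criterion scores can be tallied as in the sketch below. 
The comment does not say how the scores should be combined; the unweighted sum 
and the best/worst split on isolation (separate server or not) are assumptions 
made here for illustration.

```python
# Scores as given for each criterion, except isolation.
SCORES = {
    "interoperability": {"JMX": 2, "custom REST": 1, "custom RPC": 0},
    "compatibility": {"JMX": 2, "custom REST": 1, "custom RPC": 0},
    "security": {"JMX": 2, "custom REST": 1, "custom RPC": 1},
    "ease of implementation": {"JMX": 2, "custom REST": 1, "custom RPC": 0},
}

# Isolation depends on whether a separate server (or RMI for JMX) is used.
ISOLATION_BEST = {"JMX": 2, "custom REST": 2, "custom RPC": 1}
ISOLATION_WORST = {"JMX": 1, "custom REST": 1, "custom RPC": 0}


def totals(isolation):
    """Unweighted sum of all criteria for a given isolation scenario."""
    result = dict(isolation)
    for per_protocol in SCORES.values():
        for name, score in per_protocol.items():
            result[name] += score
    return result


best = totals(ISOLATION_BEST)    # JMX 10, custom REST 6, custom RPC 2
worst = totals(ISOLATION_WORST)  # JMX 9, custom REST 5, custom RPC 1
```

Under either isolation scenario the ordering is the same: JMX, then custom 
REST, then custom RPC.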

Based on the above analysis, the best management protocol for Hadoop going 
forward is JMX/Jolokia, which can also vastly improve the manageability of 
Hadoop in stable branches without breaking compatibility.

The demo patch shows that the code to optionally enable such a great 
improvement in manageability is trivial.

                
> Adopt JMX for management protocols
> ----------------------------------
>
>                 Key: HADOOP-9160
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9160
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Luke Lu
>
> Currently we use Hadoop RPC (and some HTTP, notably fsck) for admin 
> protocols. We should consider adopting JMX for future admin protocols, as 
> it's the industry standard for Java server management with wide client support.
> Having an alternative/redundant RPC mechanism is very desirable for admin 
> protocols. I've seen multiple cases in the past where NN and/or JT RPC 
> were locked up solid due to various bugs and/or RPC thread pool exhaustion, 
> while HTTP and/or JMX worked just fine.
> Other desirable benefits include admin protocol backward compatibility and 
> introspectability, which is convenient for a centralized management system to 
> manage multiple Hadoop clusters of different versions. Another notable 
> benefit is that it's much easier to implement new admin commands in JMX 
> (especially with MXBean) than Hadoop RPC, especially in trunk (as well as 
> 0.23+ and 2.x).
> Since Hadoop RPC doesn't guarantee backward compatibility (probably not ever 
> for branch-1), there are few external tools depending on it. We can keep the 
> old protocols for as long as needed. New commands should be in JMX. The 
> transition can be gradual and backward-compatible.
