[
https://issues.apache.org/jira/browse/HADOOP-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549249#comment-13549249
]
Luke Lu commented on HADOOP-9160:
---------------------------------
Thanks for the feedback, folks. I hope you all enjoyed the holidays!
Anyway, let's back up a bit and look at the pros and cons of the proposed
management protocols (JMX, custom REST, custom RPC) with a more systematic
approach. Anecdotal arguments about limited JMX deployment are not convincing:
one can just as well argue that for every Hadoop cluster that doesn't use JMX,
there are more than 10 WebSphere/JBoss/WebLogic/Tomcat clusters that use JMX
securely just fine.
Let me clarify the scope of the admin/management functions the proposed
protocol targets: dfsadmin, mradmin (and the yarn equivalent) and resource
management functions, e.g., commission/decommission/suspend/resume of data and
compute (TT/NM) nodes, slot/resource configuration for compute nodes, etc.
Anything in the user-facing interfaces (FileSystem and MR) is not included. The
users of the protocols are sysadmins and management daemons. IMO, the most
important aspects of a management protocol are: interoperability,
compatibility, isolation, security and ease of implementation.
*Interoperability* is critical for management protocols. It's unfriendly,
tedious and architecturally unsound to require common management tools to speak
application-specific protocols. Just imagine managing clusters with many
workloads where Hadoop is only one of them and each component requires a
different management protocol. This is one of the main reasons we have
standards for management protocols. JMX is _the_ standard for managing JVM
services. If JMX were based on HTTP instead of RMI, we probably wouldn't be
having this debate, as it's reasonable to argue that JMX over RMI is unfriendly
to non-JVM languages. Fortunately, because JMX _is_ the standard, people
outside the Hadoop community have already addressed that concern with REST APIs
for JMX. When I filed HADOOP-7144, the implementations were not mature enough
to adopt. Now a promising open source implementation has emerged:
[Jolokia|http://www.jolokia.org/], which evolved from Jmx4Perl and now has
good support for [Javascript|http://www.jolokia.org/client/javascript.html],
[Perl|http://www.jolokia.org/client/perl.html] and
[Python|https://github.com/cwood/pyjolokia] with (relatively) extensive
[documentation|http://www.jolokia.org/documentation.html]. The software is
Apache 2.0 licensed with artifacts published in Maven Central. It's trivial to
enable Jolokia as a servlet, so the hadoop-auth SPNEGO plugin can be used
(see the demo patch and the sketch below). JMX is not just a protocol; it's a
management infrastructure that enables clients to browse, discover and use new
management services without client modification or recompilation. Jolokia
supports list/search/read/write/execute of management services and more (bulk
requests). It even has a readline-enabled shell (j4psh) for interacting with
JMX with no client JVM startup overhead. Significant work would be required to
build a custom REST API and client libraries for different languages. Even if
some of the Hadoop daemons are rewritten in other languages, those daemons can
implement a subset of the Jolokia REST/JSON API to take advantage of the rich
client support in admin-friendly languages (python and especially perl).
Interoperability scores (0: poor, 1: fair, 2: good): JMX (with Jolokia): 2;
custom REST: 1; custom RPC: 0.
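For concreteness, a minimal sketch of what "enable Jolokia as a servlet" could
look like inside a daemon's HTTP server. This is illustrative only, not the
demo patch itself: the class and method names (Hadoop's HttpServer#addServlet,
Jolokia 1.x's org.jolokia.http.AgentServlet) are what I'd expect for this era
of the code and may differ by version.
{code:java}
import org.apache.hadoop.http.HttpServer;
import org.jolokia.http.AgentServlet;

public class JolokiaAdminEndpoint {
  /**
   * Register Jolokia's JSON-over-HTTP bridge to the platform MBeanServer
   * on an existing Hadoop HTTP server. Requests then look like
   * GET /jolokia/read/Hadoop:service=NameNode,name=NameNodeInfo.
   * Any auth filter already configured on the server (e.g. the hadoop-auth
   * SPNEGO filter) applies to this path as well.
   */
  public static void addTo(HttpServer httpServer) {
    httpServer.addServlet("jolokia", "/jolokia/*", AgentServlet.class);
  }
}
{code}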
*Compatibility* is important for management protocols. It's
inconvenient/tedious for management tools to manage services whose management
protocols are incompatible across different versions of the managed services.
JMX has been a stable industry standard for more than a decade. Jolokia has
been evolving for 3 years with relatively stable REST APIs. A new custom REST
API would need a similar time frame to evolve, though it's doable (albeit
tedious) to implement compatibility on the client side for REST APIs if the
protocol supports version requests (which Jolokia has). Hadoop RPC
compatibility is not supported for the stable branches of Hadoop. It's also
hard to maintain compatibility of Hadoop RPC even for branch-2 and up, because
the protocol is not mature enough (see discussions in HADOOP-9151). The major
concerns for Hadoop RPC are performance and quality of service, and we have
only just started to address the latter (HADOOP-9194). It's highly probable
that the next version of Hadoop RPC will be incompatible with the 2.0.x
releases. Separation of concerns is a hallmark of good software design: while
the performance/compatibility trade-off needs to be made for application
protocols, it's unnecessary for management protocols.
Compatibility scores: JMX: 2; custom REST: 1; custom RPC: 0.
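To illustrate the version-request point, here is a hedged sketch of a client
probing the Jolokia agent before issuing requests so it can branch on the
protocol version it finds. The /version endpoint and the shape of its JSON
come from the Jolokia protocol documentation; the host and port below are
placeholders.
{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class JolokiaVersionProbe {
  /** Fetch the agent's version document, e.g.
   *  {"value":{"agent":"1.x.y","protocol":"7.x"}, "status":200, ...} */
  public static String fetchVersionJson(String base) throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL(base + "/jolokia/version").openConnection();
    conn.setRequestMethod("GET");
    StringBuilder body = new StringBuilder();
    BufferedReader in =
        new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
    try {
      for (String line; (line = in.readLine()) != null; ) {
        body.append(line);
      }
    } finally {
      in.close();
    }
    return body.toString();
  }

  public static void main(String[] args) throws Exception {
    // Placeholder NameNode HTTP address; a management tool would read this
    // response and adjust its requests to the advertised protocol version.
    System.out.println(fetchVersionJson("http://nn.example.com:50070"));
  }
}
{code}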
*Isolation* is a somewhat unique requirement for management protocols. Seasoned
ops will not set up a datacenter without serial console (and/or IPMI) access to
every machine, because we cannot trust the managed software (even OSes) to be
perfect. It's common to see high RPC latency (up to minutes) on busy Hadoop
daemons (especially the JT and occasionally the NN). It's very desirable for
the managed services to respond to resource management requests quickly (<= 1s
latency) in spite of the user load. On daemons whose HTTP server already serves
user load, we would need to start a separate HTTP server for a custom REST API
to get good isolation, due to potential thread pool exhaustion under user load
(shuffle, webhdfs etc.). A JMX MBeanServer is built into every Java process and
can respond quickly to admin requests even when the user load is otherwise
high, as long as the node is not swapping. We could also have a dedicated
Hadoop RPC server for the custom RPC, which is better than sharing the RPC
server with user load, though it still offers lower isolation in case of bugs
in the RPC code.
Isolation scores: JMX: 2 if on RMI or a separate http server with Jolokia, 1
otherwise; custom REST: 2 if a separate server, 1 otherwise; custom RPC: 1 if a
separate server, 0 otherwise.
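A small sketch of the "built into every Java process" point: the platform
MBeanServer is always there, and reads against it are served by JMX (or
Jolokia) threads rather than the daemon's RPC handlers or user-facing HTTP
pool. The bean and attribute below are standard java.lang ones, not
Hadoop-specific.
{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class InProcessProbe {
  public static void main(String[] args) throws Exception {
    MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    // Even if the application's RPC queue were backed up for minutes,
    // this attribute read never touches that queue.
    Object threadCount =
        mbs.getAttribute(new ObjectName("java.lang:type=Threading"), "ThreadCount");
    System.out.println("Live threads: " + threadCount);
  }
}
{code}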
*Security* is important for obvious reasons. JMX security setup (SSL with
password or client cert) is well documented and trivial if the client is inside
the firewall (otherwise RMI complicates the setup), which is the common case.
The setup is uniform across all components supporting JMX. For most enterprise
clusters, JMX security is a sunk cost. Furthermore, Jolokia lets the Hadoop
servlet auth filter plugin support various authentication schemes, including
SPNEGO/Kerberos. People can enable Jolokia and disable JMX remote for
convenience (on Hadoop-only clusters), or enable both for even better isolation
guarantees. The choice should be our users' (the sysadmins'). JMX supports
arbitrary role-based authentication and authorization. Jolokia also supports
fine-grained authorization configuration to restrict access at MBean
granularity. None of this exists yet for either the custom REST or the custom
RPC approach.
Security scores: JMX: 2; custom REST: 1; custom RPC: 1.
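For reference, the client side of the standard password-protected JMX remote
setup mentioned above is a few lines of stock javax.management.remote code.
This is a generic sketch, not part of the demo patch; the host, port, role and
password are placeholders for whatever the sysadmin configured via the usual
jmxremote.password/jmxremote.access files.
{code:java}
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SecureJmxClient {
  public static void main(String[] args) throws Exception {
    JMXServiceURL url =
        new JMXServiceURL("service:jmx:rmi:///jndi/rmi://nn.example.com:8004/jmxrmi");
    Map<String, Object> env = new HashMap<String, Object>();
    // Role/password pair as configured on the server side.
    env.put(JMXConnector.CREDENTIALS, new String[] {"monitorRole", "secret"});
    JMXConnector connector = JMXConnectorFactory.connect(url, env);
    try {
      MBeanServerConnection conn = connector.getMBeanServerConnection();
      System.out.println("MBean count: " + conn.getMBeanCount());
    } finally {
      connector.close();
    }
  }
}
{code}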
*Ease of implementation* is directly linked to maintenance cost. I've already
demonstrated the ease of implementing JMX services, with much better tooling
(including the ubiquitous jconsole for dev/debug) than custom REST or custom
RPC, the latter having the most verbose/tedious implementation with code
generation. A custom REST API or RPC solution would require major effort to
even begin to match the functionality offered by JMX/Jolokia _now_, especially
considering the client support and documentation. Less code is required to
enable both JMX and a REST/JSON API via Jolokia, with much better documentation
and support from outside the Hadoop community, than either the custom REST or
custom RPC approach.
Ease of implementation scores: JMX: 2; custom REST: 1; custom RPC: 0
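As a rough illustration of how little code a new admin command needs once JMX
is in place, here is a hypothetical MXBean (not from the demo patch; the names
and the decommission logic are made up). The interface just has to follow the
<ClassName>MXBean naming convention for the platform to expose it, and clients
such as jconsole, j4psh or Jolokia discover the new attribute and operation
with no generated stubs and no client recompilation.
{code:java}
import java.lang.management.ManagementFactory;
import javax.management.ObjectName;

interface NodeAdminMXBean {
  /** Admin operation: take a worker node out of service. */
  void decommission(String hostname);

  /** Admin attribute: how many nodes are currently decommissioning. */
  int getDecommissioningCount();
}

public class NodeAdmin implements NodeAdminMXBean {
  private int decommissioning = 0;

  @Override
  public synchronized void decommission(String hostname) {
    // Real code would drive the exclude list / refreshNodes logic here.
    decommissioning++;
  }

  @Override
  public synchronized int getDecommissioningCount() {
    return decommissioning;
  }

  public static void main(String[] args) throws Exception {
    // Registration against the always-available platform server is a one-liner.
    ManagementFactory.getPlatformMBeanServer().registerMBean(
        new NodeAdmin(), new ObjectName("Hadoop:service=Demo,name=NodeAdmin"));
    Thread.sleep(Long.MAX_VALUE); // keep the JVM alive so a JMX client can attach
  }
}
{code}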
Based on the above analysis, the best management protocol for Hadoop going
forward is JMX/Jolokia, which can also vastly improve the manageability of
Hadoop in stable branches without breaking compatibility.
The demo patch shows that the code to optionally enable this improvement in
manageability is trivial.
> Adopt JMX for management protocols
> ----------------------------------
>
> Key: HADOOP-9160
> URL: https://issues.apache.org/jira/browse/HADOOP-9160
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Luke Lu
>
> Currently we use Hadoop RPC (and some HTTP, notably fsck) for admin
> protocols. We should consider adopting JMX for future admin protocols, as it's
> the industry standard for java server management with wide client support.
> Having an alternative/redundant RPC mechanism is very desirable for admin
> protocols. I've seen multiple cases in the past where NN and/or JT RPC
> were locked up solid due to various bugs and/or RPC thread pool exhaustion,
> while HTTP and/or JMX worked just fine.
> Other desirable benefits include admin protocol backward compatibility and
> introspectability, which is convenient for a centralized management system to
> manage multiple Hadoop clusters of different versions. Another notable
> benefit is that it's much easier to implement new admin commands in JMX
> (especially with MXBean) than Hadoop RPC, especially in trunk (as well as
> 0.23+ and 2.x).
> Since Hadoop RPC doesn't guarantee backward compatibility (probably not ever
> for branch-1), there are few external tools depending on it. We can keep the
> old protocols for as long as needed. New commands should be in JMX. The
> transition can be gradual and backward-compatible.