[jira] [Commented] (HBASE-2742) Provide strong authentication with a secure RPC engine

Gary Helmling (Commented) (JIRA) Thu, 17 Nov 2011 17:17:18 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152558#comment-13152558
 ]


Gary Helmling commented on HBASE-2742:
--------------------------------------

Thanks for the review!  Please add additional comments to review board.  In a 
patch this big, it's much easier for me to keep track of comments and replies 
there.
----
{quote}
Looking at AuthenticationTokenSecretManager, it becomes leader upon entering 
run().
Periodically it calls rollCurrentKey() which checks whether it is still leader. 
I think rollCurrentKey() should return a boolean value indicating leadership 
status so that the while loop can re-obtain leadership if leadership is lost.
{quote}

Leader election for AuthenticationTokenSecretManager is equivalent to active 
master management.  In run(), all instances block on 
zkLeader.waitToBecomeLeader().  Only the leader returns.  The leader remains 
active until stopped or the zk session is lost (which should cause an abort), 
so I don't see where we would need to recheck in this case?

----
{quote}
In my opinion, failure to remove expired key or to add new key should be 
treated with more caution.
addKeyToZK() and updateKeyInZK() throws RuntimeException, alleviating this 
concern a little. But zkWatcher.removeKeyFromZK() swallows KeeperException, 
making things complicated. We may need to call ZKUtil.deleteNode() directly.
{quote}

Agree, removeKeyFromZK() should be aborting on KeeperException.  That was 
missing in other places as well.  Cleaning up to call watch.abort() on 
KeeperException, as well as replacing the RuntimeExceptions with 
watcher.abort().

I am very hesitant, though, to call ZKUtil.deleteNode() in the case of 
IOExceptions (which only occur from the Writable serialization).  In the case 
of say an upgrade with changed serialization format or a znode that somehow got 
corrupted data, this seems appropriate.  But I'm more worried about what would 
occur in the case of a massive misconfiguration, where 
{{zookeeper.znode.tokenauth.parent}} is pointing to the wrong path, and what 
the consequences might be from deleting every znode it can't deserialize.  I'd 
rather abort the server in this case and let the end user deal with it.

Some ZKWatcher methods shouldn't even have been throwing IOExceptions.  Those 
have been fixed.  The remaining cases (nodeDataChanged(), refreshNodes(), 
addKeyToZK(), updatedKeyInZK()) have been changed to abort on IOException since 
it would likely mean a programming error.

----
{quote}
In src/main/resources/hbase-default.xml, I think we should use dot instead of 
dash for parameter key names:
  hbase.auth.key.update-interval should be hbase.auth.key.update.interval
  hbase.auth.token.max-lifetime should be hbase.auth.token.max.lifetime
{quote}

These keys were named to match the keys used for HDFS delegation tokens:
dfs.namenode.delegation.key.update-interval
dfs.namenode.delegation.token.max-lifetime

But, since we are consistent about not using '-' in HBase config I will replace.

----
{quote}
In ZKLeaderManager.stepDownAsLeader():

ZKUtil.deleteNodeFailSilent(watcher, leaderZNode);

I think we should call ZKUtil.deleteNode() instead and handle KeeperException 
gracefully because waitToBecomeLeader() would only delete zk node which has the 
other candidate's Id.
{quote}

I don't really see what ZKUtil.deleteNode() would improve in this context?  If 
we instead got the KeeperException.NoNodeException that it exposes, what would 
we want to do differently?  Since the point of stepDownAsLeader() is to step 
down, my opinion would be to ignore the NoNodeException if the znode has 
somehow been removed.  The logic in waitToBecomeLeader() is completely separate 
and clones the logic in ActiveMasterManager.blockUntilBecomingActiveMaster().

----
{quote}
In HBaseServer.closeCurrentConnection(), the following is added:

c = null;

Since c is the attachment to SelectionKey, should we call key.attach(null) ?
{quote}

Good point.  The {{c = null}} assignment is useless.  Changing to 
{{key.attach(null)}}.

----
{quote}
HBaseRPC.call() is marked deprecated. Probably we should mention 
SecureRpcEngine so that developers know what to look for.
{quote}

Added @depecated tag pointing to RpcEngine.call().
                
> Provide strong authentication with a secure RPC engine
> ------------------------------------------------------
>
>                 Key: HBASE-2742
>                 URL: https://issues.apache.org/jira/browse/HBASE-2742
>             Project: HBase
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Gary Helmling
>            Priority: Critical
>             Fix For: 0.92.0
>
>
> The HBase RPC code (org.apache.hadoop.hbase.ipc.*) was originally forked off 
> of Hadoop RPC classes, with some performance tweaks added.  Those 
> optimizations have come at a cost in keeping up with Hadoop RPC changes 
> however, both bug fixes and improvements/new features.  
> In particular, this impacts how we implement security features in HBase (see 
> HBASE-1697 and HBASE-2016).  The secure Hadoop implementation (HADOOP-4487) 
> relies heavily on RPC changes to support client authentication via kerberos 
> and securing and mutual authentication of client/server connections via SASL. 
>  Making use of the built-in Hadoop RPC classes will gain us these pieces for 
> free in a secure HBase.
> So, I'm proposing that we drop the HBase forked version of RPC and convert to 
> direct use of Hadoop RPC, while working to contribute important fixes back 
> upstream to Hadoop core.  Based on a review of the HBase RPC changes, the key 
> divergences seem to be:
> HBaseClient:
>  - added use of TCP keepalive (HBASE-1754)
>  - made connection retries and sleep configurable (HBASE-1815)
>  - prevent NPE if socket == null due to creation failure (HBASE-2443)
> HBaseRPC:
>  - mapping of method names <-> codes (removed in HBASE-2219)
> HBaseServer:
>  - use of TCP keep alives (HBASE-1754)
>  - OOME in server does not trigger abort (HBASE-1198)
> HbaseObjectWritable:
>  - allows List<> serialization
>  - includes it's own class <-> code mapping (HBASE-328)
> Proposed process is:
> 1. open issues with patches on Hadoop core for important fixes/adjustments 
> from HBase RPC (HBASE-1198, HBASE-1815, HBASE-1754, HBASE-2443, plus a 
> pluggable ObjectWritable implementation in RPC.Invocation to allow use of 
> HbaseObjectWritable).
> 2. ship a Hadoop version with RPC patches applied -- ideally we should avoid 
> another copy-n-paste code fork, subject to ability to isolate changes from 
> impacting Hadoop internal RPC wire formats
> 3. if all Hadoop core patches are applied we can drop back to a plain vanilla 
> Hadoop version
> I realize there are many different opinions on how to proceed with HBase RPC, 
> so I'm hoping this issue will kick off a discussion on what the best approach 
> might be.  My own motivation is maximizing re-use of the authentication and 
> connection security work that's already gone into Hadoop core.  I'll put 
> together a set of patches around #1 and #2, but obviously we need some 
> consensus around this to move forward.  If I'm missing other differences 
> between HBase and Hadoop RPC, please list as well.  Discuss!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-2742) Provide strong authentication with a secure RPC engine

Reply via email to