[ https://issues.apache.org/jira/browse/HADOOP-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836483#comment-15836483 ]

Daryn Sharp commented on HADOOP-13836:
--------------------------------------

SSL is notoriously hard to do with non-blocking I/O.  The added
org.baswerc.niossl library appears dead and unsupported: v0.2 was posted 1.5
years ago, and there are a handful of open bugs about memory leaks, unreliable
selectors, etc., with no responses.  Not encouraging.

Regarding the Connection subclasses: the readAndProcess method is already a
bit dicey.  It embodies both the authentication handshake and general RPC
message reading and queuing.  I'm hesitant about two different impls because
they're likely to lead to unintended divergence, as illustrated below, and
also increase the chance of security holes.  Ideally the SSL channel impl
should be transparent and not require changes to readAndProcess.

Verifying correctness of partial reads is a bit difficult.  The position
within the byte[] appBufBytes, which is extracted from ByteBuffer appBuf, is
tracked via appBuf's position and repeatedly updated.  Both the extraction and
the explicit position updates seem unnecessary.  It'd be easier to follow if
applicationBufferRead() took a source and a dest ByteBuffer, copied up to
dest's remaining bytes from the source, and updated the positions.  That
said…
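Roughly, the suggested helper shape could look like the minimal sketch below.
It assumes a (src, dst) ByteBuffer signature; the BufferCopy class is
hypothetical and this is not the patch's actual code.  ByteBuffer tracks both
positions itself, so no byte[] extraction or manual offset bookkeeping is
needed:

```java
import java.nio.ByteBuffer;

// Hypothetical holder class; only the applicationBufferRead name comes from the patch.
final class BufferCopy {
  // Copies min(src.remaining(), dst.remaining()) bytes from src into dst.
  // Both positions advance by the number of bytes copied; unread bytes stay
  // in src for the next call.
  static int applicationBufferRead(ByteBuffer src, ByteBuffer dst) {
    int n = Math.min(src.remaining(), dst.remaining());
    if (n == 0) {
      return 0;
    }
    // Narrow src's limit so put() moves exactly n bytes and can't overflow dst.
    int savedLimit = src.limit();
    src.limit(src.position() + n);
    dst.put(src);           // advances src and dst positions by n
    src.limit(savedLimit);  // restore; src.position() now marks the unread data
    return n;
  }
}
```

Because the copy is bounded by dst.remaining(), a partial read needs no
special casing: whatever doesn't fit stays in src.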

The SSL readAndProcess behavior isn't equivalent to the current NIO behavior:
read only what's available, and process a request once it's fully read.  If
the SSL version encounters a partial payload, it loops until at least the full
payload is read into appBuf.  If appBuf isn't fully consumed, it loops again.
This causes problems that NIO avoids:
* Multi-threaded clients generating requests faster than the reader can
consume them will tie up a reader indefinitely.
* Clients sending a slow trickle of bytes will tie up a reader until a request
is fully read.
* Clients stalled mid-request will send the reader into a spin loop.

When the reader loops on a connection, the reader's other established
connections are starved.  The reader also isn't consuming new connections
queued by the listener.  Eventually the listener will block and stop accepting.
The result is the IPC layer going into a series of seizures that severely
degrade performance.  This may be partly responsible for the observed
performance degradation.
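For comparison, here is a minimal sketch of the "read only what's available"
pattern.  It assumes a 4-byte length-prefixed frame; FramedConnection and its
fields are hypothetical names, not Hadoop's Server.Connection.  Each readable
event does at most one read into the header and one into the payload, returns
on a partial frame, and hands off at most one complete request.  None of the
three cases above can pin the reader:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-connection state; names are illustrative, not Hadoop's.
final class FramedConnection {
  private final ByteBuffer lengthBuf = ByteBuffer.allocate(4); // 4-byte length prefix
  private ByteBuffer data;                                      // payload being assembled
  final List<byte[]> processed = new ArrayList<>();             // stands in for the call queue

  // Invoked once per "readable" selector event.  Reads only what the
  // non-blocking channel has right now and never loops waiting for more.
  int readAndProcess(ReadableByteChannel ch) throws IOException {
    if (data == null) {
      int n = ch.read(lengthBuf);
      if (n < 0 || lengthBuf.hasRemaining()) {
        return n; // EOF, or header still incomplete: yield to the selector
      }
      lengthBuf.flip();
      data = ByteBuffer.allocate(lengthBuf.getInt());
    }
    int n = ch.read(data);
    if (!data.hasRemaining()) {
      processed.add(data.array()); // full request: hand it off, then yield
      data = null;
      lengthBuf.clear();
    }
    return n; // partial payload: the rest arrives on a later event
  }
}
```

Because this readAndProcess depends only on ReadableByteChannel, a transparent
SSL channel wrapper could be passed in without a second readAndProcess impl.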

Another issue is the all-or-nothing requirement for enabling SSL.  I'd
potentially be interested in using SSL if I could configure which hosts
require it, i.e., intra-colo.  The SASL QOP impl allows the server to
selectively control whether clients are forced to encrypt.  The same would be
nice for SSL.
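A minimal sketch of per-host SSL enforcement, analogous to what the SASL QOP
impl allows.  It assumes the server knows the client address at accept time
and that the policy is a list of IPv4 CIDR ranges.  SslPolicy and the CIDR
list format are hypothetical, not an existing Hadoop class or setting:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical policy: clients inside any listed IPv4 CIDR range must use
// SSL; others may use plain or SASL.  Not an existing Hadoop class.
final class SslPolicy {
  private final List<int[]> ranges = new ArrayList<>(); // each entry: {network, mask}

  SslPolicy(String... cidrs) {
    for (String cidr : cidrs) {
      String[] parts = cidr.split("/");
      int prefix = Integer.parseInt(parts[1]);
      // Java masks int shift counts to 5 bits, so /0 needs a special case.
      int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
      try {
        // An IPv4 literal is only format-checked; no DNS lookup happens.
        ranges.add(new int[] {toInt(InetAddress.getByName(parts[0])) & mask, mask});
      } catch (UnknownHostException e) {
        throw new IllegalArgumentException("bad CIDR: " + cidr, e);
      }
    }
  }

  boolean requiresSsl(InetAddress client) {
    if (client.getAddress().length != 4) {
      return true; // IPv6 is out of scope for this sketch: fail closed
    }
    int addr = toInt(client);
    for (int[] r : ranges) {
      if ((addr & r[1]) == r[0]) {
        return true;
      }
    }
    return false;
  }

  private static int toInt(InetAddress a) {
    byte[] b = a.getAddress();
    return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16)
        | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
  }
}
```

The server could consult such a policy when a connection is accepted, and
reject a plaintext client coming from a range that requires SSL.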

Lastly, the 11-14% performance hit is unacceptable for production use.  I
would have expected terasort's heavy CPU usage to eclipse the SSL RPC load.  It
didn't, which suggests that less CPU-intensive jobs will take an even more
pronounced hit.

> Securing Hadoop RPC using SSL
> -----------------------------
>
>                 Key: HADOOP-13836
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13836
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>            Reporter: kartheek muthyala
>            Assignee: kartheek muthyala
>         Attachments: HADOOP-13836.patch, HADOOP-13836-v2.patch, 
> HADOOP-13836-v3.patch, HADOOP-13836-v4.patch, Secure IPC OSS Proposal-1.pdf, 
> SecureIPC Performance Analysis-OSS.pdf
>
>
> Today, RPC connections in Hadoop are encrypted using the Simple Authentication
> and Security Layer (SASL), with the Kerberos ticket-based or DIGEST-MD5
> checksum-based authentication protocols. This proposal is about enhancing
> this cipher suite with SSL/TLS-based encryption and authentication. SSL/TLS
> is a proposed Internet Engineering Task Force (IETF) standard that provides
> data security and integrity between two endpoints in a network. The protocol
> has made its way into a number of applications such as web browsing, email,
> internet faxing, messaging, VoIP, etc. Supporting this cipher suite at the
> core of Hadoop would give good synergy with the applications on top and also
> bolster industry adoption of Hadoop.
> The Server and Client code in Hadoop IPC should support the following modes
> of communication:
> 1. Plain
> 2. SASL encryption with an underlying authentication
> 3. SSL-based encryption and authentication (X.509 certificate)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
