[ 
https://issues.apache.org/jira/browse/HADOOP-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855409#comment-15855409
 ] 

kartheek muthyala commented on HADOOP-13836:
--------------------------------------------

[~daryn], Thank you for the insightful feedback. :)

When SSL encrypts the data buffers, the length of the data on the wire differs 
from the length of the plaintext. For example, a 10-byte data packet can grow 
to 16 bytes after encryption, depending on the cipher used. A Hadoop RPC, on 
the other hand, is framed by a length prefix: the server reads the 4-byte data 
length up front to know how much data to expect on the channel. So in the 
current readAndProcess, when we replace the socket channel with 
SSLServerSocketChannel, channelRead may return only partial plaintext, which 
may not be enough to decode the data length or the data itself. For example, a 
call to SSLSocketChannel.read() might yield only 3 bytes of plaintext even 
though 8 bytes were read from the channel; those 3 bytes cannot decode the 
length, because today we need 4 bytes to determine the data length. This 
varying relationship between wire length and plaintext length is what made me 
modify readAndProcess to loop continuously until we have enough data. This 
could probably be simplified by another class that extends 
SSLServerSocketChannel and buffers decrypted data in a layer below 
readAndProcess, which might avoid the extra work in readAndProcess itself. I 
will file an improvement on top of this jira to verify whether that 
abstraction is possible. But even with that extra interface, we still have to 
loop for the data, because the same data-length issue applies.
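The looping described above can be sketched roughly as follows. This is an 
illustrative sketch, not Hadoop's actual readAndProcess code: the class and 
method names are hypothetical, and a plain ReadableByteChannel stands in for 
the decrypting SSL channel, which may deliver fewer plaintext bytes than were 
read off the wire. The reader accumulates bytes until the 4-byte length prefix 
is complete, then until the full payload is complete:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical sketch: accumulate bytes from a channel that may deliver
// partial reads (as a decrypting SSL channel does) until the 4-byte length
// prefix, and then the full payload, are available.
public class LengthPrefixedReader {
    private ByteBuffer buf = ByteBuffer.allocate(4); // start with the length prefix
    private boolean readingLength = true;

    /** Returns the complete payload once available, or null if more bytes are needed. */
    public byte[] readMessage(ReadableByteChannel ch) throws IOException {
        while (true) {
            int n = ch.read(buf);          // may return 0 or a partial count
            if (n < 0) {
                throw new IOException("channel closed mid-message");
            }
            if (buf.hasRemaining()) {
                return null;               // not enough data yet; caller retries later
            }
            buf.flip();
            if (readingLength) {
                int len = buf.getInt();    // safe: all 4 length bytes are present
                buf = ByteBuffer.allocate(len);
                readingLength = false;     // fall through and loop for the payload
            } else {
                byte[] payload = new byte[buf.remaining()];
                buf.get(payload);
                buf = ByteBuffer.allocate(4);  // reset for the next message
                readingLength = true;
                return payload;
            }
        }
    }
}
```

The same loop also illustrates the point about the buffering abstraction: even 
if a buffering channel class hides the partial decryption reads, something 
still has to loop until the length prefix and payload are complete.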


Multi-threaded clients generating requests faster than read will indefinitely 
tie up a reader
- I am not sure it gets tied up indefinitely, but the requests will get 
processed eventually.
Clients sending a slow trickle of bytes will tie up a reader until a request is 
fully read.
- This problem exists even today, when large data packets are sent and the 
server uses channelIO to process them.
Clients stalled mid-request will cause the reader to go into a spin loop.
- The connection timeout on a stalled client leads to closure of the channel, 
which breaks the spin loop.


[~wheat9], The performance study quoted in the link was done on a setup where 
clients interface with frontend machines that terminate HTTPS. They point out 
that "On our production frontend machines, SSL/TLS accounts for less than 1% 
of the CPU load, less than 10KB of memory per connection and less than 2% of 
network overhead.", so it comes to roughly 3% overall for them as well, 
including the network overhead of handshaking. I am not sure this is an 
apples-to-apples comparison with the setup on which I took my performance 
numbers. The CPU speed for encoding and decoding, the SSL protocol version 
used, the network bandwidth between the machines, the workload 
characteristics, etc. might all have differed between the two setups.

> Securing Hadoop RPC using SSL
> -----------------------------
>
>                 Key: HADOOP-13836
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13836
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>            Reporter: kartheek muthyala
>            Assignee: kartheek muthyala
>         Attachments: HADOOP-13836.patch, HADOOP-13836-v2.patch, 
> HADOOP-13836-v3.patch, HADOOP-13836-v4.patch, Secure IPC OSS Proposal-1.pdf, 
> SecureIPC Performance Analysis-OSS.pdf
>
>
> Today, RPC connections in Hadoop are encrypted using Simple Authentication & 
> Security Layer (SASL), with the Kerberos ticket based authentication or 
> Digest-md5 checksum based authentication protocols. This proposal is about 
> enhancing this cipher suite with SSL/TLS based encryption and authentication. 
> SSL/TLS is a proposed Internet Engineering Task Force (IETF) standard, that 
> provides data security and integrity across two different end points in a 
> network. This protocol has made its way to a number of applications such as 
> web browsing, email, internet faxing, messaging, VOIP etc. And supporting 
> this cipher suite at the core of Hadoop would give a good synergy with the 
> applications on top and also bolster industry adoption of Hadoop.
> The Server and Client code in Hadoop IPC should support the following modes 
> of communication
> 1.    Plain 
> 2.     SASL encryption with an underlying authentication
> 3.     SSL based encryption and authentication (x509 certificate)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
