Remus Rusanu created HDFS-6699:
----------------------------------

             Summary: Secure Windows DFS read when client co-located on nodes with data (short-circuit reads)
                 Key: HDFS-6699
                 URL: https://issues.apache.org/jira/browse/HDFS-6699
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode, hdfs-client, performance, security
            Reporter: Remus Rusanu


HDFS-347 introduced secure short-circuit HDFS reads based on UNIX domain
sockets on Linux. A similar capability can be introduced in a secure Windows
environment using the
[DuplicateHandle](http://msdn.microsoft.com/en-us/library/windows/desktop/ms724251(v=vs.85).aspx)
Win32 API. When short-circuit is allowed, the datanode would open the block
file, duplicate the handle into the HDFS client process, and return the handle
value to the client. The HDFS client can then open a Java stream on this handle
and read the file. This mechanism is secure: the HDFS ACLs are validated by the
namenode, and the client process does not get direct access to the file, only
access in a controlled manner (e.g. read-only). The HDFS client process does
not need OS-level access privileges on the block file.

A complication arises from the requirement to duplicate the handle into the
HDFS client process. Ordinary processes (which is how we want the datanode to
run) do not have the required privilege (SeDebugPrivilege). However, the
elevated service helper introduced for the nodemanager's Windows Secure
Container Executor (YARN-2198) gives us an elevated executor that can do the
job of duplicating the handle. The datanode would communicate with this service
using the same mechanism as the nodemanager, i.e. LRPC.

With my proposed implementation the sequence of actions is as follows:

 - The HDFS client requests a Windows secure short-circuit read of a block in
the data transfer protocol. It passes the block, the token, and its own process
ID.
 - The datanode approves the short-circuit read. It opens the block file and
obtains the handle.
 - The datanode invokes the elevated privilege service to duplicate the handle
into the HDFS client process. It calls the service's LRPC interface over JNI
(LRPC being the de facto Windows standard for interoperating with a service),
passing the handle value, its own process ID and the HDFS client process ID.
 - The elevated service duplicates the handle from the datanode process into
the HDFS client process. It returns the duplicate handle value to the datanode
as the output value of the LRPC call.
 - The previous three steps are repeated for the CRC file.
 - The datanode responds to the short-circuit data transfer protocol request
with a message that contains the duplicate handle values (two of them: one for
the block file, one for the CRC file).
 - The HDFS client creates a Java stream that wraps each handle and reads the
block (and the CRC file) from these streams.
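A minimal Java sketch of the message flow above, for illustration only. The
names Request, Response, HandleDuplicator, FakeDuplicator and DataNode are
hypothetical; HandleDuplicator stands in for the elevated service's LRPC call,
the token check is a placeholder, and no real file is opened and no real handle
is duplicated. It assumes Java 16+ (records).

```java
import java.util.concurrent.atomic.AtomicLong;

public class ShortCircuitSketch {
    // Hypothetical request: block ID, block access token, and the client's PID.
    record Request(long blockId, String token, long clientPid) {}

    // Hypothetical response: two handle values valid only inside the client
    // process (one for the block file, one for the CRC file).
    record Response(long blockHandle, long crcHandle) {}

    // Hypothetical stand-in for the elevated service's LRPC interface, which
    // would call DuplicateHandle on the datanode's behalf.
    interface HandleDuplicator {
        long duplicate(long sourceHandle, long sourcePid, long targetPid);
    }

    // Fake duplicator for illustration: returns fresh, distinct handle values.
    static final class FakeDuplicator implements HandleDuplicator {
        private final AtomicLong next = new AtomicLong(0x1000);

        @Override
        public long duplicate(long sourceHandle, long sourcePid, long targetPid) {
            return next.getAndAdd(4);
        }
    }

    static final class DataNode {
        private final long ownPid;
        private final HandleDuplicator duplicator;
        private final AtomicLong nextLocalHandle = new AtomicLong(0x100);

        DataNode(long ownPid, HandleDuplicator duplicator) {
            this.ownPid = ownPid;
            this.duplicator = duplicator;
        }

        // Placeholder for real block-token verification.
        private static boolean tokenIsValid(String token) {
            return token != null && !token.isEmpty();
        }

        // Placeholder for opening the block or CRC file read-only; every call
        // yields a new local handle value, so no handle is reused across requests.
        private long openReadOnly(long blockId, boolean crc) {
            return nextLocalHandle.getAndAdd(4);
        }

        Response serve(Request req) {
            if (!tokenIsValid(req.token())) {
                throw new SecurityException("short-circuit read denied");
            }
            long blockLocal = openReadOnly(req.blockId(), false);
            long crcLocal = openReadOnly(req.blockId(), true);
            // Each handle is duplicated from the datanode into the client process.
            long blockRemote = duplicator.duplicate(blockLocal, ownPid, req.clientPid());
            long crcRemote = duplicator.duplicate(crcLocal, ownPid, req.clientPid());
            return new Response(blockRemote, crcRemote);
        }
    }

    public static void main(String[] args) {
        // The client sends its own PID (ProcessHandle is available since Java 9).
        long clientPid = ProcessHandle.current().pid();
        DataNode dn = new DataNode(4242L, new FakeDuplicator());
        Response resp = dn.serve(new Request(1073741825L, "token", clientPid));
        System.out.printf("block handle 0x%x, crc handle 0x%x%n",
                resp.blockHandle(), resp.crcHandle());
    }
}
```

Passing the duplicator in as an interface keeps the datanode-side logic
testable without the elevated service being present.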

The datanode must take care never to duplicate the same handle into different
clients (this includes the CRC handles): a handle also encapsulates the file
position, so clients sharing a handle would inadvertently move each other's file
pointer, with chaotic results.
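The shared-position hazard can be reproduced in plain Java as an analogy: two
FileInputStreams built on the same FileDescriptor share one OS-level file
position, much as two clients holding duplicates of one Windows handle would.
The class name, method names and temp file below are illustrative only; it
assumes Java 11+ (InputStream.readNBytes).

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SharedHandleDemo {
    // Reads exactly n bytes (or fewer at EOF) and decodes them as ASCII.
    static String readN(FileInputStream in, int n) throws IOException {
        return new String(in.readNBytes(n), StandardCharsets.US_ASCII);
    }

    public static String[] demo() throws IOException {
        Path block = Files.createTempFile("blk_", ".data");
        Files.write(block, "AAAABBBBCCCC".getBytes(StandardCharsets.US_ASCII));
        try (FileInputStream clientA = new FileInputStream(block.toFile())) {
            // clientB wraps the same OS-level descriptor as clientA, standing in
            // for one handle that was handed to two different clients.
            FileInputStream clientB = new FileInputStream(clientA.getFD());
            String a = readN(clientA, 4); // reads the first 4 bytes
            String b = readN(clientB, 4); // starts where clientA stopped
            return new String[] {a, b};
        } finally {
            Files.deleteIfExists(block);
        }
    }

    public static void main(String[] args) throws IOException {
        String[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints "AAAA BBBB"
    }
}
```

Client B expected to read from offset 0 but got "BBBB" instead, because client
A's read moved the shared position; giving each client its own freshly opened
handle avoids this.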

TBD: a mitigation for process ID reuse (the HDFS client could be terminated
immediately after the block request, and a new process could then reuse the
same ID). In theory an attacker could exploit this to obtain a handle to a
block by killing the HDFS client at the right moment and spawning new processes
until one gets the desired ID. I'm not sure this is a realistic threat, because
the attacker must already have the privilege to kill the HDFS client process,
and with that privilege he could obtain the handle by other means (e.g. by
debugging/inspecting the HDFS client process).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
