[ 
https://issues.apache.org/jira/browse/HDFS-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065208#comment-14065208
 ] 

Arpit Agarwal commented on HDFS-6699:
-------------------------------------

Hi Remus, an alternative approach you may have already considered is to use 
named file mapping objects via {{CreateFileMapping}}. By scoping the security 
descriptor on the mapping object, the DataNode can theoretically restrict 
access to just the client user.

The advantage is that NameNode and any other processes don't need to act as 
brokers.

> Secure Windows DFS read when client co-located on nodes with data 
> (short-circuit reads)
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-6699
>                 URL: https://issues.apache.org/jira/browse/HDFS-6699
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, hdfs-client, performance, security
>            Reporter: Remus Rusanu
>              Labels: windows
>
> HDFS-347 introduced secure short-circuit HDFS reads based on Unix domain 
> sockets. Similar capability can be introduced in a secure Windows environment 
> using 
> [DuplicateHandle](http://msdn.microsoft.com/en-us/library/windows/desktop/ms724251(v=vs.85).aspx)
>  Win32 API. When short-circuit is allowed, the datanode would open the block 
> file, duplicate the handle into the hdfs client process, and return the 
> handle value to that process. The hdfs client can then open a Java stream on 
> this handle and read the file. This is a secure mechanism: the HDFS ACLs are 
> validated by the namenode, and the client process gets access to the file 
> only in a controlled manner (e.g. read-only). The hdfs client process does 
> not need to have OS-level access privileges to the block file.
> A complication arises from the requirement to duplicate the handle into the 
> hdfs client process. Ordinary processes (which is how we want the datanode to 
> run) do not have the required privilege (SeDebugPrivilege). But with the 
> introduction of an elevated service helper for the nodemanager Windows Secure 
> Container Executor (YARN-2198), we have at our disposal an elevated executor 
> that can do the job of duplicating the handle. The datanode would communicate 
> with this process using the same mechanism as the nodemanager, i.e. LRPC.
> With my proposed implementation the sequence of actions is as follows:
>  - the hdfs client requests Windows secure short-circuit of a block in the 
> data transfer protocol. It passes the block, the token and its own process ID.
>  - the datanode approves the short-circuit. It opens the block file and 
> obtains the handle.
>  - the datanode invokes the elevated privilege service to duplicate the handle 
> into the hdfs client process. The datanode invokes the service LRPC interface 
> over JNI (LRPC being the Windows de-facto standard for interoperating with a 
> service). It passes the handle value, its own process ID and the hdfs client 
> process ID.
>  - the elevated service duplicates the handle from the datanode process into 
> the hdfs client process. It returns the duplicate handle value to the datanode 
> as the output value of the LRPC call.
>  - the same open-and-duplicate steps are repeated for the CRC file.
>  - the datanode responds to the short-circuit data transfer protocol request 
> with a message that contains the duplicate handle values (one for the block 
> file and one for the CRC file).
>  - the hdfs client creates a Java stream that wraps each handle and reads the 
> block from this stream (likewise for the CRC).
> The datanode needs to take care not to duplicate the same handle to different 
> clients (including the CRC handles), because a handle also abstracts the file 
> position and clients would inadvertently move each other's file pointer, with 
> chaotic results.
> TBD: a mitigation for process ID reuse (the hdfs client can be terminated 
> immediately after the block request and a new process could reuse the same 
> ID). In theory an attacker could use this as a mechanism to obtain a handle 
> to a block by killing the hdfs client at the right moment and spawning new 
> processes until it gets one with the desired ID. I'm not sure this is a 
> realistic threat, because the attacker must already have the privilege to kill 
> the hdfs client process, and with such a privilege he could obtain the handle 
> by other means (e.g. debugging or inspecting the hdfs client process). 
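
The sequence of actions in the description above can be sketched roughly as follows. This is only an in-memory simulation, not the proposed implementation: {{FakeOs}}, {{ElevatedService}}, {{DataNode}}, the {{"valid-token"}} check and the {{.meta}} file name are hypothetical stand-ins, and no Win32, LRPC or JNI call is made. It only shows who passes which value to whom.

```java
import java.util.HashMap;
import java.util.Map;

public class ShortCircuitSketch {

    /** Hypothetical stand-in for per-process handle tables: pid -> (handle value -> file name). */
    public static final class FakeOs {
        private final Map<Long, Map<Long, String>> tables = new HashMap<>();
        private long nextHandle = 0x100;

        /** Records a new handle for {@code file} in the table of process {@code pid}. */
        public long open(long pid, String file) {
            long handle = nextHandle;
            nextHandle += 4;
            tables.computeIfAbsent(pid, k -> new HashMap<>()).put(handle, file);
            return handle;
        }

        /** Returns the file a handle refers to in process {@code pid}, or null if unknown. */
        public String resolve(long pid, long handle) {
            Map<Long, String> table = tables.get(pid);
            return table == null ? null : table.get(handle);
        }
    }

    /** Hypothetical elevated helper; in the proposal it would be reached over LRPC. */
    public static final class ElevatedService {
        private final FakeOs os;

        public ElevatedService(FakeOs os) { this.os = os; }

        /** Simulates duplicating a handle from the source process into the target process. */
        public long duplicateHandle(long handle, long sourcePid, long targetPid) {
            String file = os.resolve(sourcePid, handle);
            if (file == null) {
                throw new IllegalArgumentException("handle not valid in source process");
            }
            return os.open(targetPid, file);
        }
    }

    /** Hypothetical datanode side of the short-circuit request. */
    public static final class DataNode {
        private final FakeOs os;
        private final ElevatedService service;
        private final long pid;

        public DataNode(FakeOs os, ElevatedService service, long pid) {
            this.os = os;
            this.service = service;
            this.pid = pid;
        }

        /** Returns {blockHandle, crcHandle}, both valid only in the client process. */
        public long[] requestShortCircuit(String block, String token, long clientPid) {
            if (!"valid-token".equals(token)) { // placeholder for real token validation
                throw new SecurityException("short-circuit denied");
            }
            // Fresh opens per request, so no two clients share a handle.
            long blockHandle = os.open(pid, block);
            long crcHandle = os.open(pid, block + ".meta");
            return new long[] {
                service.duplicateHandle(blockHandle, pid, clientPid),
                service.duplicateHandle(crcHandle, pid, clientPid)
            };
        }
    }

    public static void main(String[] args) {
        FakeOs os = new FakeOs();
        DataNode datanode = new DataNode(os, new ElevatedService(os), 100L);
        long clientPid = 200L;
        long[] handles = datanode.requestShortCircuit("blk_1", "valid-token", clientPid);
        System.out.println(os.resolve(clientPid, handles[0]));
        System.out.println(os.resolve(clientPid, handles[1]));
    }
}
```

In this sketch the client passes only its process ID and receives handle values that resolve in its own table; the datanode never hands out a path.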

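The caveat in the description about not handing the same handle to different clients can be illustrated with a small, portable Java example. It uses one {{RandomAccessFile}} as a stand-in for a shared handle (an assumption for illustration; it does not call {{DuplicateHandle}}), and the class and method names are made up. Two readers on the same open file share one file position, while two separate opens do not.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SharedPositionDemo {

    /** Reads n bytes from the file's current position as ASCII text. */
    public static String readChunk(RandomAccessFile f, int n) throws IOException {
        byte[] buf = new byte[n];
        f.readFully(buf);
        return new String(buf, StandardCharsets.US_ASCII);
    }

    /** Two "clients" read through the same open file and so share its position. */
    public static String[] sharedHandle(Path block) throws IOException {
        try (RandomAccessFile shared = new RandomAccessFile(block.toFile(), "r")) {
            String clientA = readChunk(shared, 4);
            String clientB = readChunk(shared, 4); // starts where clientA stopped
            return new String[] { clientA, clientB };
        }
    }

    /** Each "client" gets its own open file, hence its own position. */
    public static String[] separateHandles(Path block) throws IOException {
        try (RandomAccessFile a = new RandomAccessFile(block.toFile(), "r");
             RandomAccessFile b = new RandomAccessFile(block.toFile(), "r")) {
            return new String[] { readChunk(a, 4), readChunk(b, 4) };
        }
    }

    /** Writes a tiny stand-in "block file" containing AAAABBBB. */
    public static Path writeTempBlock() throws IOException {
        Path block = Files.createTempFile("blk_demo", ".dat");
        Files.write(block, "AAAABBBB".getBytes(StandardCharsets.US_ASCII));
        return block;
    }

    public static void main(String[] args) throws IOException {
        Path block = writeTempBlock();
        try {
            System.out.println(String.join(",", sharedHandle(block)));    // AAAA,BBBB
            System.out.println(String.join(",", separateHandles(block))); // AAAA,AAAA
        } finally {
            Files.delete(block);
        }
    }
}
```

With the shared file the second reader silently skips the first four bytes, which is the kind of interference the description warns about; opening the block (and CRC) file once per request avoids it.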


--
This message was sent by Atlassian JIRA
(v6.2#6252)
