[ 
https://issues.apache.org/jira/browse/HADOOP-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704320#action_12704320
 ] 

Todd Lipcon commented on HADOOP-4707:
-------------------------------------

Some more notes after using this API for a bit:

- The Block struct was missing the startOffset field from LocatedBlock. This 
made it rather useless for actually reading out of the middle of files. I'll 
submit a patch for this once I aggregate a few more changes.
- We should give some thought to authentication. Currently the interface uses 
identd to determine the unix username of the connecting socket, but this has 
several problems:
-- identd doesn't run on a lot of systems
-- This doesn't really provide any more security than the default hadoop model 
of "trust that you are who you say you are"
-- This is problematic if you *need* to be able to act on behalf of other 
users. In the example of a web UI, authentication happens at that layer: even 
though the Thrift connections would be coming from "www-data", all RPCs should 
assume the role of the authenticated end user

Regarding the authentication issue, I would propose a somewhat major change. We 
should add a struct called CallContext that looks something like this:

{code}
struct CallContext {
  1:string user
  2:list<string> groups
}
{code}
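For illustration, wiring the context into a service method could look something 
like the following (the service and method shown here are made up, not the 
actual generated interface):

{code}
service Namenode {
  /* The context goes in a new, previously unused field id, so old
     clients simply never send it and the server receives null. */
  FileStatus stat(1:string path, 2:CallContext ctx)
}
{code}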

We should then add this as a parameter to every RPC, at least in the NameNode 
service. Since Thrift uses id numbers for parameters, this will safely end up 
as null for any callers that are using the old interface. We should then add 
something to ThriftUtils along the lines of:

{code}
CallContext completeCallContext(CallContext ctx) {
  if (null == ctx) {
    // Caller is an old client: fall back to the current identd behavior.
    ctx = new CallContext();
    ... // get call context using identd, as it does currently
  }
  return ctx;
}
{code}

With this code, any existing clients preserve their identd behavior with no 
issues.
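As a rough, self-contained sketch of that null-defaulting behavior (the class 
here is a stand-in for the Thrift-generated struct, and the identd lookup is 
replaced by a placeholder value):

```java
import java.util.Collections;
import java.util.List;

public class CallContextSketch {

    // Stand-in for the Thrift-generated CallContext struct.
    static class CallContext {
        String user;
        List<String> groups;
    }

    // Mirrors the proposed ThriftUtils helper: new-style callers pass a
    // context through untouched; old-style callers (null) get one filled
    // in from the server-side fallback.
    static CallContext completeCallContext(CallContext ctx) {
        if (null == ctx) {
            ctx = new CallContext();
            ctx.user = "identd-user";            // placeholder for the identd result
            ctx.groups = Collections.emptyList();
        }
        return ctx;
    }

    public static void main(String[] args) {
        // New-style caller: the supplied context is returned as-is.
        CallContext fromWeb = new CallContext();
        fromWeb.user = "alice";
        assert completeCallContext(fromWeb) == fromWeb;

        // Old-style caller: null is completed via the fallback path.
        CallContext legacy = completeCallContext(null);
        assert legacy != null && "identd-user".equals(legacy.user);
    }
}
```

The key property is that the helper never touches a non-null context, so a 
front end that authenticates its own users can set the role explicitly while 
legacy clients keep the identd behavior unchanged.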

If people are concerned with the lack of security here, I would make the 
following points:
- This is the status quo for security in Hadoop - none of the IPC protocols 
have any kind of strong authentication
- The current Thrift interfaces don't check UGI at the datanode level anyway
- identd is easily hackable in the first place
- using a generic CallContext struct for all calls would easily allow us to 
extend the struct later with some kind of authorization token to provide strong 
authentication when such a system is devised elsewhere in Hadoop

> Improvements to Hadoop Thrift bindings
> --------------------------------------
>
>                 Key: HADOOP-4707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4707
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/thriftfs
>    Affects Versions: 0.20.0
>         Environment: Tested under Linux x86-64
>            Reporter: Carlos Valiente
>            Assignee: Carlos Valiente
>            Priority: Minor
>         Attachments: all.diff, BlockManager.java, build_xml.diff, 
> DefaultBlockManager.java, DFSBlockManager.java, gen.diff, 
> HADOOP-4707-55c046a.txt, hadoop-4707-6bc958.txt, HADOOP-4707.diff, 
> HADOOP-4707.patch, HADOOP-4707.patch, hadoopfs_thrift.diff, 
> hadoopthriftapi.jar, HadoopThriftServer.java, HadoopThriftServer_java.diff, 
> hdfs.py, hdfs_py_venky.diff, libthrift.jar, libthrift.jar, libthrift.jar
>
>
> I have made the following changes to hadoopfs.thrift:
> #  Added namespaces for Python, Perl and C++.
> # Renamed parameters and struct members to camelCase versions to keep them 
> consistent (in particular FileStatus{blockReplication,blockSize} vs 
> FileStatus.{block_replication,blocksize}).
> # Renamed ThriftHadoopFileSystem to FileSystem. From the perspective of a 
> Perl/Python/C++ user, 1) it is already clear that we're using Thrift, and 2) 
> the fact that we're dealing with Hadoop is already explicit in the namespace. 
>  The usage of generated code is more compact and (in my opinion) clearer:
> {quote}
>         *Perl*:
>         use HadoopFS;
>         my $client = HadoopFS::FileSystemClient->new(..);
>          _instead of:_
>         my $client = HadoopFS::ThriftHadoopFileSystemClient->new(..);
>         *Python*:
>         from hadoopfs import FileSystem
>         client = FileSystem.Client(..)
>         _instead of_
>         from hadoopfs import ThriftHadoopFileSystem
>         client = ThriftHadoopFileSystem.Client(..)
>         (See also the attached diff [^scripts_hdfs_py.diff] for the
>          new version of 'scripts/hdfs.py').
>         *C++*:
>         hadoopfs::FileSystemClient client(..);
>          _instead of_:
>         hadoopfs::ThriftHadoopFileSystemClient client(..);
> {quote}
> # Renamed ThriftHandle to FileHandle: As in 3, it is clear that we're dealing 
> with a Thrift object, and its purpose (to act as a handle for file 
> operations) is clearer.
> # Renamed ThriftIOException to IOException, to keep it simpler, and 
> consistent with MalformedInputException.
> # Added explicit version tags to fields of ThriftHandle/FileHandle, Pathname, 
> MalformedInputException and ThriftIOException/IOException, to improve 
> compatibility of existing clients with future versions of the interface which 
> might add new fields to those objects (like stack traces for the exception 
> types, for instance).
> Those changes are reflected in the attachment [^hadoopfs_thrift.diff].
> Changes in generated Java, Python, Perl and C++ code are also attached in 
> [^gen.diff]. They were generated by a Thrift checkout from trunk
> ([http://svn.apache.org/repos/asf/incubator/thrift/trunk/]) as of revision
> 719697, plus the following Perl-related patches:
> * [https://issues.apache.org/jira/browse/THRIFT-190]
> * [https://issues.apache.org/jira/browse/THRIFT-193]
> * [https://issues.apache.org/jira/browse/THRIFT-199]
> The Thrift jar file [^libthrift.jar] built from that Thrift checkout is also 
> attached, since it's needed to run the Java Thrift server.
> I have also added a new target to src/contrib/thriftfs/build.xml to build the 
> Java bindings needed for org.apache.hadoop.thriftfs.HadoopThriftServer.java 
> (see attachment [^build_xml.diff]), and modified HadoopThriftServer.java to 
> make use of the new bindings (see attachment [^HadoopThriftServer_java.diff]).
> The jar file [^lib/hadoopthriftapi.jar] is also included, although it can be 
> regenerated from the stuff under 'gen-java' and the new 'compile-gen' Ant 
> target.
> The whole changeset is also included as [^all.diff].

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
