[ https://issues.apache.org/jira/browse/HADOOP-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672753#action_12672753 ]

dhruba borthakur commented on HADOOP-4707:
------------------------------------------

Hi Carlos, this looks awesome! Lots of cool stuff. I have not looked at the 
details yet, but here are some of my initial comments:

1. Making the Namenode expose a Thrift interface directly is a great idea. 
It removes the need for a separate daemon to map Thrift methods to 
ClientProtocol methods.

2. Is there a way to "layer" the Thrift servers (DatanodeThriftServer and 
NameNodeThriftServer) around the org.apache.hadoop.hdfs package, instead of 
making them part of org.apache.hadoop.hdfs? Could they live in 
org.apache.hadoop.thriftfs and reside in the src/contrib/thriftfs directory? 
The Thrift server would then be a "layer" that encapsulates the Hadoop 
Namenode/Datanode, and could be built as its own library called 
libhdfsthrift.jar (or something like that).
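
To make the layering concrete, here is a minimal sketch. The class name 
NamenodeThriftLayer and its method are hypothetical; the only real APIs it 
uses are the public FileSystem client calls:

{code:java}
// Hypothetical layered Thrift wrapper. It talks to the Namenode only
// through the public client-facing API, so nothing inside
// org.apache.hadoop.hdfs needs to change.
package org.apache.hadoop.thriftfs;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamenodeThriftLayer {

  private final FileSystem fs;

  public NamenodeThriftLayer(Configuration conf) throws IOException {
    // Go through the regular HDFS client instead of Namenode internals,
    // so the Thrift code stays a pure wrapper.
    this.fs = FileSystem.get(conf);
  }

  // Example of a Thrift-facing method delegating to the wrapped filesystem.
  public long getFileLength(String path) throws IOException {
    return fs.getFileStatus(new Path(path)).getLen();
  }
}
{code}

Since such a wrapper touches only the public client API, it could ship as 
libhdfsthrift.jar without modifying anything inside org.apache.hadoop.hdfs.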

3. There could be new source files in the base hdfs package that allow 
plugging in multiple protocol stacks (the first one being Thrift). This code 
could use reflection (and configuration settings) to figure out whether 
libhdfsthrift.jar is in the classpath, and if so, use methods from that jar 
to initialize the Thrift server. I propose this because it does not force 
every Hadoop install to use Thrift, it keeps the base Namenode/Datanode code 
clean and elegant, and it allows plugging in other protocol stacks to expose 
the Namenode/Datanode functionality.
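
A rough sketch of what that reflection-based hookup could look like. The 
configuration key, the ProtocolPlugin interface and the loader class are all 
invented for illustration:

{code:java}
// Hypothetical plugin loader for the base hdfs package. The key
// "dfs.protocol.plugins" and the ProtocolPlugin interface do not exist
// today; only Configuration and Class.forName are real APIs.
package org.apache.hadoop.hdfs;

import org.apache.hadoop.conf.Configuration;

public class ProtocolPluginLoader {

  /** Hypothetical contract that each pluggable protocol stack implements. */
  public interface ProtocolPlugin {
    void start(Object namenode) throws Exception;
  }

  public static void startPlugins(Configuration conf, Object namenode) {
    // e.g. dfs.protocol.plugins=org.apache.hadoop.thriftfs.ThriftProtocolPlugin
    String[] classNames = conf.getStrings("dfs.protocol.plugins");
    if (classNames == null) {
      return;  // No extra protocol stacks configured; Thrift stays optional.
    }
    for (String className : classNames) {
      try {
        // Succeeds only if the plugin jar (e.g. libhdfsthrift.jar) is
        // actually on the classpath.
        ProtocolPlugin plugin =
            (ProtocolPlugin) Class.forName(className).newInstance();
        plugin.start(namenode);
      } catch (Exception e) {
        // Missing jar or bad class name: report it and keep the
        // Namenode/Datanode running without that protocol stack.
        System.err.println("Cannot start protocol plugin " + className
            + ": " + e);
      }
    }
  }
}
{code}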

4. It is possible that the Thrift implementation might need to use some 
(currently) package-private methods in the Datanode/Namenode, but we can work 
on making them public if need be.

5. Allowing the Thrift interface to read file contents is easy. Writing to 
blocks is much more difficult, mainly because DFSClient.java is a 
heavy-weight piece of code: it ensures correct recovery from write-pipeline 
failures, supports "appending" to existing files, keeps all blocks of a file 
the same size, etc. Do you have an application that will need to write to 
HDFS files using this Thrift interface?
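
To illustrate why the read path is the easy half, here is a hedged sketch of 
what a Thrift read handler could delegate to, using only the public 
FileSystem API (the class and method are hypothetical):

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper a Thrift handler could call for reads. Reads are
// stateless positioned reads, so none of DFSClient's write-pipeline
// machinery is involved.
public class ThriftReadSupport {

  private final FileSystem fs;

  public ThriftReadSupport(FileSystem fs) {
    this.fs = fs;
  }

  /** Read up to 'length' bytes of 'path' starting at 'offset'. */
  public byte[] read(String path, long offset, int length) throws IOException {
    FSDataInputStream in = fs.open(new Path(path));
    try {
      byte[] buf = new byte[length];
      int n = in.read(offset, buf, 0, length);  // positioned read
      if (n < 0) {
        return new byte[0];  // At or past EOF.
      }
      byte[] out = new byte[n];
      System.arraycopy(buf, 0, out, 0, n);
      return out;
    } finally {
      in.close();
    }
  }
}
{code}

Writes, by contrast, would have to go through the full DFSClient write 
pipeline to get the recovery and append semantics described above.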

6. Your unit tests are nice. It is imperative for us to detect incompatible 
changes to base HDFS APIs earlier rather than later.
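
As an illustration of the kind of early-warning test meant here, a sketch 
against MiniDFSCluster (the test class and assertions are hypothetical, not 
taken from the attached patch):

{code:java}
import junit.framework.TestCase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

// Sketch of a compatibility test: exercise the HDFS accessors the Thrift
// mapping depends on, so an incompatible API change fails the build early.
public class TestThriftCompatibility extends TestCase {

  public void testStatusFieldsStillExposed() throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster = new MiniDFSCluster(conf, 1, true, null);
    try {
      FileSystem fs = cluster.getFileSystem();
      Path p = new Path("/thrift-compat-test");
      fs.create(p).close();
      // The Thrift FileStatus struct maps these fields directly.
      assertTrue(fs.getFileStatus(p).getBlockSize() >= 0);
      assertTrue(fs.getFileStatus(p).getReplication() >= 0);
    } finally {
      cluster.shutdown();
    }
  }
}
{code}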




> Improvements to Hadoop Thrift bindings
> --------------------------------------
>
>                 Key: HADOOP-4707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4707
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/thriftfs
>    Affects Versions: 0.20.0
>         Environment: Tested under Linux x86-64
>            Reporter: Carlos Valiente
>            Priority: Minor
>         Attachments: all.diff, BlockManager.java, build_xml.diff, 
> DefaultBlockManager.java, DFSBlockManager.java, gen.diff, HADOOP-4707.diff, 
> hadoopfs_thrift.diff, hadoopthriftapi.jar, HadoopThriftServer.java, 
> HadoopThriftServer_java.diff, hdfs.py, hdfs_py_venky.diff, libthrift.jar
>
>
> I have made the following changes to hadoopfs.thrift:
> #  Added namespaces for Python, Perl and C++.
> # Renamed parameters and struct members to camelCase versions to keep them 
> consistent (in particular FileStatus.{blockReplication,blockSize} vs 
> FileStatus.{block_replication,blocksize}).
> # Renamed ThriftHadoopFileSystem to FileSystem. From the perspective of a 
> Perl/Python/C++ user, 1) it is already clear that we're using Thrift, and 2) 
> the fact that we're dealing with Hadoop is already explicit in the namespace. 
>  The usage of generated code is more compact and (in my opinion) clearer:
> {quote}
>         *Perl*:
>         use HadoopFS;
>         my $client = HadoopFS::FileSystemClient->new(..);
>          _instead of:_
>         my $client = HadoopFS::ThriftHadoopFileSystemClient->new(..);
>         *Python*:
>         from hadoopfs import FileSystem
>         client = FileSystem.Client(..)
>         _instead of_
>         from hadoopfs import ThriftHadoopFileSystem
>         client = ThriftHadoopFileSystem.Client(..)
>         (See also the attached diff [^scripts_hdfs_py.diff] for the
>          new version of 'scripts/hdfs.py').
>         *C++*:
>         hadoopfs::FileSystemClient client(..);
>          _instead of_:
>         hadoopfs::ThriftHadoopFileSystemClient client(..);
> {quote}
> # Renamed ThriftHandle to FileHandle: As in 3, it is clear that we're dealing 
> with a Thrift object, and its purpose (to act as a handle for file 
> operations) is clearer.
> # Renamed ThriftIOException to IOException, to keep it simpler, and 
> consistent with MalformedInputException.
> # Added explicit version tags to fields of ThriftHandle/FileHandle, Pathname, 
> MalformedInputException and ThriftIOException/IOException, to improve 
> compatibility of existing clients with future versions of the interface which 
> might add new fields to those objects (like stack traces for the exception 
> types, for instance).
> Those changes are reflected in the attachment [^hadoopfs_thrift.diff].
> Changes in generated Java, Python, Perl and C++ code are also attached in 
> [^gen.diff]. They were generated by a Thrift checkout from trunk
> ([http://svn.apache.org/repos/asf/incubator/thrift/trunk/]) as of revision
> 719697, plus the following Perl-related patches:
> * [https://issues.apache.org/jira/browse/THRIFT-190]
> * [https://issues.apache.org/jira/browse/THRIFT-193]
> * [https://issues.apache.org/jira/browse/THRIFT-199]
> The Thrift jar file [^libthrift.jar] built from that Thrift checkout is also 
> attached, since it's needed to run the Java Thrift server.
> I have also added a new target to src/contrib/thriftfs/build.xml to build the 
> Java bindings needed for org.apache.hadoop.thriftfs.HadoopThriftServer.java 
> (see attachment [^build_xml.diff]) and modified HadoopThriftServer.java to 
> make use of the new bindings (see attachment [^HadoopThriftServer_java.diff]).
> The jar file [^lib/hadoopthriftapi.jar] is also included, although it can be 
> regenerated from the stuff under 'gen-java' and the new 'compile-gen' Ant 
> target.
> The whole changeset is also included as [^all.diff].
