[
https://issues.apache.org/jira/browse/HADOOP-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Carlos Valiente updated HADOOP-4707:
------------------------------------
Attachment: libthrift.jar
HADOOP-4707.patch
Patch [^HADOOP-4707.patch] provides Thrift interfaces to HDFS namenodes
and datanodes using HDFS service plugins. Both
plugins implement the Thrift services
defined in {{src/contrib/thriftfs/if/hdfs.thrift}}.
In order to read data from a file, Thrift clients request a list
of blocks with {{Namenode.getBlocks(path, offset, length)}}, and then
call {{Datanode.readBlock()}} on the appropriate datanode servers
for each block in the returned list. The Thrift datanode server
then reads the block from its local datanode over a socket opened via
{{org.apache.hadoop.hdfs.DFSClient.BlockReader.newBlockReader()}}.
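The read path can be sketched with in-memory stubs standing in for the generated Thrift clients. The method names mirror {{Namenode.getBlocks()}} and {{Datanode.readBlock()}} from the patch, but the stub classes and their fields are hypothetical, and real clients would of course go through Thrift transports rather than Python objects:

```python
# In-memory sketch of the read path: Namenode.getBlocks() followed by one
# Datanode.readBlock() per returned block. Stub classes are hypothetical
# stand-ins for the clients generated from src/contrib/thriftfs/if/hdfs.thrift.

class Block:
    def __init__(self, block_id, datanode, data):
        self.blockId = block_id
        self.datanode = datanode   # address of the Thrift datanode server
        self.data = data

class NamenodeStub:
    def __init__(self, files):
        self.files = files         # path -> list of Block

    def getBlocks(self, path, offset, length):
        # A real server returns only the blocks covering [offset, offset+length)
        return self.files[path]

class DatanodeStub:
    def readBlock(self, block, offset, length):
        # A real server reads via DFSClient.BlockReader.newBlockReader()
        return block.data[offset:offset + length]

def read_file(namenode, datanodes, path):
    out = b''
    for block in namenode.getBlocks(path, 0, 2**63 - 1):
        out += datanodes[block.datanode].readBlock(block, 0, len(block.data))
    return out

blocks = [Block(1, 'dn1:9090', b'hello '), Block(2, 'dn2:9090', b'world')]
namenode = NamenodeStub({'/f': blocks})
datanodes = {'dn1:9090': DatanodeStub(), 'dn2:9090': DatanodeStub()}
print(read_file(namenode, datanodes, '/f'))  # b'hello world'
```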
Both plugins add {{thriftfs-default.xml}} and
{{thriftfs-site.xml}} as configuration resources. The following
properties define the addresses for
the Thrift servers:
* {{dfs.thrift.address}}, set by default to {{0.0.0.0:9090}}
* {{dfs.thrift.datanode.address}}, set by default to {{0.0.0.0:0}}.
The following properties limit the number and lifetime of Thrift server
threads:
* {{dfs.thrift.threads.min}}, set by default to 5
* {{dfs.thrift.threads.max}}, set by default to 20
* {{dfs.thrift.timeout}}, set by default to 60 seconds.
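Since both files are picked up as ordinary Hadoop configuration resources, overriding any of these in {{thriftfs-site.xml}} follows the usual pattern. The property names below are the ones listed above; the values shown are just the defaults, repeated for illustration:

```xml
<configuration>
  <property>
    <name>dfs.thrift.address</name>
    <value>0.0.0.0:9090</value>
  </property>
  <property>
    <name>dfs.thrift.datanode.address</name>
    <value>0.0.0.0:0</value>
  </property>
  <property>
    <name>dfs.thrift.threads.min</name>
    <value>5</value>
  </property>
  <property>
    <name>dfs.thrift.threads.max</name>
    <value>20</value>
  </property>
  <property>
    <name>dfs.thrift.timeout</name>
    <value>60</value>
  </property>
</configuration>
```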
Since {{Datanode.readBlock()}} returns data as the Thrift type
{{binary}}, which maps to Java's {{byte[]}}, a single read is limited
to 2^31 - 1 bytes (whereas Hadoop blocks may be much larger, since
block lengths are {{long}} values).
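A client reading a block larger than that therefore has to split the read into several {{readBlock()}} calls. A small sketch of the arithmetic (the helper name and chunking scheme are mine, not part of the patch):

```python
MAX_READ = 2**31 - 1          # largest byte[] a single readBlock() can return

def read_plan(block_length, chunk=MAX_READ):
    """Split a block of block_length bytes (a Java long) into
    (offset, length) pairs, each small enough for one readBlock() call."""
    plan = []
    offset = 0
    while offset < block_length:
        length = min(chunk, block_length - offset)
        plan.append((offset, length))
        offset += length
    return plan

# A 3 GiB block needs two calls at the maximum chunk size:
plan = read_plan(3 * 2**30)
print(plan)   # [(0, 2147483647), (2147483647, 1073741825)]
```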
On Dhruba's suggestion, I've removed all write-related methods for now.
Thrift namenode and datanode servers try to obtain the identity of
the client by calling
{{org.apache.hadoop.thriftfs.PluginBase.getRemoteUser()}},
which implements the IDENT protocol defined by RFC 1413. If that call
fails, the value returned by
{{security.UnixUserGroupInformation}}
is used instead.
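For reference, an RFC 1413 exchange sends the server and client port pair and gets back a line of the form {{port , port : USERID : opsys : user-id}} (or {{... : ERROR : reason}}). A minimal parser for such a reply line, purely illustrative and not code from the patch:

```python
def parse_ident_reply(line):
    """Parse an RFC 1413 reply line; return the user id on a USERID
    reply, or None on an ERROR reply or malformed input."""
    parts = [p.strip() for p in line.split(':', 3)]
    if len(parts) < 3 or parts[1] != 'USERID':
        return None               # e.g. '6193, 23 : ERROR : NO-USER'
    # parts = [port-pair, 'USERID', opsys, user-id]
    return parts[3] if len(parts) == 4 else None

print(parse_ident_reply('6193, 23 : USERID : UNIX : carlos'))  # carlos
print(parse_ident_reply('6193, 23 : ERROR : NO-USER'))         # None
```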
I've removed the Perl and Python high-level APIs from this patch in
order to make it simpler. Those APIs are available at
http://code.pepelabs.net/git/?p=hadoop-thrift.git. Perhaps it's better
to keep them separate from Hadoop's code base?
I've updated Thrift's [^libthrift.jar] to a recent Subversion checkout.
It seems that a Thrift release is imminent, so the final JAR (and the
Java code it generates) should not be too different from what's included
here.
> Improvements to Hadoop Thrift bindings
> --------------------------------------
>
> Key: HADOOP-4707
> URL: https://issues.apache.org/jira/browse/HADOOP-4707
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/thriftfs
> Affects Versions: 0.20.0
> Environment: Tested under Linux x86-64
> Reporter: Carlos Valiente
> Priority: Minor
> Attachments: all.diff, BlockManager.java, build_xml.diff,
> DefaultBlockManager.java, DFSBlockManager.java, gen.diff, HADOOP-4707.diff,
> HADOOP-4707.patch, hadoopfs_thrift.diff, hadoopthriftapi.jar,
> HadoopThriftServer.java, HadoopThriftServer_java.diff, hdfs.py,
> hdfs_py_venky.diff, libthrift.jar, libthrift.jar, libthrift.jar
>
>
> I have made the following changes to hadoopfs.thrift:
> # Added namespaces for Python, Perl and C++.
> # Renamed parameters and struct members to camelCase versions to keep them
> consistent (in particular FileStatus.{blockReplication,blockSize} vs
> FileStatus.{block_replication,blocksize}).
> # Renamed ThriftHadoopFileSystem to FileSystem. From the perspective of a
> Perl/Python/C++ user, 1) it is already clear that we're using Thrift, and 2)
> the fact that we're dealing with Hadoop is already explicit in the namespace.
> The usage of generated code is more compact and (in my opinion) clearer:
> {quote}
> *Perl*:
> use HadoopFS;
> my $client = HadoopFS::FileSystemClient->new(..);
> _instead of:_
> my $client = HadoopFS::ThriftHadoopFileSystemClient->new(..);
> *Python*:
> from hadoopfs import FileSystem
> client = FileSystem.Client(..)
> _instead of_
> from hadoopfs import ThriftHadoopFileSystem
> client = ThriftHadoopFileSystem.Client(..)
> (See also the attached diff [^scripts_hdfs_py.diff] for the
> new version of 'scripts/hdfs.py').
> *C++*:
> hadoopfs::FileSystemClient client(..);
> _instead of_:
> hadoopfs::ThriftHadoopFileSystemClient client(..);
> {quote}
> # Renamed ThriftHandle to FileHandle: As in 3, it is clear that we're dealing
> with a Thrift object, and its purpose (to act as a handle for file
> operations) is clearer.
> # Renamed ThriftIOException to IOException, to keep it simpler, and
> consistent with MalformedInputException.
> # Added explicit version tags to fields of ThriftHandle/FileHandle, Pathname,
> MalformedInputException and ThriftIOException/IOException, to improve
> compatibility of existing clients with future versions of the interface which
> might add new fields to those objects (like stack traces for the exception
> types, for instance).
> Those changes are reflected in the attachment [^hadoopfs_thrift.diff].
> Changes in generated Java, Python, Perl and C++ code are also attached in
> [^gen.diff]. They were generated by a Thrift checkout from trunk
> ([http://svn.apache.org/repos/asf/incubator/thrift/trunk/]) as of revision
> 719697, plus the following Perl-related patches:
> * [https://issues.apache.org/jira/browse/THRIFT-190]
> * [https://issues.apache.org/jira/browse/THRIFT-193]
> * [https://issues.apache.org/jira/browse/THRIFT-199]
> The Thrift jar file [^libthrift.jar] built from that Thrift checkout is also
> attached, since it's needed to run the Java Thrift server.
> I have also added a new target to src/contrib/thriftfs/build.xml to build the
> Java bindings needed for org.apache.hadoop.thriftfs.HadoopThriftServer.java
> (see attachment [^build_xml.diff]) and modified HadoopThriftServer.java to
> make use of the new bindings (see attachment [^HadoopThriftServer_java.diff]).
> The jar file [^lib/hadoopthriftapi.jar] is also included, although it can be
> regenerated from the stuff under 'gen-java' and the new 'compile-gen' Ant
> target.
> The whole changeset is also included as [^all.diff].