[ https://issues.apache.org/jira/browse/HADOOP-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696921#action_12696921 ]

Todd Lipcon commented on HADOOP-4707:
-------------------------------------

Did some more hacking around in the code and came to some conclusions:

In order to guarantee that ThriftUtils.toThrift() works properly on a 
DatanodeInfo, we need to ensure that the host name sent to 
NamenodePlugin.datanodeUp() matches the name stored in the DatanodeInfo. These 
DatanodeInfo names come from the DatanodeRegistration that the DN sends to the 
NN at startup. Unfortunately, with the current setup this is broken, for the 
following reasons (see the sketch after this list):
  - In FSNamesystem.registerDatanode(), the "name" field passed in by the DN 
is ignored in favor of the remote host taken from the RPC stack. This host is 
then written back into the DatanodeRegistration, which is returned to the DN.
  - This means that when the DN registers itself with the NameNode, its 
dnRegistration variable is mutated (specifically, its name field).
  - In the current plugin architecture, the DN thrift plugin has already 
started *before* the DN calls register(). This means that when the 
DatanodePlugin calls NamenodePlugin.datanodeUp(), it passes a different "name" 
string than will eventually end up registered on the NN.
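
To make this concrete, here's roughly what the NN does at registration time (a 
simplified paraphrase of the 0.20-era FSNamesystem code, not a verbatim 
excerpt):

{code:java}
// Simplified paraphrase of the NN-side registration logic; the method body
// is abridged, this is not a verbatim excerpt of FSNamesystem.
public synchronized void registerDatanode(DatanodeRegistration nodeReg)
    throws IOException {
  // The "name" supplied by the DN is ignored in favor of the RPC layer's
  // view of the remote address.
  String dnAddress = Server.getRemoteAddress();
  if (dnAddress == null) {
    // Not invoked via RPC; fall back to the DN-supplied host.
    dnAddress = nodeReg.getHost();
  }
  // The registration is mutated in place and handed back to the DN, so
  // dnRegistration.name changes *after* the DN plugin has already read it.
  nodeReg.name = dnAddress + ":" + nodeReg.getPort();
  // ... remainder of registration ...
}
{code}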

My proposed solution is:

1) Add a hook to the DatanodePlugin interface that allows the DN plugin to 
defer the datanodeUp call until after the DN has registered with the NN. At 
that point the dnRegistration member is in a stable state that matches the 
DatanodeInfo stored on the NN side. The necessary plugin infrastructure for 
this is in the newly created HADOOP-5640.
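
A sketch of what that hook could look like (the method names here are 
illustrative only; the real interface is whatever lands in HADOOP-5640):

{code:java}
import org.apache.hadoop.hdfs.server.datanode.DataNode;

// Illustrative sketch -- method names are hypothetical, pending HADOOP-5640.
public interface DatanodePlugin {
  /** Called when the DN starts, before it registers with the NN. */
  void start(DataNode datanode);

  /**
   * Called once the DN's initial registration with the NN has completed,
   * i.e. once dnRegistration.name has been rewritten by the NN and is
   * stable. The thrift plugin would defer its datanodeUp() call to here.
   */
  void initialRegistrationComplete();

  /** Called when the DN shuts down. */
  void stop();
}
{code}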

2) This step isn't strictly necessary, but I'd like to modify the datanodeUp 
and datanodeDown functions to take three parameters (name, storage ID, and 
thrift port) rather than the current two (name and thrift port). This allows 
those functions to construct DatanodeID objects and maintain a 
Map<DatanodeID, Integer> thriftPorts map instead (sketched below). The 
advantages here are:
  - Consistency with data structures elsewhere in the code
  - Clarity about what the map keys actually are
  - Some small level of "security", in the sense that the storage ID makes 
these calls a little harder to spoof.
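
In Java terms, the NN side would then look something like this (a sketch only; 
the class name, signatures, and choice of map are illustrative):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.hdfs.protocol.DatanodeID;

// Sketch of the NN-side plugin under proposal (2). The -1s stand in for
// the info/ipc ports, which this map doesn't care about.
public class NamenodePlugin {
  private final Map<DatanodeID, Integer> thriftPorts =
      new ConcurrentHashMap<DatanodeID, Integer>();

  public void datanodeUp(String name, String storageId, int thriftPort) {
    // Keying on DatanodeID (name + storage ID) is consistent with the NN's
    // own data structures, and spoofing requires knowing both fields.
    thriftPorts.put(new DatanodeID(name, storageId, -1, -1), thriftPort);
  }

  public void datanodeDown(String name, String storageId, int thriftPort) {
    thriftPorts.remove(new DatanodeID(name, storageId, -1, -1));
  }
}
{code}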

Alternatively, rather than taking three arguments, we could introduce a 
DatanodeID thrift struct with name and storageID members.

> Improvements to Hadoop Thrift bindings
> --------------------------------------
>
>                 Key: HADOOP-4707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4707
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/thriftfs
>    Affects Versions: 0.20.0
>         Environment: Tested under Linux x86-64
>            Reporter: Carlos Valiente
>            Priority: Minor
>         Attachments: all.diff, BlockManager.java, build_xml.diff, 
> DefaultBlockManager.java, DFSBlockManager.java, gen.diff, HADOOP-4707.diff, 
> HADOOP-4707.patch, hadoopfs_thrift.diff, hadoopthriftapi.jar, 
> HadoopThriftServer.java, HadoopThriftServer_java.diff, hdfs.py, 
> hdfs_py_venky.diff, libthrift.jar, libthrift.jar, libthrift.jar
>
>
> I have made the following changes to hadoopfs.thrift:
> #  Added namespaces for Python, Perl and C++.
> # Renamed parameters and struct members to camelCase versions to keep them 
> consistent (in particular FileStatus.{blockReplication,blockSize} vs 
> FileStatus.{block_replication,blocksize}).
> # Renamed ThriftHadoopFileSystem to FileSystem. From the perspective of a 
> Perl/Python/C++ user, 1) it is already clear that we're using Thrift, and 2) 
> the fact that we're dealing with Hadoop is already explicit in the namespace. 
>  The usage of generated code is more compact and (in my opinion) clearer:
> {quote}
>         *Perl*:
>         use HadoopFS;
>         my $client = HadoopFS::FileSystemClient->new(..);
>         _instead of:_
>         my $client = HadoopFS::ThriftHadoopFileSystemClient->new(..);
>         *Python*:
>         from hadoopfs import FileSystem
>         client = FileSystem.Client(..)
>         _instead of:_
>         from hadoopfs import ThriftHadoopFileSystem
>         client = ThriftHadoopFileSystem.Client(..)
>         (See also the attached diff [^scripts_hdfs_py.diff] for the
>          new version of 'scripts/hdfs.py').
>         *C++*:
>         hadoopfs::FileSystemClient client(..);
>         _instead of:_
>         hadoopfs::ThriftHadoopFileSystemClient client(..);
> {quote}
> # Renamed ThriftHandle to FileHandle: As in point 3, it is already clear 
> that we're dealing with a Thrift object, and the new name makes its purpose 
> (to act as a handle for file operations) clearer.
> # Renamed ThriftIOException to IOException, to keep it simpler and 
> consistent with MalformedInputException.
> # Added explicit version tags to fields of ThriftHandle/FileHandle, Pathname, 
> MalformedInputException and ThriftIOException/IOException, to improve 
> compatibility of existing clients with future versions of the interface which 
> might add new fields to those objects (like stack traces for the exception 
> types, for instance).
> Those changes are reflected in the attachment [^hadoopfs_thrift.diff].
> Changes in generated Java, Python, Perl and C++ code are also attached in 
> [^gen.diff]. They were generated by a Thrift checkout from trunk
> ([http://svn.apache.org/repos/asf/incubator/thrift/trunk/]) as of revision
> 719697, plus the following Perl-related patches:
> * [https://issues.apache.org/jira/browse/THRIFT-190]
> * [https://issues.apache.org/jira/browse/THRIFT-193]
> * [https://issues.apache.org/jira/browse/THRIFT-199]
> The Thrift jar file [^libthrift.jar] built from that Thrift checkout is also 
> attached, since it's needed to run the Java Thrift server.
> I have also added a new target to src/contrib/thriftfs/build.xml to build the 
> Java bindings needed for org.apache.hadoop.thriftfs.HadoopThriftServer.java 
> (see attachment [^build_xml.diff]), and modified HadoopThriftServer.java to 
> make use of the new bindings (see attachment [^HadoopThriftServer_java.diff]).
> The jar file [^lib/hadoopthriftapi.jar] is also included, although it can be 
> regenerated from the stuff under 'gen-java' and the new 'compile-gen' Ant 
> target.
> The whole changeset is also included as [^all.diff].

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
