[ 
https://issues.apache.org/jira/browse/HADOOP-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620164#action_12620164
 ] 

dhruba borthakur commented on HADOOP-3754:
------------------------------------------

Thanks Pete & Nitay for the detailed comments. Thanks a bunch.

1. The patch includes the Thrift binary for Linux. See lib/thrift/thrift and 
lib/thrift/libthrift.jar. Thus, a Linux compile does not have to download any 
external libraries or utilities.

2. The proxy server uses the message from the hadoop.IOException to create its 
own exception. This is the best we can do for now; if we want to improve it 
later, we can. The application sees the real exception string, so it should be 
enough for debugging purposes, won't it?

3. Added a note to chown saying that it is not atomic. This is true for hdfs.py 
only and does not apply to the chown Thrift interface.

4. I like your idea of using the checksum all the way from the client, but 
maybe we can postpone it to a later date.

5. The python command line needs more work. However, I am not targeting the 
python wrapper as a piece that an application will use as-is. It is there to 
demonstrate how to access HDFS from a python script.
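To illustrate the intended call pattern of such a wrapper (the class and method names below are illustrative stand-ins, not the patch's actual Thrift-generated stubs; the real names come from the .thrift IDL in the patch, which is not reproduced here), a script might look like this:

```python
# Sketch of how a script might drive the HDFS Thrift proxy.
# ThriftHadoopFileSystem and Pathname are hypothetical stand-ins for
# the Thrift-generated client classes; the real client would talk to
# the proxy server over a Thrift socket transport.

class Pathname:
    """Stand-in for a Thrift-generated path struct."""
    def __init__(self, pathname):
        self.pathname = pathname

class ThriftHadoopFileSystem:
    """Stand-in client showing the call pattern only."""
    def __init__(self):
        self._files = {}

    def mkdirs(self, path):
        return True

    def create(self, path):
        self._files[path.pathname] = b""
        return len(self._files)      # fake file handle

    def write(self, handle, data):
        return True

    def close(self, handle):
        return True

client = ThriftHadoopFileSystem()
client.mkdirs(Pathname("/tmp"))
handle = client.create(Pathname("/tmp/demo.txt"))
client.write(handle, b"hello hdfs")
client.close(handle)
```

The point is the shape of the interaction (open a transport, obtain a handle, operate on it, close it), which is what the python wrapper demonstrates.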

6. Added README that describes the approach, build and deployment process. I 
plan on writing a Wiki page once this patch gets committed.

7. Performance measurements will come at a later date.

8. Set the default minimum number of threads to 10.

9. The change to build-contrib.xml ensures that the generated jar file(s) are 
in the CLASSPATH while compiling HadoopThriftServer.java.

10. I would wait to include fb303. It is mostly for statistics management and 
process management and can be added at a later date. It might be better to use 
HadoopMetrics instead, possibly via HADOOP-3772. 

11. I added a new call setInactiveTimeoutPeriod() that allows an application to 
specify how long the proxy server should remain active after the last call to 
it. If this timer expires, the proxy server closes all open files and shuts 
down. The default inactivity timeout is 1 hour. This does not completely 
address Nitay's problem, but it may solve it to some extent. If Nitay could 
merge in his code for a per-handle timer once this patch is committed, that 
would be great.
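The inactivity-timeout behavior described above can be sketched as a small watchdog: every incoming RPC resets a countdown, and if no call arrives before it expires, a shutdown callback runs (in the real proxy server, closing all open files). This is a minimal sketch in Python; the names and structure are illustrative, not the patch's actual Java implementation.

```python
import threading
import time

class InactivityWatchdog:
    """Restart a countdown on every RPC; fire on_timeout if the
    countdown ever runs out. Illustrative sketch only."""

    def __init__(self, timeout_secs, on_timeout):
        self._timeout = timeout_secs
        self._on_timeout = on_timeout
        self._timer = None
        self._lock = threading.Lock()

    def touch(self):
        """Call on every incoming RPC to restart the countdown."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout, self._on_timeout)
            self._timer.daemon = True
            self._timer.start()

    def stop(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()

fired = []
dog = InactivityWatchdog(0.1, lambda: fired.append(True))
dog.touch()          # simulate an RPC
time.sleep(0.05)
dog.touch()          # activity resets the timer
time.sleep(0.05)
assert not fired     # timer was reset, so it has not expired yet
time.sleep(0.2)      # no further activity: the timeout elapses
```

A per-handle timer, as Nitay proposed, would keep one such countdown per open file instead of one for the whole server.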

12. If, at a future time, we add Thrift APIs to Namenode, Datanode, etc, they 
would have to be located in src/hdfs and not in contrib.  Even if we decide to 
keep them in contrib, they could be src/contrib/thriftfs/namenode, 
src/contrib/thriftfs/datanode, etc. I think the API in this patch should try to 
resemble the existing API in fs.FileSystem.

13. I added a getFileBlockLocations API to allow fetching the block locations 
of a file.
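As background on what such an API conveys (the sketch below is generic block arithmetic, not the patch's Thrift signature): HDFS stores a file as a sequence of fixed-size blocks, and a block-location query maps a byte range of the file to the blocks, and hence the datanodes, that hold it.

```python
def blocks_for_range(block_size, start, length):
    """Return (block_index, block_start_offset) pairs covering bytes
    [start, start + length) of a file split into fixed-size blocks.
    A real getFileBlockLocations call would also return the datanode
    hosts holding each block; this sketch shows the index math only."""
    if length <= 0:
        return []
    first = start // block_size
    last = (start + length - 1) // block_size
    return [(i, i * block_size) for i in range(first, last + 1)]

# A 64 MB block size; a 100-byte read straddling the first block boundary
# touches blocks 0 and 1.
MB = 1024 * 1024
print(blocks_for_range(64 * MB, 64 * MB - 50, 100))
```

Knowing which datanodes hold which blocks is what lets a client schedule reads close to the data.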


> Support a Thrift Interface to access files/directories in HDFS
> --------------------------------------------------------------
>
>                 Key: HADOOP-3754
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3754
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: hadoopthrift2.patch, hadoopthrift3.patch, thrift1.patch
>
>
> Thrift is a cross-language RPC framework. It supports automatic code 
> generation for a variety of languages (Java, C++, Python, PHP, etc.). It would 
> be nice if HDFS APIs were exposed through Thrift. It would allow applications 
> written in any programming language to access HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
