[
https://issues.apache.org/jira/browse/HADOOP-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620164#action_12620164
]
dhruba borthakur commented on HADOOP-3754:
------------------------------------------
Thanks Pete & Nitay for the detailed comments. Thanks a bunch.
1. The patch includes the thrift binary for Linux. See lib/thrift/thrift and
lib/thrift/libthrift.jar. Thus, a Linux compile does not have to download any
external libraries or utilities.
2. The proxy server uses the message from the hadoop.IOException to create its
own exception. This is the best we can do for now; if we want to improve it
later, we can do that. The application sees the real exception string, so it
should be enough for debugging purposes, shouldn't it?
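The idea in (2) can be sketched in Python (the actual proxy is Java; the names below are illustrative, not the ones in the patch): catch the underlying I/O error and re-raise the proxy's own exception type carrying only the original message string, so the client still sees the real error text.

```python
class ThriftIOException(Exception):
    """Hypothetical stand-in for the exception type the proxy returns
    to Thrift clients; it carries only the original message string."""
    def __init__(self, message):
        super().__init__(message)
        self.message = message

def wrap_io_error(func, *args):
    """Call func; convert any IOError into the proxy's exception,
    preserving the message so it remains useful for debugging."""
    try:
        return func(*args)
    except IOError as e:
        raise ThriftIOException(str(e))
```

The client loses the original exception class and stack trace, but the message text survives the language boundary, which is the point made above.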
3. Added a note to chown saying that it is not atomic. This applies to hdfs.py
only, not to the chown thrift interface.
4. I like your idea of using the checksum all the way from the client, but
maybe we can postpone it to a later date.
5. The python command line needs more work. However, I am not targeting the
python wrapper as a piece that an application will use as-is. It is there to
demonstrate how to access HDFS from a python script.
6. Added README that describes the approach, build and deployment process. I
plan on writing a Wiki page once this patch gets committed.
7. Performance measurements will come at a later date.
8. Set the default minimum number of threads to 10.
9. The change to build-contrib.xml ensures that the generated jar file(s) are
in the CLASSPATH while compiling HadoopThriftServer.java.
10. I would wait to include fb303. It is mostly for statistics and process
management and can be added at a later date. It might be better to use
HadoopMetrics instead, possibly via HADOOP-3772.
11. I added a new call, setInactiveTimeoutPeriod(), that allows an application
to specify how long the proxy server should remain active after the last call
to it. If this timer expires, the proxy server closes all open files and shuts
down. The default inactivity timeout is 1 hour. This does not completely
address Nitay's problems, but it solves them to a certain extent. It would be
great if Nitay could merge in his code for per-handle timers once this patch
is committed.
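The inactivity mechanism in (11) amounts to a watchdog on the timestamp of the last call. A minimal Python sketch of that idea (the real implementation is in the Java server; names and structure here are illustrative only):

```python
import threading
import time

class InactivityWatchdog:
    """Sketch of the idea behind setInactiveTimeoutPeriod(): once no
    call has arrived for `timeout` seconds, the server should run its
    shutdown hook (close open files, stop serving)."""

    def __init__(self, timeout, on_expire):
        self.timeout = timeout        # the patch's default is 1 hour
        self.on_expire = on_expire    # e.g. close all open files, shut down
        self.last_call = time.time()
        self.lock = threading.Lock()

    def touch(self):
        """Invoked at the start of every proxy call."""
        with self.lock:
            self.last_call = time.time()

    def expired(self):
        """Polled periodically; True once the inactivity window has passed."""
        with self.lock:
            return time.time() - self.last_call >= self.timeout
```

A background thread would poll expired() and invoke on_expire once it returns True; a per-handle timer (Nitay's approach) would keep one such timestamp per open file instead of one per server.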
12. If, at a future time, we add Thrift APIs to Namenode, Datanode, etc, they
would have to be located in src/hdfs and not in contrib. Even if we decide to
keep them in contrib, they could be src/contrib/thriftfs/namenode,
src/contrib/thriftfs/datanode, etc. I think the API in this patch should try to
resemble the existing API in fs.FileSystem.
13. I added a getFileBlockLocations API to allow fetching the block locations
of a file.
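The getFileBlockLocations API in (13) follows the fs.FileSystem convention of taking a byte range and returning location info for the blocks that range touches. Since an HDFS file is split into fixed-size blocks, which blocks a range covers is simple arithmetic; the helper below is purely illustrative (not code from the patch):

```python
def blocks_for_range(block_size, start, length):
    """Illustrative helper: indices of the fixed-size blocks that the
    byte range [start, start + length) of a file touches.
    getFileBlockLocations returns host/location info for these blocks."""
    if length <= 0:
        return []
    first = start // block_size
    last = (start + length - 1) // block_size
    return list(range(first, last + 1))
```

For example, with a block size of 64 bytes, the range starting at offset 60 with length 10 straddles the first two blocks, so a client would receive locations for blocks 0 and 1.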
> Support a Thrift Interface to access files/directories in HDFS
> --------------------------------------------------------------
>
> Key: HADOOP-3754
> URL: https://issues.apache.org/jira/browse/HADOOP-3754
> Project: Hadoop Core
> Issue Type: New Feature
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: hadoopthrift2.patch, hadoopthrift3.patch, thrift1.patch
>
>
> Thrift is a cross-language RPC framework. It supports automatic code
> generation for a variety of languages (Java, C++, Python, PHP, etc.). It
> would be nice if HDFS APIs were exposed through Thrift. This would allow
> applications written in any programming language to access HDFS.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.