[ 
https://issues.apache.org/jira/browse/HADOOP-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619947#action_12619947
 ] 

Pete Wyckoff commented on HADOOP-3754:
--------------------------------------

+1

but, a few more nits:

1. I do think that requiring people to download and compile Thrift will be too 
much of a hassle, given that the compiler is in C++, so checking in the 
generated code really is the way to go - I think :)  Of course, this also means 
checking in the needed libraries for the various languages - libthrift.jar, 
thrift.so, thrift.py, ...  We could require users to build Thrift themselves; 
it just makes things harder for them, and in that case I think we need a README 
that tells people how to do it. Also, why do we need to check in the 
limited_reflection header if the user has to download Thrift anyway??

2. The exceptions thrown by the library are very general and do not match those 
of the client lib - e.g., IOException, ... - although this could be a later 
add-on.

3. There should be a note saying that chown is not atomic - i.e., the group 
could, in theory, change between the get and the set.

4. I think copyFromLocal would be more robust if one could optionally pass a 
checksum, so the server can verify it is looking at the right file; if it 
isn't, or if the path does not exist, a meaningful exception should be thrown. 
Again, this could be a later add-on.

5. Not needed now, but the command line isn't very robust to errors, or 
friendly about printing them out in a meaningful, user-friendly way.

6. In general, there should be a README that explains what this is, and/or a 
bigger release note.

7. Not now, but I would be super, super interested in knowing the performance 
of reads/writes through this server.

8. As we saw with the metastore, it would be cool to have an optional minimum 
number of threads in the worker pool.

9. I don't quite understand why src/contrib/build-contrib.xml needs to change 
for adding this??

10. It would be better to inherit from thrift/src/contrib/fb303 - and then 
include counts for each operation - but that could be done later.
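To illustrate point 3 above, here is a minimal sketch of the get-then-set
race in a non-atomic chown. The `FakeFS`, `get_group`, and
`set_owner_and_group` names are made up for illustration; they are not the
actual API in the patch:

```python
# Illustrative sketch (not the patch's real API) of why a chown built from
# a separate get and set is not atomic: the group read in the first call
# can be silently overwritten when it is written back in the second.

class FakeFS:
    """Minimal stand-in for the filesystem metadata store."""
    def __init__(self):
        self.meta = {}  # path -> (owner, group)

    def get_group(self, path):
        return self.meta[path][1]

    def set_owner_and_group(self, path, owner, group):
        self.meta[path] = (owner, group)

def chown(fs, path, new_owner, interleave=None):
    group = fs.get_group(path)               # get: read the current group
    if interleave:
        interleave()                         # ...another client runs here...
    fs.set_owner_and_group(path, new_owner, group)  # set: stale group written back

fs = FakeFS()
fs.meta["/f"] = ("alice", "staff")

# A concurrent client changes the group between the get and the set:
chown(fs, "/f", "bob",
      interleave=lambda: fs.set_owner_and_group("/f", "alice", "wheel"))

print(fs.meta["/f"])  # ('bob', 'staff') - the concurrent 'wheel' change is lost
```

A note in the docs warning users about this window would be enough for now.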
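And a rough sketch of the optional checksum from point 4 - all names here
(`copy_from_local`, `ChecksumMismatch`, the dict standing in for the server's
store) are hypothetical, just to show the shape of the check:

```python
# Hedged sketch: copyFromLocal taking an optional checksum so the server can
# verify it received the intended bytes before committing them. The actual
# patch has no such parameter; this is only what the addition could look like.

import hashlib

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

class ChecksumMismatch(Exception):
    pass

def copy_from_local(server_store, path, data, checksum=None):
    """Store 'data' at 'path'; if a checksum was supplied, verify it first."""
    if checksum is not None and md5_hex(data) != checksum:
        raise ChecksumMismatch("checksum mismatch for %s" % path)
    server_store[path] = data

store = {}
payload = b"hello hdfs"
copy_from_local(store, "/tmp/f", payload, checksum=md5_hex(payload))  # ok
```

A wrong checksum (or a missing path) would then surface as a meaningful
exception on the client instead of a silent bad copy.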

But this is a killer application, since no Java or Hadoop is needed on the 
client whatsoever! Congratulations! It would even be cool to use the Java 
bindings from a thin client, to show there's no need for all of Hadoop.

I would really, really love to see:

list<BlockAddresses> readBlocks(1: string filename) throws (1: IOException err),
list<BlockAddresses> writeBlocks(1: string filename, 2: i64 length) throws (1: IOException err)

which give you access to reading/writing directly from the data node over TCP :)
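To sketch what a thin client might do with such a result - note that the
`BlockAddresses` fields below are pure assumptions on my part, since the
struct isn't defined anywhere yet:

```python
# Hypothetical sketch of consuming a readBlocks()-style result: turn the
# returned block list into the per-datanode TCP fetches a thin client would
# issue. Field names and values are illustrative only.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class BlockAddresses:
    host: str      # datanode to contact
    port: int      # datanode TCP port
    block_id: int  # which block to request
    length: int    # bytes in this block

def plan_reads(blocks: List[BlockAddresses]) -> List[Tuple[str, int, int]]:
    """Order the per-block fetches a client would issue directly over TCP."""
    return [(b.host, b.port, b.block_id) for b in blocks]

blocks = [BlockAddresses("dn1", 50010, 1, 64),
          BlockAddresses("dn2", 50010, 2, 32)]
print(plan_reads(blocks))  # [('dn1', 50010, 1), ('dn2', 50010, 2)]
```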

Overall looks very good on the first cut.

pete



> Support a Thrift Interface to access files/directories in HDFS
> --------------------------------------------------------------
>
>                 Key: HADOOP-3754
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3754
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: hadoopthrift2.patch, hadoopthrift3.patch, thrift1.patch
>
>
> Thrift is a cross-language RPC framework. It supports automatic code 
> generation for a variety of languages (Java, C++, Python, PHP, etc.). It 
> would be nice if the HDFS APIs were exposed through Thrift. This would allow 
> applications written in any programming language to access HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
