[
https://issues.apache.org/jira/browse/HADOOP-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619947#action_12619947
]
Pete Wyckoff commented on HADOOP-3754:
--------------------------------------
+1
But a few more nits:
1. I do think that requiring people to download and compile Thrift will be too
much of a hassle, given that the compiler is in C++, so checking in the
generated code really is the way to go - I think :) And of course, this
requires checking in the needed libraries in various languages - libthrift.jar,
thrift.so, thrift.py, ... We can still require the download, it just makes
things more of a hassle for the user, and in that case I think we need a
README that tells people how to do it. Also, why do we need to check in the
limited_reflection header if the user has to download Thrift anyway?
2. The exceptions thrown by the library are very general and do not match the
client lib - e.g., IOException, etc. - although this could be a later add-on.
3. A note saying that chown is not atomic would be good - i.e., the group
could in theory change between the get and the set (see the first sketch
after this list).
4. I think copyFromLocal would be more robust if one could optionally pass a
checksum, so the server can verify it's looking at the right file; if the
checksum doesn't match and/or the path does not exist, a meaningful exception
should be thrown. But again, this could be a later add-on.
5. Not needed now, but the command line isn't very robust to errors, or
friendly about printing them out in a meaningful, user-friendly way.
6. Generally, a README that explains what this is, and/or a bigger release
note, would be good.
7. Not now, but I would be super, super interested in knowing the performance
of reads/writes through this server.
8. As we saw with the metastore, it would be cool to have an optional minimum
number of threads in the worker pool (see the second sketch after this list).
9. I don't quite understand why src/contrib/build-contrib.xml needs to change
for adding this??
10. Would be better to inherit from thrift/src/contrib/fb303, and then include
counts for each operation, but that could be done later.
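On nit 3, here's a minimal sketch of the race I mean, assuming the server
implements chown as a get-status-then-set-owner pair (the class and method
below are placeholders, not the patch's actual code):

  import java.io.IOException;

  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  // Hypothetical server-side chown. It is not atomic: another client can
  // chgrp the file between step 1 and step 2, so the stale group read in
  // step 1 is silently written back in step 2.
  public class ChownRaceSketch {
    static void chown(FileSystem fs, Path path, String newOwner)
        throws IOException {
      FileStatus st = fs.getFileStatus(path);     // 1. get: read owner/group
      // ... a concurrent chgrp can land right here ...
      fs.setOwner(path, newOwner, st.getGroup()); // 2. set: re-applies group
    }
  }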
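And on nit 8, a sketch of the knob I mean, following the pattern the metastore
uses; the package and constructor names below are from the Thrift revision the
metastore builds against, so treat them as assumptions that may differ from
whatever revision gets checked in here:

  import com.facebook.thrift.TProcessor;
  import com.facebook.thrift.protocol.TBinaryProtocol;
  import com.facebook.thrift.server.TServer;
  import com.facebook.thrift.server.TThreadPoolServer;
  import com.facebook.thrift.transport.TServerSocket;
  import com.facebook.thrift.transport.TServerTransport;
  import com.facebook.thrift.transport.TTransportException;
  import com.facebook.thrift.transport.TTransportFactory;

  // Sketch: expose the worker pool's minimum size as an optional server
  // argument instead of relying on Thrift's default.
  public class ServerPoolSketch {
    static TServer makeServer(TProcessor processor, int port, int minThreads)
        throws TTransportException {
      TServerTransport serverTransport = new TServerSocket(port);
      TThreadPoolServer.Options options = new TThreadPoolServer.Options();
      options.minWorkerThreads = minThreads; // the new optional knob
      return new TThreadPoolServer(processor, serverTransport,
          new TTransportFactory(), new TTransportFactory(),
          new TBinaryProtocol.Factory(), new TBinaryProtocol.Factory(),
          options);
    }
  }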
But this is a killer application, since no Java or Hadoop is needed on the
client whatsoever! Congratulations! It would be cool even to use the Java
bindings from a thin client, to show there's no need for all of Hadoop.
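For example, a thin Java client would need nothing on the classpath beyond
libthrift.jar and the generated bindings - no Hadoop jars at all. A rough
sketch, where the service name, the stat() call, and the port are all
hypothetical placeholders for whatever the patch actually generates:

  // Thin-client sketch: only libthrift.jar plus the thrift-generated
  // bindings are needed - no hadoop-core jar. The service name, method,
  // and port below are hypothetical placeholders, and package names vary
  // by Thrift revision (com.facebook.thrift vs. org.apache.thrift).
  import com.facebook.thrift.protocol.TBinaryProtocol;
  import com.facebook.thrift.transport.TSocket;
  import com.facebook.thrift.transport.TTransport;

  public class ThinClientSketch {
    public static void main(String[] args) throws Exception {
      TTransport transport = new TSocket("localhost", 9090); // placeholder
      transport.open();
      ThriftHadoopFileSystem.Client client =            // hypothetical name
          new ThriftHadoopFileSystem.Client(new TBinaryProtocol(transport));
      System.out.println(client.stat("/user/pete/foo")); // placeholder call
      transport.close();
    }
  }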
I would really, really love to see:
  list<BlockAddresses> readBlocks(1: string filename)
      throws (1: IOException e);
  list<BlockAddresses> writeBlocks(1: string filename, 2: i64 length)
      throws (1: IOException e);
which give you access to reading/writing directly from the data node over TCP :)
Overall looks very good on the first cut.
pete
> Support a Thrift Interface to access files/directories in HDFS
> --------------------------------------------------------------
>
> Key: HADOOP-3754
> URL: https://issues.apache.org/jira/browse/HADOOP-3754
> Project: Hadoop Core
> Issue Type: New Feature
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: hadoopthrift2.patch, hadoopthrift3.patch, thrift1.patch
>
>
> Thrift is a cross-language RPC framework. It supports automatic code
> generation for a variety of languages (Java, C++, Python, PHP, etc.). It
> would be nice if the HDFS APIs were exposed through Thrift. This would allow
> applications written in any programming language to access HDFS.