[jira] [Commented] (HADOOP-12756) Incorporate Aliyun OSS file system implementation

Chris Nauroth (JIRA) Tue, 02 Feb 2016 09:35:47 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128629#comment-15128629
 ]


Chris Nauroth commented on HADOOP-12756:
----------------------------------------

Thank you for the proposal!  +1 to Steve's comments and a few more of my own:

# Regarding {{OSSFileSystem}}, please keep in mind that there are 2 parallel 
APIs for file system access now.  One is {{FileSystem}}, which you've already 
mentioned.  The other is {{FileContext}} (the user-facing API) which bridges to 
an implementation of {{AbstractFileSystem}} (the internal service provider's 
API).  For full integration, you'll want to provide an implementation for both 
of these APIs.  Most of the time, it's easy to provide an implementation of 
{{AbstractFileSystem}} by subclassing {{DelegateToFileSystem}} so that it does 
a passthrough to the {{FileSystem}} implementation you already wrote.
# Regarding this statement:
{quote}
Application access OSS through network, an additional proxy can be configured, 
all information can be set and passed to OSSFileSystem through Hadoop 
configuration.
{quote}
I don't think I understand this part.  Is the idea that the Hadoop client need 
not be configured with authentication credentials, and then the proxy would 
inject credentials before forwarding to Aliyun OSS?  If so, then is this proxy 
something that is planned as part of the code donation to Hadoop, or is the 
proxy an external component?
# Please make sure that credentials are integrated with our Credential Provider 
API.  There are more details on this in other JIRAs, and documentation is under 
way in HADOOP-11031.  The short story is that you just need to make sure 
sensitive credentials are read by using the {{Configuration#getPassword}} 
method.
# Since Aliyun OSS is an object store, I assume there must be some strategy for 
mapping the concept of hierarchical directories and files onto a flat key-value 
namespace.  It would help future maintainers if you could add details on the 
mapping strategy in the design document.  For an example, take a look at the 
PDF design document for the Azure file system attached to HADOOP-9629.
# Please plan on contributing end user documentation that is at least as 
detailed as the documentation for the existing object store integrations.  For 
examples, see 
[S3|http://hadoop.apache.org/docs/r2.7.2/hadoop-aws/tools/hadoop-aws/index.html],
 [Azure|http://hadoop.apache.org/docs/r2.7.2/hadoop-azure/index.html] and 
[Swift|http://hadoop.apache.org/docs/r2.7.2/hadoop-openstack/index.html].  It 
would be great to discuss what portions of the API are implemented and what is 
not implemented.  For example, many object store file systems choose not to 
implement append.  Discussion of semantics is important too.  For example, most 
object store file systems differ from HDFS in that rename is not atomic.
# Regarding testability, Azure has support for running tests against a local 
emulator of the remote service.  (See the Azure doc page linked above for more 
details.)  This goes beyond mock-based testing so that it's an integration 
test.  It's not as realistic as connecting to the real service, but it can be a 
useful option for people who want to test without paying for an account or 
suffering long trans-continental latency.  Is there a similar emulation 
capability for Aliyun OSS?
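
As a rough illustration of the kind of mapping strategy and rename semantics 
mentioned in the list above, here is a minimal Java sketch.  It is not the 
Aliyun OSS SDK and not the proposed {{OSSFileSystem}}: the class 
{{FlatKeyMapping}}, its methods and the trailing-slash directory marker 
convention are all hypothetical, and an in-memory sorted map stands in for the 
remote store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a flat object store (assumption, not a real SDK). */
class FlatKeyMapping {
    // Flat namespace: object key -> object bytes. Sorted, so keys sharing a prefix are contiguous.
    private final TreeMap<String, byte[]> objects = new TreeMap<>();

    /** Maps an absolute path such as "/a/b.txt" to the object key "a/b.txt". */
    static String fileKey(String path) {
        return path.replaceAll("^/+", "").replaceAll("/+$", "");
    }

    /** Assumed convention: a directory is a zero-byte marker object whose key ends in "/". */
    static String dirKey(String path) {
        String key = fileKey(path);
        return key.isEmpty() ? "" : key + "/";
    }

    void createFile(String path, byte[] data) {
        objects.put(fileKey(path), data);
    }

    void mkdir(String path) {
        String key = dirKey(path);
        if (!key.isEmpty()) {
            objects.put(key, new byte[0]);
        }
    }

    /** Lists every key under a directory by scanning the shared key prefix. */
    List<String> keysUnder(String dirPath) {
        String prefix = dirKey(dirPath);
        List<String> result = new ArrayList<>();
        for (String key : objects.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // past the end of the contiguous prefix range
            }
            result.add(key);
        }
        return result;
    }

    /** Directory rename as one copy plus one delete per key, hence not atomic. */
    int renameDir(String src, String dst) {
        String from = dirKey(src);
        String to = dirKey(dst);
        int moved = 0;
        for (String key : keysUnder(src)) {
            objects.put(to + key.substring(from.length()), objects.get(key));
            objects.remove(key);
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        FlatKeyMapping store = new FlatKeyMapping();
        store.mkdir("/logs");
        store.createFile("/logs/2016/a.txt", new byte[] {1});
        store.renameDir("/logs", "/archive");
        System.out.println(store.keysUnder("/archive"));
    }
}
```

Because {{renameDir}} in this sketch issues one copy and one delete per key, a 
failure partway through would leave keys under both prefixes.  That is the 
sort of semantic difference from HDFS that is worth spelling out in the design 
document and the end user documentation.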


> Incorporate Aliyun OSS file system implementation
> -------------------------------------------------
>
>                 Key: HADOOP-12756
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12756
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: shimingfei
>            Assignee: shimingfei
>         Attachments: OSS integration.pdf
>
>
> Aliyun OSS is widely used among China’s cloud users, but currently it is not 
> easy to access data stored on OSS from a user’s Hadoop/Spark application, 
> because Hadoop has no native support for OSS.
> This work aims to integrate Aliyun OSS with Hadoop. With simple configuration, 
> Spark/Hadoop applications can read/write data from OSS without any code 
> change, narrowing the gap between the user’s application and data storage, as 
> has been done for S3 in Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
