[
https://issues.apache.org/jira/browse/HADOOP-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128629#comment-15128629
]
Chris Nauroth commented on HADOOP-12756:
----------------------------------------
Thank you for the proposal! +1 to Steve's comments and a few more of my own:
# Regarding {{OSSFileSystem}}, please keep in mind that there are 2 parallel
APIs for file system access now. One is {{FileSystem}}, which you've already
mentioned. The other is {{FileContext}} (the user-facing API) which bridges to
an implementation of {{AbstractFileSystem}} (the internal service provider's
API). For full integration, you'll want to provide an implementation for both
of these APIs. Most of the time, it's easy to provide an implementation of
{{AbstractFileSystem}} by subclassing {{DelegateToFileSystem}} so that it does
a passthrough to the {{FileSystem}} implementation you already wrote.
# Regarding this statement:
{quote}
Application access OSS through network, an additional proxy can be configured,
all information can be set and passed to OSSFileSystem through Hadoop
configuration.
{quote}
I don't think I understand this part. Is the idea that the Hadoop client need
not be configured with authentication credentials, and then the proxy would
inject credentials before forwarding to Aliyun OSS? If so, then is this proxy
something that is planned as part of the code donation to Hadoop, or is the
proxy an external component?
# Please make sure that credentials are integrated with our Credential Provider
API. There are more details on this in other JIRAs, and documentation is under
way in HADOOP-11031. The short story is that you just need to make sure
sensitive credentials are read by using the {{Configuration#getPassword}}
method.
# Since Aliyun OSS is an object store, I assume there must be some strategy for
mapping the concept of hierarchical directories and files onto a flat key-value
namespace. It would help future maintainers if you could add details on the
mapping strategy in the design document. For an example, take a look at the
PDF design document for the Azure file system attached to HADOOP-9629.
# Please plan on contributing end user documentation that is at least as
detailed as the documentation for the existing object store integrations. For
examples, see
[S3|http://hadoop.apache.org/docs/r2.7.2/hadoop-aws/tools/hadoop-aws/index.html],
[Azure|http://hadoop.apache.org/docs/r2.7.2/hadoop-azure/index.html] and
[Swift|http://hadoop.apache.org/docs/r2.7.2/hadoop-openstack/index.html]. It
would be great to discuss what portions of the API are implemented and what is
not implemented. For example, many object store file systems choose not to
implement append. Discussion of semantics is important too. For example, most
object store file systems differ from HDFS in that rename is not atomic.
# Regarding testability, Azure has support for running tests against a local
emulator of the remote service. (See the Azure doc page linked above for more
details.) This goes beyond mock-based testing so that it's an integration
test. It's not as realistic as connecting to the real service, but it can be a
useful option for people who want to test without paying for an account or
suffering long trans-continental latency. Is there a similar emulation
capability for Aliyun OSS?
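To make the mapping question in the fourth point concrete, here is a minimal sketch of one common approach that object store file systems take. It is only an illustration, not a description of how the proposed OSS integration works: the class {{ObjectKeyMapper}} and its methods are hypothetical, and the convention shown (keys are the path without the leading slash, directories are represented by marker keys ending in "/") is an assumption.

```java
// Hypothetical helper for illustration; not part of Hadoop or the Aliyun OSS SDK.
// Assumed convention: a file at /a/b is stored under key "a/b", and a directory
// at /a/b is represented by a marker key "a/b/". The root maps to the empty key.
final class ObjectKeyMapper {
    private ObjectKeyMapper() {}

    /** Maps an absolute path such as "/data/part-0" to the flat key "data/part-0". */
    static String toFileKey(String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("Path must be absolute: " + absolutePath);
        }
        // Drop the leading slash, then any trailing slashes.
        String key = absolutePath.substring(1);
        while (key.endsWith("/")) {
            key = key.substring(0, key.length() - 1);
        }
        return key;
    }

    /** Maps a directory path to its marker key, e.g. "/data" to "data/". */
    static String toDirectoryKey(String absolutePath) {
        String key = toFileKey(absolutePath);
        return key.isEmpty() ? "" : key + "/";
    }

    /**
     * Returns true if candidateKey is an immediate child (file or directory
     * marker) of the directory identified by directoryKey. Listing a directory
     * then becomes a prefix query filtered by this check.
     */
    static boolean isDirectChild(String directoryKey, String candidateKey) {
        if (!candidateKey.startsWith(directoryKey) || candidateKey.equals(directoryKey)) {
            return false;
        }
        String rest = candidateKey.substring(directoryKey.length());
        // A child directory marker carries one trailing slash; ignore it.
        if (rest.endsWith("/")) {
            rest = rest.substring(0, rest.length() - 1);
        }
        return !rest.isEmpty() && rest.indexOf('/') < 0;
    }
}
```

Under this assumed convention, a directory rename has to copy and delete every key that shares the prefix, which is one reason rename is typically not atomic on object stores. The design document should state which convention is actually used.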
> Incorporate Aliyun OSS file system implementation
> -------------------------------------------------
>
> Key: HADOOP-12756
> URL: https://issues.apache.org/jira/browse/HADOOP-12756
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs
> Reporter: shimingfei
> Assignee: shimingfei
> Attachments: OSS integration.pdf
>
>
> Aliyun OSS is widely used among China’s cloud users, but currently it is not
> easy to access data stored on OSS from a user’s Hadoop/Spark application,
> because Hadoop has no native support for OSS.
> This work aims to integrate Aliyun OSS with Hadoop. With simple configuration,
> Spark/Hadoop applications can read/write data from OSS without any code
> change, narrowing the gap between users’ applications and data storage, as has
> been done for S3 in Hadoop.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)