Haohui & Chris,

Sounds great, thank you very much! We'll cut a JIRA once we get everything
lined up.
Best,

James

On Mon, Dec 7, 2015 at 3:54 PM, Chris Nauroth <cnaur...@hortonworks.com>
wrote:
> Hi James,
>
> This sounds great! Thank you for considering contributing the code.
>
> Just seconding what Haohui said, there is existing precedent for
> alternative implementations of the Hadoop FileSystem in our codebase. We
> currently have similar plugins for S3 [1], Azure [2], and OpenStack Swift
> [3]. Additionally, we have a suite of FileSystem contract tests [4].
> These tests are designed to help developers of alternative file systems
> assess how closely they match the semantics expected by Hadoop ecosystem
> components.
>
> Many Hadoop users are accustomed to using HDFS instead of these
> alternative file systems, so none of the alternatives are on the default
> Hadoop classpath immediately after deployment. Instead, the code for each
> one is in a separate module under the "hadoop-tools" directory in the
> source tree. Users who need the alternative file systems take extra steps
> post-deployment to add them to the classpath where necessary. This
> achieves the needed dependency isolation. For example, users who never
> use the Azure plugin won't accidentally pick up a transitive dependency
> on the Azure SDK jar.
>
> I recommend taking a quick glance through the existing modules for S3,
> Azure, and OpenStack. For consistency, we'll likely ask that a new
> FileSystem implementation follow the same patterns where feasible. This
> would include things like using the contract tests, having a provision to
> execute tests both offline/mocked and live/integrated against the real
> service, and providing a documentation page that explains configuration
> for end users.
>
> For now, please feel free to file a HADOOP JIRA with your proposal. We
> can work out the details of all of this in discussion on that JIRA.
>
> Something else to follow up on will be licensing concerns.
> I see the
> project already uses the Apache license, but it appears to be an existing
> body of code initially developed at Google. That might require a Software
> Grant Agreement [5]. Again, this is something that can be hashed out in
> discussion on the JIRA after you create it.
>
> [1] http://hadoop.apache.org/docs/r2.7.1/hadoop-aws/tools/hadoop-aws/index.html
> [2] http://hadoop.apache.org/docs/r2.7.1/hadoop-azure/index.html
> [3] http://hadoop.apache.org/docs/r2.7.1/hadoop-openstack/index.html
> [4] http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/filesystem/testing.html
> [5] http://www.apache.org/licenses/
>
> --Chris Nauroth
>
> On 12/7/15, 3:10 PM, "Haohui Mai" <ricet...@gmail.com> wrote:
>
> >Hi,
> >
> >Thanks for reaching out. It would be great to see this in the Hadoop
> >ecosystem.
> >
> >In Hadoop we have AWS S3 support. IMO they address similar use cases,
> >so I think it should be relatively straightforward to adopt the code.
> >
> >The only catch in my head right now is properly isolating dependencies.
> >Not only does the code need to go into a separate module, but many
> >Hadoop applications also depend on different versions of Guava. I
> >think it might be a problem that needs some attention at the very
> >beginning.
> >
> >Please feel free to reach out if you have any other questions.
> >
> >Regards,
> >Haohui
> >
> >On Mon, Dec 7, 2015 at 2:35 PM, James Malone
> ><jamesmal...@google.com.invalid> wrote:
> >> Hello,
> >>
> >> We're from a team within Google Cloud Platform focused on OSS and data
> >> technologies, especially Hadoop (and Spark). Before we cut a JIRA for
> >> something we'd like to do, we wanted to reach out to this list to ask
> >> two quick questions, describe our proposed action, and check for any
> >> major objections.
> >>
> >> Proposed action:
> >> We have a Hadoop connector [1] (more info [2]) for Google Cloud
> >> Storage (GCS) which we have been building and maintaining for some
> >> time. After we clean up our code and tests to conform (to these [3]
> >> and other requirements), we would like to contribute it to Hadoop. We
> >> have many customers using the connector in high-throughput production
> >> Hadoop clusters; we'd like to make it easier and faster to use Hadoop
> >> and GCS.
> >>
> >> Timeline:
> >> Presently, we are working on the beta of Google Cloud Dataproc [4],
> >> which limits our time a bit, so we're targeting late Q1 2016 for
> >> creating a JIRA issue and adapting our connector code as needed.
> >>
> >> Our (quick) questions:
> >> * Do we need to take any (non-coding) action for this beyond
> >>   submitting a JIRA when we are ready?
> >> * Are there any up-front concerns or questions which we can (or will
> >>   need to) address?
> >>
> >> Thank you!
> >>
> >> James Malone
> >> On behalf of the Google Big Data OSS Engineering Team
> >>
> >> Links:
> >> [1] - https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs
> >> [2] - https://cloud.google.com/hadoop/google-cloud-storage-connector
> >> [3] - https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs
> >> [4] - https://cloud.google.com/dataproc
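The post-deployment steps Chris describes for alternative file systems can be sketched roughly as below. This is a minimal illustration, not the connector's documented setup: the jar path is an assumption, and the `fs.gs.impl` property name and implementation class shown in the comments are taken from the GCS connector's own configuration convention but should be verified against its docs.

```shell
# Hypothetical sketch: enabling an out-of-tree FileSystem after deployment.
# The jar location below is an illustrative assumption.

# 1. Add the connector jar to the Hadoop classpath for clients and daemons.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:/opt/connectors/gcs-connector.jar"

# 2. Map the URI scheme to the implementation class in core-site.xml,
#    e.g. with a property entry along the lines of:
#      <property>
#        <name>fs.gs.impl</name>
#        <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
#      </property>
#    After that, paths like gs://bucket/path resolve through the connector.
echo "HADOOP_CLASSPATH=${HADOOP_CLASSPATH}"
```

Because none of the alternative file systems ship on the default classpath, clusters that never configure this step carry no transitive dependencies from the connector, which is the isolation property Chris mentions.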