[ 
https://issues.apache.org/jira/browse/HADOOP-19343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900476#comment-17900476
 ] 

Steve Loughran commented on HADOOP-19343:
-----------------------------------------

I'm not convinced. They've done things like input stream performance, 
IOStatistics and lots of testing, plus lots of production deployment, which is 
where real-world issues surface, especially recovery from those failures of 
cloud infrastructure which are so rare you are unlikely to see them during 
development.

If you haven't written one of the connectors before, it is surprisingly hard to 
get everything right and performant, which means that it becomes an ongoing 
maintenance problem for years.

Hardest of all is resilience in the presence of failures.
ABFS has some odd issues when rename fails under throttling; look there.

Meanwhile, the S3A code has so many fixes related to recovery from failures,
OpenSSL issues and more.
A lot of those failures have surfaced during development testing or, worse, in 
production systems. And still they come: HADOOP-19221 being one 
example, HADOOP-19317 another.

Google's GCS connector gets a lot more testing, plus the in-the-field 
deployment needed to find those obscure transient issues.

But I'm really reluctant to take it. In fact, I am really reluctant to take any 
more cloud connectors inside the ASF code because it sets an obligation for 
long-term testing and maintenance. It is trivial to add a downstream library to 
your releases; that is what Maven POM files, Gradle and SBT manifests are for.
* If you want it in Apache Spark or other ASF products, submit pull requests to 
include it.
* If you want it in your own products, you can do the same or modify the 
hadoop-cloud-connectors POM to put it in.
* Except in the special case of "our build process does a clean build of the 
entire big data stack every night", you're not going to encounter dependency 
loops.




> Add native support for GCS connector
> ------------------------------------
>
>                 Key: HADOOP-19343
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19343
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 3.5.0
>            Reporter: Abhishek Modi
>            Priority: Major
>         Attachments: GCS connector for Hadoop.pdf
>
>



