[ https://issues.apache.org/jira/browse/HADOOP-19236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868640#comment-17868640 ]

Steve Loughran commented on HADOOP-19236:
-----------------------------------------

First, know that you don't have to actually be in our codebase to get picked 
up. As an example, the Google GCS connector is broadly used, yet it lives in 
its own project with Google doing most of the development (I do test it, 
though). That means getting it into hadoop and waiting for our next release 
is not a blocker to your work. That is particularly important as it will not 
get backported to anything before a 3.4.x release.
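
For reference, an out-of-tree connector is picked up purely through 
configuration: hadoop instantiates a FileSystem implementation from the 
fs.<scheme>.impl setting. Here is a minimal sketch; the GCS class name is the 
real one, while the "tos" scheme and class name are assumptions for 
illustration:

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;

  public class OutOfTreeConnector {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // How the out-of-tree GCS connector is wired up today:
      conf.set("fs.gs.impl",
          "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
      // A TOS connector would do the same; this class name is hypothetical:
      conf.set("fs.tos.impl", "org.example.fs.tos.TosFileSystem");
      // With the connector JAR on the classpath, the scheme now resolves:
      FileSystem fs = FileSystem.get(URI.create("tos://bucket/"), conf);
    }
  }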

This also means that you can implement the code, along with the unit and 
integration tests, without waiting for any PR to be merged into Hadoop. How 
far along are you with this?
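
For context: hadoop-common ships contract test base classes which connectors 
subclass for their integration suites. A minimal sketch of the pattern, where 
TosContract (an AbstractFSContract subclass pointing at a test bucket) is a 
hypothetical name:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.contract.AbstractContractOpenTest;
  import org.apache.hadoop.fs.contract.AbstractFSContract;

  // Runs hadoop-common's open() contract checks against the connector.
  public class ITestTosContractOpen extends AbstractContractOpenTest {
    @Override
    protected AbstractFSContract createContract(Configuration conf) {
      return new TosContract(conf);   // hypothetical contract class
    }
  }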

One issue with incorporating this into hadoop is long-term maintenance and 
testing. We do have people working on COS/OSS maintenance, and for the same 
reason there is an expectation that this would continue for TOS.

Having had a quick look at the code, I like the separation of the file system 
API from the actual implementation. We are slowly trying to retrofit that 
into the S3A code; ABFS is a lot better designed here.

I've had a quick look at your new dependency, and while it is licensed 
appropriately, you're going to have to cut out that okio class. There are 
also going to be problems with transitive dependencies, especially jackson.

Now the bad news: I cannot personally commit to doing any reviewing or 
testing of this work. I'm sorry, but I am behind with reviewing PRs related 
to S3A and ABFS, and any commitment I make to you would be unrealistic. It 
would be good if you could get support from anyone working on one of the 
other cloud connector modules, to see if they would assist.

In the meantime:
* start with that external repository with the implementation and test suites.
* get on the hadoop developer list and get involved in discussions there, 
especially testing forthcoming releases.
* reviewing changes to hadoop-common of relevance to you is also important. I 
will highlight that the new bulk delete API is designed for Iceberg 
compaction on cloud storage, and that vector IO can deliver significant 
speedups for Parquet and ORC; there is a sketch of the vector IO call after 
this list. You can get familiar with these by reviewing other people's PRs: 
https://issues.apache.org/jira/browse/HADOOP-19211 . Reviewing other people's 
work is an essential part of the collaboration process, and a great way for 
everyone to become familiar with you and your work. 
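
To show what the vector IO API looks like from the client side, here is a 
minimal sketch, assuming hadoop-common 3.3.5 or later on the classpath; the 
path, offsets and lengths are made up:

  import java.nio.ByteBuffer;
  import java.util.Arrays;
  import java.util.List;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileRange;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class VectorReadSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Path path = new Path("tos://bucket/data.parquet");   // made-up path
      FileSystem fs = path.getFileSystem(conf);
      // Ask for several non-overlapping ranges in one call; an object
      // store connector can coalesce and parallelize the reads.
      List<FileRange> ranges = Arrays.asList(
          FileRange.createFileRange(0, 1024),              // made-up ranges
          FileRange.createFileRange(1 << 20, 4096));
      try (FSDataInputStream in = fs.open(path)) {
        in.readVectored(ranges, ByteBuffer::allocate);
        for (FileRange range : ranges) {
          ByteBuffer data = range.getData().join();   // future per range
          // process data ...
        }
      }
    }
  }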

> Integration of Volcano Engine TOS in Hadoop.
> --------------------------------------------
>
>                 Key: HADOOP-19236
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19236
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, tools
>            Reporter: Jinglun
>            Priority: Major
>         Attachments: Integration of Volcano Engine TOS in Hadoop.pdf
>
>
> Volcano Engine is a fast-growing cloud vendor launched by ByteDance, and TOS 
> is the object storage service of Volcano Engine. A common pattern is to 
> store data in TOS and run Hadoop/Spark/Flink applications that access it. 
> But there is no native support for TOS in hadoop, so it is not easy for 
> users to build their Big Data systems on TOS.
>  
> This work aims to integrate TOS with Hadoop to help users run their 
> applications on TOS. Users only need to do some simple configuration, and 
> then their applications can read/write TOS without any code change. This 
> work is similar to the AWS S3, Azure Blob, AliyunOSS, Tencent COS and 
> HuaweiCloud Object Storage support in Hadoop.
>  
>  Please see the attached document "Integration of Volcano Engine TOS in 
> Hadoop" for more details.



