Re: [DISCUSS] Integration of Volcano Engine TOS in Hadoop.

Jinglun Fri, 14 Feb 2025 03:40:58 -0800

Thanks Xiaoqiao and Steve for your attention and comments. Let me address the 
questions about dependencies and testing.


**Dependencies**
Hadoop-tos introduces one new dependency, com.volcengine:ve-tos-java-sdk:2.8.6. It 
is an open source project under the Apache 2.0 license 
(https://github.com/volcengine/ve-tos-java-sdk/blob/main/LICENSE).

Here are the transitive dependencies pulled in by com.volcengine:ve-tos-java-sdk:2.8.6. 
They (okhttp, okio, kotlin, jackson) are also open source under the Apache 2.0 license. 
[INFO] +- com.volcengine:ve-tos-java-sdk:jar:2.8.7:compile
[INFO] |  +- com.squareup.okhttp3:okhttp:jar:4.10.0:compile
[INFO] |  |  +- com.squareup.okio:okio-jvm:jar:3.0.0:compile
[INFO] |  |  |  \- org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.6.20:test
[INFO] |  |  |     \- org.jetbrains.kotlin:kotlin-stdlib-jdk7:jar:1.6.20:test
[INFO] |  |  \- org.jetbrains.kotlin:kotlin-stdlib:jar:1.6.20:compile
[INFO] |  |     \- org.jetbrains:annotations:jar:13.0:compile
[INFO] |  \- com.fasterxml.jackson.core:jackson-annotations:jar:2.12.7:compile
[INFO] +- org.jetbrains.kotlin:kotlin-stdlib-common:jar:1.6.20:compile

**How is it tested**
The hadoop-tos module has a complete unit test suite, including the contract tests and 
extended test cases. Running it requires a machine that can connect to TOS. 
Set the six environment variables below.
```
export TOS_ACCESS_KEY_ID={YOUR_ACCESS_KEY}
export TOS_SECRET_ACCESS_KEY={YOUR_SECRET_ACCESS_KEY}
export TOS_ENDPOINT={TOS_SERVICE_ENDPOINT}
export FILE_STORAGE_ROOT=/tmp/local_dev/
export TOS_BUCKET={YOUR_BUCKET_NAME}
export TOS_UNIT_TEST_ENABLED=true
```
Then cd to the Hadoop project root directory and run the test command below.
```
mvn -Dtest=org.apache.hadoop.fs.tosfs.** test -pl org.apache.hadoop:hadoop-tos
```
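Since a missing variable is an easy mistake to make before a long test run, here is a minimal sketch of a pre-flight check. The function name `check_tos_env` and the variable `REQUIRED_TOS_VARS` are hypothetical helpers for illustration, not part of hadoop-tos; only the six variable names come from the list above.

```shell
# Hypothetical helper (not part of hadoop-tos): the six variables listed above.
REQUIRED_TOS_VARS="TOS_ACCESS_KEY_ID TOS_SECRET_ACCESS_KEY TOS_ENDPOINT FILE_STORAGE_ROOT TOS_BUCKET TOS_UNIT_TEST_ENABLED"

# Prints the name of every unset or empty variable to stderr.
# Returns 0 if all six are set, 1 otherwise.
check_tos_env() {
  missing=0
  for name in $REQUIRED_TOS_VARS; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage: only start the Maven run when the check passes.
#   check_tos_env && mvn -Dtest=org.apache.hadoop.fs.tosfs.** test -pl org.apache.hadoop:hadoop-tos
```

The check only verifies that the variables are non-empty; it does not validate the credentials or the endpoint.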
I also tested it in a real Hadoop environment. The document (index.md) describes 
how to set up the jars and configure the keys. Common tests include shell commands, 
TeraSort, DFSIO, NNBench, DistCp, etc. 

**Test Environment**
We need a Volcano Engine account to run all the test cases. I can provide an 
environment for testing. Please let me know if you need to test hadoop-tos 
(jing...@apache.org). 




On 2025/02/13 18:21:57 Steve Loughran wrote:
> Sounds good, though expect no commitment from me to review anything.
> My main concerns are about dependency libraries (what are they?) and
> testing.
> 
> On Tue, 11 Feb 2025 at 05:10, Xiaoqiao He <hexiaoq...@apache.org> wrote:
> 
> > Thanks Jinglun for your work. Basically +1 from me to bring it into the
> > Hadoop codebase.
> > a. After a quick review of the JIRA and PR, I think it is solid, including
> > the documentation and code style.
> > b. The contributors involved here are diverse, coming from different projects
> > and companies, and are active enough.
> > c. I have communicated with Jinglun offline many times, and IMO he can be
> > responsible for reviewing and testing this module.
> > Besides that, I just suggest following the Hadoop guidelines[1] to develop
> > the new features.
> >
> > @Steve Loughran <ste...@cloudera.com> @Shilun Fan <slfan1...@foxmail.com>
> > left some comments, including some concerns, in JIRA. Would you mind giving
> > more suggestions for this discussion?
> > Thanks.
> >
> > Best Regards,
> > - He Xiaoqiao
> >
> > [1] https://hadoop.apache.org/bylaws.html
> >
> >
> > On Sun, Jan 26, 2025 at 3:39 PM jinglun <jinglun...@qq.com.invalid> wrote:
> >
> >> Hello everyone, I'd like to discuss the integration of Volcano Engine TOS
> >> in Hadoop.
> >>
> >>
> >> Volcano Engine is a fast-growing cloud vendor launched by ByteDance, and
> >> TOS is the object storage service of Volcano Engine. A common pattern is to
> >> store data in TOS and run Hadoop/Spark/Flink applications that access TOS.
> >> But there is no native support for TOS in Hadoop, so it is not easy for
> >> users to build their big data systems on TOS.
> >>
> >> My proposal is to integrate TOS with Hadoop to help users run their
> >> applications on TOS. Users only need to do some simple configuration, then
> >> their applications can read/write TOS without any code change. This work is
> >> similar to AWS S3, AzureBlob, AliyunOSS, Tencent COS and HuaweiCloud Object
> >> Storage in Hadoop.
> >>
> >>
> >> More details can be found at
> >> https://issues.apache.org/jira/browse/HADOOP-19236.
> >>
> >>
> >> 1. What is the progress of the work now?
> >> The work is currently finished on branch HADOOP_19236. It was developed by
> >> the EMR team of Volcano Engine and has served many users from both cloud and
> >> IDC for more than 2 years.
> >>
> >>
> >> 2. How is the long-term maintenance and testing guaranteed?
> >> The contributors are open-source friendly, including ZhengHu (PMC
> >> of HBase and Iceberg), Jinglun (Committer of Hadoop), SunXin (Committer
> >> of HBase), XianyinXin (Contributor of Spark), Rascal Wu (Contributor of
> >> Flink), FangBo (Contributor of Hive) and Yuanzhihuan. We will all be
> >> involved in the long-term maintenance of this work. As time goes by,
> >> more people from the EMR team and the hadoop-tos users may join this work.
> >> So I'm confident in the long-term maintenance and testing.
> >>
> >>
> >> 3. Why should hadoop-tos be integrated into the Hadoop codebase? Should we
> >> use an independent project instead?
> >> Integration is for a better user experience. First, users don't need to
> >> go to another repo to find the TOS support. Second, users don't need to
> >> worry about the version mapping between hadoop and hadoop-tos. Finally, a
> >> connector provided by the Hadoop community is more reliable and
> >> trustworthy.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> If you have any questions, concerns or anything else that is unclear,
> >> please let me know. Sincerely looking forward to your reply, thanks
> >> very much.
> >
> >
> 
