Thanks Xiaoqiao and Steve for your attention and comments. Let me answer the questions about dependencies and testing.
**Dependencies**

Hadoop-tos introduces one new dependency, com.volcengine:ve-tos-java-sdk:2.8.6. It is an open source project under the Apache 2.0 license (https://github.com/volcengine/ve-tos-java-sdk/blob/main/LICENSE). Below are the transitive dependencies pulled in by com.volcengine:ve-tos-java-sdk:2.8.6. They (okhttp, okio, kotlin, jackson) are also open source under the Apache 2.0 license.

```
[INFO] +- com.volcengine:ve-tos-java-sdk:jar:2.8.7:compile
[INFO] |  +- com.squareup.okhttp3:okhttp:jar:4.10.0:compile
[INFO] |  |  +- com.squareup.okio:okio-jvm:jar:3.0.0:compile
[INFO] |  |  |  \- org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.6.20:test
[INFO] |  |  |     \- org.jetbrains.kotlin:kotlin-stdlib-jdk7:jar:1.6.20:test
[INFO] |  |  \- org.jetbrains.kotlin:kotlin-stdlib:jar:1.6.20:compile
[INFO] |  |     \- org.jetbrains:annotations:jar:13.0:compile
[INFO] |  \- com.fasterxml.jackson.core:jackson-annotations:jar:2.12.7:compile
[INFO] +- org.jetbrains.kotlin:kotlin-stdlib-common:jar:1.6.20:compile
```

**How is it tested**

The hadoop-tos module has a complete unit test suite, including the contract tests and extended test cases. To run it, you need a machine that can connect to TOS, with the 6 environment variables below set:

```
export TOS_ACCESS_KEY_ID={YOUR_ACCESS_KEY}
export TOS_SECRET_ACCESS_KEY={YOUR_SECRET_ACCESS_KEY}
export TOS_ENDPOINT={TOS_SERVICE_ENDPOINT}
export FILE_STORAGE_ROOT=/tmp/local_dev/
export TOS_BUCKET={YOUR_BUCKET_NAME}
export TOS_UNIT_TEST_ENABLED=true
```

Then cd to the Hadoop project root directory and run the test command below.

```
mvn -Dtest=org.apache.hadoop.fs.tosfs.** test -pl org.apache.hadoop:hadoop-tos
```

I have also tested it in a real Hadoop environment. The document (index.md) describes how to set up the jars and configure the keys. Common tests include shell commands, TeraSort, DFSIO, NNBench, DistCp, etc.

**Test Environment**

A Volcano Engine account is needed to run all the test cases. I can provide a test environment; please let me know (jing...@apache.org) if you need one to test hadoop-tos.
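Because a forgotten variable is only discovered after Maven has started, it can help to check the six variables listed under **How is it tested** up front. The sketch below is a hypothetical helper: the function name `check_tos_env` is made up and is not part of hadoop-tos. It assumes only that the test suite reads exactly those six variables and that TOS_UNIT_TEST_ENABLED should be `true`, as in the export list.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of hadoop-tos). Returns 0 when the
# six variables from the export list are set and non-empty, and 1 otherwise,
# printing every problem it finds to stderr.
check_tos_env() {
  _tos_missing=0
  for _tos_name in TOS_ACCESS_KEY_ID TOS_SECRET_ACCESS_KEY TOS_ENDPOINT \
                   FILE_STORAGE_ROOT TOS_BUCKET TOS_UNIT_TEST_ENABLED; do
    # POSIX-portable indirect lookup: read the variable whose name is in
    # $_tos_name. Safe here because the names come from the fixed list above.
    eval "_tos_value=\${$_tos_name:-}"
    if [ -z "$_tos_value" ]; then
      echo "missing environment variable: $_tos_name" >&2
      _tos_missing=1
    fi
  done
  # Assumption: mirror the export list, where the switch is literally "true".
  if [ -n "${TOS_UNIT_TEST_ENABLED:-}" ] && [ "$TOS_UNIT_TEST_ENABLED" != "true" ]; then
    echo "TOS_UNIT_TEST_ENABLED is set but not to true" >&2
    _tos_missing=1
  fi
  return "$_tos_missing"
}
```

Usage, from the Hadoop project root: `check_tos_env && mvn -Dtest='org.apache.hadoop.fs.tosfs.**' test -pl org.apache.hadoop:hadoop-tos`. Quoting the `-Dtest` value keeps the shell from trying to glob-expand the `**`.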
On 2025/02/13 18:21:57 Steve Loughran wrote:
> Sounds good, though expect no commitment from me to review anything.
> My main concerns are about dependency libraries (what are they?) and
> testing.
>
> On Tue, 11 Feb 2025 at 05:10, Xiaoqiao He <hexiaoq...@apache.org> wrote:
>
> > Thanks Jinglun for your work. Basically +1 from me to involve it into the
> > Hadoop codebase.
> > a. After a quick review of JIRA and PR, I think it is solid including
> > document and code style.
> > b. Contributors involved here are diverse who are from different projects
> > and companies, and active enough.
> > c. Community with Jinglun offline many times, and IMO he could be
> > responsible to review and test about this module.
> > Beside that, just suggest following the Hadoop guidelines[1] to develop
> > the new features.
> >
> > @Steve Loughran <ste...@cloudera.com> @Shilun Fan <slfan1...@foxmail.com>
> > leave some comments including some concerns in JIRA, would you mind giving more
> > suggestions for this discussion?
> > Thanks.
> >
> > Best Regards,
> > - He Xiaoqiao
> >
> > [1] https://hadoop.apache.org/bylaws.html
> >
> > On Sun, Jan 26, 2025 at 3:39 PM jinglun <jinglun...@qq.com.invalid> wrote:
> >
> >> Hello everyone, I'd like to discuss the integration of volcano engine tos
> >> in hadoop.
> >>
> >> Volcano Engine is a fast growing cloud vendor launched by ByteDance, and
> >> TOS is the object storage service of Volcano Engine. A common way is to
> >> store data into TOS and run Hadoop/Spark/Flink applications to access TOS.
> >> But there is no original support for TOS in hadoop, thus it is not easy for
> >> users to build their Big Data System based on TOS.
> >>
> >> My proposal is to integrate TOS with Hadoop to help users run their
> >> applications on TOS. Users only need to do some simple configuration, then
> >> their applications can read/write TOS without any code change. This work is
> >> similar to AWS S3, AzureBlob, AliyunOSS, Tencent COS and HuaweiCloud Object
> >> Storage in Hadoop.
> >>
> >> More details could be found at
> >> https://issues.apache.org/jira/browse/HADOOP-19236.
> >>
> >> 1. What is the progress of the work now?
> >> The work is currently finished at branch HADOOP_19236. It is developed by
> >> the EMR team of Volcano Engine and served many users from both cloud and
> >> IDC for more than 2 years.
> >>
> >> 2. How is the long-term maintenance and testing guaranteed?
> >> The contributors are opensource friendly, including ZhengHu(PMC
> >> of HBase and Iceberg), Jinglun(Committer of Hadoop), SunXin(Committer
> >> of HBase), XianyinXin(Contributor of Spark), Rascal Wu(Contributor of
> >> Flink), FangBo(Contributor of Hive) and Yuanzhihuan. We will all be
> >> involved in the long-term maintenance of this work. As time goes by,
> >> more people from the EMR team and the hadoop-tos users may join this work.
> >> So I'm confident at the long-term maintenance and testing.
> >>
> >> 3. Why should hadoop-tos be integrated into the hadoop codebase? Shall we use an
> >> independent project?
> >> Integration is for a better user experience. First, users don't need to
> >> go to another repo to find the tos support. Second, users don't need to
> >> worry about the versions mapping between hadoop and hadoop-tos. Finally, a
> >> connector provided by hadoop community is more reliable and
> >> trustworthy.
> >>
> >> If you have any question, concern or any thing else that is unclear,
> >> please let me know. Sincerely looking forward to your reply, thanks
> >> very much.