Hi Hadoop community,

I would like to start a discussion about adding Baidu Cloud BOS
(Baidu Object Storage) as a native Hadoop-compatible filesystem connector.

JIRA: https://issues.apache.org/jira/browse/HDFS-11161
PR: https://github.com/apache/hadoop/pull/8347
CI Status: +1 overall, all checks passed.

I have had some offline discussions with LuciferYang and the contributors
working on this connector. Based on those discussions, I am helping bring
this proposal to the Hadoop community for broader review and feedback.

The goal is to integrate BOS support as a native Hadoop filesystem module,
similar to the existing hadoop-aws (S3A), hadoop-aliyun, and hadoop-cos
connectors.

1. Background

Baidu Cloud is one of the major cloud service providers in China. BOS is
its core object storage service and is widely used for big data analytics,
machine learning, and data lake workloads.

A native Hadoop connector would allow Hadoop ecosystem projects, including
MapReduce, Spark, Hive, Flink, and others, to access BOS storage directly
through the bos:// scheme.

According to the contributors, this connector has been running in production
at Baidu for around 8 years, serving both BOS users and Baidu MapReduce
(BMR) workloads.

2. Implementation

The proposed module is placed under:

  hadoop-cloud-storage-project/hadoop-bos

This follows the structure of the existing cloud storage connectors.

The implementation includes:

- A full Hadoop FileSystem implementation with the bos:// URI scheme
- Pluggable credentials provider support
- Contract tests covering standard filesystem operations
- Dependency shading or exclusion to avoid classpath conflicts, with shaded
  dependencies placed under org.apache.hadoop.fs.bos.shaded.*
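
To make the bos:// scheme concrete, here is a minimal sketch of how such a
URI could be split into a bucket and an object key. It assumes the connector
follows the bucket-as-authority convention used by s3a://, which the proposal
does not state explicitly. The class name, method names, and the example
bucket and path are all hypothetical and are not taken from the PR.

```java
import java.net.URI;

// Sketch only: splits a bos:// URI into bucket and object key, ASSUMING the
// bucket is carried in the URI authority (as with s3a://). Not code from the PR.
public class BosUriSketch {

    // Returns the bucket name, assumed to be the URI authority.
    static String bucketOf(URI uri) {
        if (!"bos".equals(uri.getScheme())) {
            throw new IllegalArgumentException("not a bos:// URI: " + uri);
        }
        return uri.getAuthority();
    }

    // Returns the object key: the URI path without its leading slash.
    static String keyOf(URI uri) {
        String path = uri.getPath();
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        // Hypothetical bucket and object path, for illustration only.
        URI uri = URI.create("bos://example-bucket/warehouse/events/part-00000.parquet");
        System.out.println("bucket = " + bucketOf(uri));
        System.out.println("key    = " + keyOf(uri));
    }
}
```

In a real deployment, applications would not parse the URI themselves. They
would pass a bos:// path to the standard Hadoop FileSystem API and let the
connector resolve it.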

3. Long-term Maintenance

The following contributors have expressed commitment to maintaining this
module:

- yangdong2398, BOS R&D
- LuciferYang, Apache Spark PMC
- jackylee-ch, Apache Gluten PMC
- houzhizhen, Apache HugeGraph committer
- summaryzb, Apache Uniffle committer

They have committed to:

- Responding to issues and PRs within one week
- Keeping dependencies up to date
- Adapting the connector to future Hadoop API changes

4. Why Consider Integrating This into Hadoop

This proposal follows a similar rationale to hadoop-aws (S3A),
hadoop-aliyun, and hadoop-cos:

- Users can rely on a single, consistent Hadoop distribution without
  managing separate connector JARs and version compatibility manually
- A connector maintained within the Hadoop community is easier for users to
  trust and review
- Shared CI helps ensure ongoing compatibility with Hadoop trunk

I would like to invite feedback from the community on whether this connector
is appropriate to include in Hadoop, and what additional work, review, or
requirements would be needed before it can be accepted.

The contributors are copied on this thread and are expected to participate
in the discussion. They can provide more details about the implementation,
production usage, and maintenance plan.

Best regards,
Shilun Fan.
