Re: [DISCUSS] Add Baidu Cloud BOS filesystem connector to Hadoop

slfan1989 Sun, 07 Jun 2026 02:22:42 -0700

Hi Xiaoqiao,

Thank you for your support and for sharing your thoughts.


I agree that the implementation in PR #8347 is ready from a technical
perspective, and it's also encouraging that both contributors have
extensive experience in Apache communities.

As you suggested, it would be great to hear more feedback from the dev
community before moving forward with the proposed plan.

Additional perspectives will help us build broader consensus and ensure we
make the best decision for the project.

Thanks again for your review and support.

Best regards,
Shilun Fan.

On Tue, May 26, 2026 at 2:41 PM Xiaoqiao He <[email protected]> wrote:

> Thanks Shilun for driving this forward.
> +1 from my side:
> a. Judging from the PR (https://github.com/apache/hadoop/pull/8347), the
> code is ready.
> b. Both of the contributors are PMC members or committers in mature
> Apache communities.
> I would like to hear more feedback from the dev community about the
> proposed plan. Good luck!
>
> Best Regards,
> - He Xiaoqiao
>
> On Fri, May 22, 2026 at 9:33 PM slfan1989 <[email protected]> wrote:
>
> > Hi Hadoop community,
> >
> > I would like to start a discussion about adding Baidu Cloud BOS
> > (Baidu Object Storage) as a native Hadoop-compatible filesystem
> > connector.
> >
> > JIRA: https://issues.apache.org/jira/browse/HDFS-11161
> > PR: https://github.com/apache/hadoop/pull/8347
> > CI Status: +1 overall, all checks passed.
> >
> > I have had some offline discussions with LuciferYang and the contributors
> > working on this connector. Based on those discussions, I am helping bring
> > this proposal to the Hadoop community for broader review and feedback.
> >
> > The goal is to integrate BOS support as a native Hadoop filesystem
> > module, similar to the existing hadoop-aws (S3A), hadoop-aliyun, and
> > hadoop-cos connectors.
> >
> > 1. Background
> >
> > Baidu Cloud is one of the major cloud service providers in China. BOS
> > (Baidu Object Storage) is Baidu's core object storage service and is
> > widely used for big data analytics, machine learning, and data lake
> > workloads.
> >
> > A native Hadoop connector would allow Hadoop ecosystem projects,
> > including MapReduce, Spark, Hive, Flink, and others, to access BOS
> > storage directly through the bos:// scheme.
> >
> > According to the contributors, this connector has been running in
> > production at Baidu for around 8 years, serving both BOS users and
> > Baidu MapReduce (BMR) workloads.
> >
> > 2. Implementation
> >
> > The proposed module is placed under:
> >
> >   hadoop-cloud-storage-project/hadoop-bos
> >
> > This follows the structure of the existing cloud storage connectors.
> >
> > The implementation includes:
> >
> > - A full Hadoop FileSystem implementation with the bos:// URI scheme
> > - Pluggable credentials provider support
> > - Contract tests covering standard filesystem operations
> > - Dependency shading or exclusion to avoid classpath conflicts, with
> >   shaded dependencies placed under org.apache.hadoop.fs.bos.shaded.*
> >
> > 3. Long-term Maintenance
> >
> > The following contributors have expressed commitment to maintaining this
> > module:
> >
> > - yangdong2398, BOS R&D
> > - LuciferYang, Apache Spark PMC
> > - jackylee-ch, Apache Gluten PMC
> > - houzhizhen, Apache HugeGraph committer
> > - summaryzb, Apache Uniffle committer
> >
> > They have committed to:
> >
> > - Responding to issues and PRs within one week
> > - Keeping dependencies up to date
> > - Adapting the connector to future Hadoop API changes
> >
> > 4. Why Consider Integrating This into Hadoop
> >
> > This proposal follows a similar rationale to hadoop-aws (S3A),
> > hadoop-aliyun, and hadoop-cos:
> >
> > - Users can rely on a single, consistent Hadoop distribution without
> >   managing separate connector JARs and version compatibility manually
> > - A connector maintained within the Hadoop community is easier for
> >   users to trust and review
> > - Shared CI helps ensure ongoing compatibility with Hadoop trunk
> >
> > I would like to invite feedback from the community on whether this
> > connector is appropriate to include in Hadoop, and what additional
> > work, review, or requirements would be needed before it can be
> > accepted.
> >
> > The contributors are copied and expected to participate in this
> > discussion, and can provide more details about the implementation,
> > production usage, and maintenance plan.
> >
> > Best regards,
> > Shilun Fan.
> >
>
