Re: [PROPOSAL] Debo Data Studio – A Unified Visual ETL Platform for the Hadoop Ecosystem

Surafel Temesgen Thu, 14 May 2026 23:33:19 -0700

Hi PJ,

Thank you for the feedback. I appreciate the emphasis on the "Community
Over Code" principle, as it is central to what makes the ASF successful.


However, I wanted to share a perspective on the current barrier to entry.
It often feels like the current structure is best suited for large
companies that can easily mobilize resources and hire people to build a
community from scratch. Over time, this has led to several Apache projects
becoming controlled by a few dominant corporations, which can make the
foundation feel less "open" to independent innovators.

I wish the ASF had a platform where anyone could post a proposal and
organically build a community around it directly within the Apache
ecosystem. This would level the playing field, allowing projects to grow
based on the merit of the idea and community interest rather than the
corporate backing behind them.

I will take your advice and focus on building the community independently
for now, and I hope to revisit this with you in a few months.

Best regards,

Surafel


On Thu, May 14, 2026 at 4:48 PM PJ Fanning <[email protected]> wrote:

> Hi Surafel,
> The ASF's core principle is Community Over Code. While your project
> sounds interesting, becoming an ASF podling and later a project
> requires a community. Key decisions and approvals for releases
> require at least 3 people to approve them.
> I would suggest that you try building a community and possibly come
> back to us in a few months.
>
> Regards,
> PJ
>
> On Thu, 14 May 2026 at 14:23, Surafel Temesgen <[email protected]>
> wrote:
> >
> > Dear Apache Incubator Community,
> >
> > I’d like to propose Debo Data Studio for incubation and seek your
> > feedback on the project and its fit within the Apache ecosystem.
> >
> > *The Problem*
> > Working in the Hadoop world, I was struck by how fragmented the ETL
> > tooling has become. Ingestion alone might involve Sqoop, Flume, Kafka,
> > or NiFi; transformation could mean choosing between Hive, Pig, Spark,
> > MapReduce, or Storm; and loading often brings HDFS, HBase, Hive (again),
> > Kudu, or Sqoop export into the picture. Each tool carries its own
> > configuration, dependencies, monitoring, and failure modes. Teams spend
> > more effort integrating and managing a dozen specialised projects than
> > actually transforming data.
> >
> > The usual answer — layering management platforms like Apache Ambari or
> > Cloudera Manager — adds yet more complexity, and the fully-managed,
> > enterprise-ready versions come with substantial licensing and support
> > costs. The complexity is shifted, not eliminated.
> >
> > *The Idea*
> > I believe the Hadoop ETL stack can be collapsed into a handful of
> > well‑integrated tools. Debo Data Studio is an attempt to do exactly
> > that: a single, visual, open‑source ETL environment that handles
> > extraction, transformation, and loading without juggling multiple
> > engines.
> >
> > Heavily inspired by Talend Open Studio, Debo Data Studio provides:
> >
> >    - *Visual Pipeline Designer* – a drag‑and‑drop interface to build and
> >      manage complete data flows.
> >    - *Broad Connectivity* – built‑in connectors for relational databases,
> >      HDFS, cloud storage, APIs, CSV, JSON, Parquet, and more.
> >    - *Rich Transformation Library* – ready‑to‑use components for
> >      filtering, joining, aggregating, mapping, and cleansing, removing the
> >      need to write Hive, Pig, or Spark code for routine tasks.
> >    - *Execution Engine*
> >    - *Job Scheduling & Monitoring* – an integrated dashboard to schedule,
> >      run, and monitor ETL jobs, addressing the operational headaches that
> >      Ambari and similar tools try to solve externally.
> >    - *Open‑Source Core* – fully open codebase, avoiding proprietary
> >      lock‑in and high licensing fees.
> >
> > The goal is that a team can adopt one consistent platform for ingestion,
> > transformation, orchestration, and delivery (batch or streaming,
> > structured or unstructured) and leave behind the patchwork of Sqoop,
> > Hive, Spark, Oozie, and the rest.
> >
> > *Current Status*
> > An initial working implementation is available at:
> > https://github.com/Debo-et/Debo_data_studio
> >
> > The codebase is open and ready for community review.
> >
> > *Seeking Guidance*
> > I would love to hear whether the Incubator sees value in a unified,
> > visual ETL approach within the Hadoop and modern data ecosystem. I’m
> > particularly interested in any challenges the project would need to
> > overcome to become a genuine, production‑grade alternative to the
> > fragmented stack, and whether it might be a good candidate for the
> > Apache Incubator. Any feedback, suggestions, or constructive criticism
> > is more than welcome.
> >
> > Thank you for your time and for considering this proposal. I’m looking
> > forward to the discussion.
> >
> > regards,
> >
> > Surafel
