Hi Surafel,

The ASF's core principle is Community Over Code. While your project sounds interesting, becoming an ASF podling, and later a top-level project, requires a community: key decisions, including release approvals, need at least three people to vote in favour. I would suggest that you focus on building a community first and possibly come back to us in a few months.
Regards,
PJ

On Thu, 14 May 2026 at 14:23, Surafel Temesgen <[email protected]> wrote:
>
> Dear Apache Incubator Community,
>
> I’d like to propose Debo Data Studio for incubation and seek your feedback
> on the project and its fit within the Apache ecosystem.
>
> *The Problem*
> Working in the Hadoop world, I was struck by how fragmented the ETL tooling
> has become. Ingestion alone might involve Sqoop, Flume, Kafka, or NiFi;
> transformation could mean choosing between Hive, Pig, Spark, MapReduce, or
> Storm; and loading often brings HDFS, HBase, Hive (again), Kudu, or Sqoop
> export into the picture. Each tool carries its own configuration,
> dependencies, monitoring, and failure modes. Teams spend more effort
> integrating and managing a dozen specialised projects than actually
> transforming data.
>
> The usual answer — layering management platforms like Apache Ambari or
> Cloudera Manager on top — adds yet more complexity, and the fully managed,
> enterprise-ready versions come with substantial licensing and support
> costs. The complexity is shifted, not eliminated.
>
> *The Idea*
> I believe the Hadoop ETL stack can be collapsed into a handful of
> well-integrated tools. Debo Data Studio is an attempt to do exactly that: a
> single, visual, open-source ETL environment that handles extraction,
> transformation, and loading without juggling multiple engines.
>
> Heavily inspired by Talend Open Studio, Debo Data Studio provides:
>
> - *Visual Pipeline Designer* – a drag-and-drop interface to build and
>   manage complete data flows.
> - *Broad Connectivity* – built-in connectors for relational databases,
>   HDFS, cloud storage, APIs, CSV, JSON, Parquet, and more.
> - *Rich Transformation Library* – ready-to-use components for filtering,
>   joining, aggregating, mapping, and cleansing, removing the need to write
>   Hive, Pig, or Spark code for routine tasks.
> - *Execution Engine*
> - *Job Scheduling & Monitoring* – an integrated dashboard to schedule,
>   run, and monitor ETL jobs, addressing the operational headaches that
>   Ambari and similar tools try to solve externally.
> - *Open-Source Core* – a fully open codebase, avoiding proprietary lock-in
>   and high licensing fees.
>
> The goal is that a team can adopt one consistent platform for ingestion,
> transformation, orchestration, and delivery — batch or streaming,
> structured or unstructured — and leave behind the patchwork of Sqoop, Hive,
> Spark, Oozie, and the rest.
>
> *Current Status*
> An initial working implementation is available at:
> https://github.com/Debo-et/Debo_data_studio
>
> The codebase is open and ready for community review.
>
> *Seeking Guidance*
> I would love to hear whether the Incubator sees value in a unified, visual
> ETL approach within the Hadoop and modern data ecosystem. I’m particularly
> interested in any challenges the project would need to overcome to become a
> genuine, production-grade alternative to the fragmented stack, and whether
> it might be a good candidate for the Apache Incubator. Any feedback,
> suggestions, or constructive criticism is more than welcome.
>
> Thank you for your time and for considering this proposal. I’m looking
> forward to the discussion.
>
> Regards,
>
> Surafel

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
