Hi Surafel,

The ASF's core principle is Community Over Code. While your project sounds interesting, becoming an ASF podling, and later a top-level project, requires a community: key decisions, including release approvals, need at least three people to vote in favour. I would suggest that you focus on building a community first and possibly come back to us in a few months.
Regards,
PJ

On Thu, 14 May 2026 at 14:23, Surafel Temesgen <[email protected]> wrote:
>
> Dear Apache Incubator Community,
>
> I’d like to propose Debo Data Studio for incubation and seek your feedback
> on the project and its fit within the Apache ecosystem.
>
> *The Problem*
> Working in the Hadoop world, I was struck by how fragmented the ETL tooling
> has become. Ingestion alone might involve Sqoop, Flume, Kafka, or NiFi;
> transformation could mean choosing between Hive, Pig, Spark, MapReduce, or
> Storm; and loading often brings HDFS, HBase, Hive (again), Kudu, or Sqoop
> export into the picture. Each tool carries its own configuration,
> dependencies, monitoring, and failure modes. Teams spend more effort
> integrating and managing a dozen specialised projects than actually
> transforming data.
>
> The usual answer — layering management platforms like Apache Ambari or
> Cloudera Manager on top — adds yet more complexity, and the fully managed,
> enterprise-ready versions come with substantial licensing and support
> costs. The complexity is shifted, not eliminated.
>
> *The Idea*
> I believe the Hadoop ETL stack can be collapsed into a handful of
> well-integrated tools. Debo Data Studio is an attempt to do exactly that: a
> single, visual, open-source ETL environment that handles extraction,
> transformation, and loading without juggling multiple engines.
>
> Heavily inspired by Talend Open Studio, Debo Data Studio provides:
>
> - *Visual Pipeline Designer* – a drag-and-drop interface to build and
>   manage complete data flows.
> - *Broad Connectivity* – built-in connectors for relational databases,
>   HDFS, cloud storage, APIs, CSV, JSON, Parquet, and more.
> - *Rich Transformation Library* – ready-to-use components for filtering,
>   joining, aggregating, mapping, and cleansing, removing the need to write
>   Hive, Pig, or Spark code for routine tasks.
> - *Execution Engine*
> - *Job Scheduling & Monitoring* – an integrated dashboard to schedule,
>   run, and monitor ETL jobs, addressing the operational headaches that
>   Ambari and similar tools try to solve externally.
> - *Open-Source Core* – a fully open codebase, avoiding proprietary lock-in
>   and high licensing fees.
>
> The goal is that a team can adopt one consistent platform for ingestion,
> transformation, orchestration, and delivery — batch or streaming,
> structured or unstructured — and leave behind the patchwork of Sqoop, Hive,
> Spark, Oozie, and the rest.
>
> *Current Status*
> An initial working implementation is available at:
> https://github.com/Debo-et/Debo_data_studio
>
> The codebase is open and ready for community review.
>
> *Seeking Guidance*
> I would love to hear whether the Incubator sees value in a unified, visual
> ETL approach within the Hadoop and modern data ecosystem. I’m particularly
> interested in any challenges the project would need to overcome to become a
> genuine, production-grade alternative to the fragmented stack, and whether
> it might be a good candidate for the Apache Incubator. Any feedback,
> suggestions, or constructive criticism is more than welcome.
>
> Thank you for your time and for considering this proposal. I’m looking
> forward to the discussion.
>
> Regards,
>
> Surafel

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
