Re: [DISCUSS] Hadoop ingestion support

Lucas Capistrant Thu, 12 Dec 2024 07:46:59 -0800

On Wed, Dec 11, 2024 at 9:10 PM Karan Kumar <[email protected]> wrote:


> +1 for removal of Hadoop-based ingestion. It's a maintenance overhead and
> stops us from moving to Java 17.
> I am not aware of any gaps in SQL-based ingestion that prevent users from
> moving off Hadoop. If there are any, please feel free to reach out via
> Slack/GitHub.
>
> On Thu, Dec 12, 2024 at 3:22 AM Clint Wylie <[email protected]> wrote:
>
> > Hey everyone,
> >
> > It is about that time again to take a pulse on how commonly
> > Hadoop-based ingestion is used with Druid in order to determine if we
> > should keep supporting it or not going forward.
> >
> > In my view, Hadoop-based ingestion has unofficially been on life
> > support for quite some time as we do not really go out of our way to
> > add new features to it, and we perform very minimal testing to ensure
> > everything keeps working. The most recent changes to it that I am
> > aware of were to bump versions and require Hadoop 3, but that was
> > primarily motivated by selfish reasons of wanting to use its contained
> > client library and better isolation so that we could free up our own
> > dependencies to be updated. This thread is motivated by a similar
> > reason, I guess: see the other thread I started recently about
> > dropping support for Java 11. Hadoop does not yet support the Java 17
> > runtime, so the outcome of this discussion feeds into those plans.
> >
> > I think SQL-based ingestion with the multi-stage query engine is the
> > future of batch ingestion, and the Kubernetes-based task runner
> > provides an alternative for task auto-scaling capabilities. Because of
> > this, I don't personally see a lot of compelling reasons to keep
> > supporting Hadoop, so I would be in favor of just dropping support for
> > it completely, though I see no harm in keeping HDFS deep storage
> > around. In past discussions I think we had tied Hadoop removal to
> > adding something like Spark to replace it, but I wonder if this still
> > needs to be the case.
> >
> > I do know from previous dev list discussions about this topic that
> > quite a lot of large Druid clusters in the wild were still relying on
> > Hadoop, so I wanted to check whether this is still true and, if so,
> > whether any of these clusters have plans to transition to newer ways
> > of ingesting data like SQL-based ingestion. While from a
> > dev/maintenance perspective it would be best to just drop it
> > completely, if there is still a large user base I think we need to be
> > open to keeping it around for a while longer. If we do need to keep
> > it, maybe it would be worth it to invest some time in moving it into a
> > contrib extension so that it isn't bundled by default with Druid
> > releases to discourage new adoption and more accurately reflect its
> > current status in Druid.
