Re: [DISCUSS] Hadoop ingestion support

Karan Kumar Wed, 11 Dec 2024 19:19:09 -0800

+1 for removal of Hadoop-based ingestion. It's a maintenance overhead and
stops us from moving to Java 17.
I am not aware of any gaps in SQL-based ingestion that prevent users from
moving off Hadoop. If there are any, please feel free to reach out via
Slack/GitHub.


On Thu, Dec 12, 2024 at 3:22 AM Clint Wylie <[email protected]> wrote:

> Hey everyone,
>
> It is about that time again to take a pulse on how commonly Hadoop
> based ingestion is used with Druid in order to determine if we should
> keep supporting it or not going forward.
>
> In my view, Hadoop based ingestion has unofficially been on life
> support for quite some time as we do not really go out of our way to
> add new features to it, and we perform very minimal testing to ensure
> everything keeps working. The most recent change to it I am aware of
> was to bump versions and require Hadoop 3, but that was primarily
> motivated by selfish reasons of wanting to use its contained client
> library and better isolation so that we could free up our own
> dependencies to be updated. This thread is motivated by a similar
> reason, I guess: see the other thread I started recently discussing
> dropping support for Java 11. Hadoop does not yet support the Java 17
> runtime, so the outcome of this discussion feeds into those
> plans.
>
> I think SQL based ingestion with the multi-stage query engine is the
> future of batch ingestion, and the Kubernetes based task runner
> provides an alternative for task auto scaling capabilities. Because of
> this, I don't personally see a lot of compelling reasons to keep
> supporting Hadoop, so I would be in favor of just dropping support for
> it completely, though I see no harm in keeping HDFS deep storage
> around. In past discussions I think we had tied Hadoop removal to
> adding something like Spark to replace it, but I wonder if this still
> needs to be the case.
>
> I do know from previous dev list discussions about this topic that
> quite a lot of large Druid clusters in the wild have classically
> relied on Hadoop, so I wanted to check to see if this is
> still true and if so if any of these clusters have plans to transition
> to newer ways of ingesting data like SQL based ingestion. While from a
> dev/maintenance perspective it would be best to just drop it
> completely, if there is still a large user base I think we need to be
> open to keeping it around for a while longer. If we do need to keep
> it, maybe it would be worth it to invest some time in moving it into a
> contrib extension so that it isn't bundled by default with Druid
> releases to discourage new adoption and more accurately reflect its
> current status in Druid.
>
