Hey everyone,

It is about that time again to take the pulse on how commonly
Hadoop-based ingestion is used with Druid, in order to determine
whether we should keep supporting it going forward.

In my view, Hadoop-based ingestion has unofficially been on life
support for quite some time: we do not really go out of our way to
add new features to it, and we perform very minimal testing to ensure
everything keeps working. The most recent changes to it that I am
aware of were to bump versions and require Hadoop 3, but those were
primarily motivated by selfish reasons: we wanted to use its
self-contained client library and get better isolation, so that we
could free up our own dependencies to be updated. This thread is
motivated by a similar reason, I guess. See the other thread I started
recently about dropping support for Java 11: Hadoop does not yet
support the Java 17 runtime, so the outcome of this discussion affects
those plans.

I think SQL-based ingestion with the multi-stage query engine is the
future of batch ingestion, and the Kubernetes-based task runner
provides an alternative for task autoscaling. Because of this, I don't
personally see many compelling reasons to keep supporting Hadoop, so I
would be in favor of dropping support for it completely, though I see
no harm in keeping HDFS deep storage around. In past discussions I
think we tied Hadoop removal to adding something like Spark to replace
it, but I wonder whether this still needs to be the case.

I do know that, in previous dev list discussions about this topic,
quite a lot of large Druid clusters in the wild were still relying on
Hadoop. I wanted to check whether this is still true and, if so,
whether any of these clusters plan to transition to newer ways of
ingesting data, such as SQL-based ingestion. While from a
dev/maintenance perspective it would be best to just drop it
completely, if there is still a large user base I think we need to be
open to keeping it around for a while longer. If we do need to keep
it, it might be worth investing some time in moving it into a contrib
extension so that it isn't bundled by default with Druid releases.
That would discourage new adoption and more accurately reflect its
current status in Druid.
