Hey everyone,

It is about that time again to take the pulse of how commonly Hadoop-based ingestion is used with Druid, so that we can decide whether to keep supporting it going forward.
In my view, Hadoop-based ingestion has unofficially been on life support for quite some time: we do not really go out of our way to add new features to it, and we perform very minimal testing to ensure everything keeps working. The most recent changes to it that I am aware of were to bump versions and require Hadoop 3, but that was primarily motivated by the selfish reasons of wanting to use its contained client library and better isolation, so that we could free up our own dependencies to be updated. This thread is motivated by a similar reason, I guess. See the other thread I started recently about dropping support for Java 11: Hadoop does not yet support the Java 17 runtime, so the outcome of this discussion feeds into those plans.

I think SQL-based ingestion with the multi-stage query engine is the future of batch ingestion, and the Kubernetes-based task runner provides an alternative for task auto-scaling capabilities. Because of this, I don't personally see a lot of compelling reasons to keep supporting Hadoop, so I would be in favor of just dropping support for it completely, though I see no harm in keeping HDFS deep storage around. In past discussions I think we had tied Hadoop removal to adding something like Spark to replace it, but I wonder if this still needs to be the case.

I do know from previous dev list discussions on this topic that quite a lot of large Druid clusters in the wild have still been relying on Hadoop, so I wanted to check whether this is still true and, if so, whether any of these clusters have plans to transition to newer ways of ingesting data, like SQL-based ingestion. While from a dev/maintenance perspective it would be best to just drop it completely, if there is still a large user base I think we need to be open to keeping it around for a while longer.
If we do need to keep it, maybe it would be worth it to invest some time in moving it into a contrib extension so that it isn't bundled by default with Druid releases to discourage new adoption and more accurately reflect its current status in Druid.