@Adarsh - FYI since you are the release manager for 32.

On Wed, Jan 8, 2025 at 11:53 AM Abhishek Agarwal <abhis...@apache.org> wrote:
> I don't want to kick that can too far down the road either :) We don't
> want to give a false hope that it's going to remain around forever. But yes
> let's deprecate both Hadoop and Java 11 support in the upcoming 32 release.
> It's unfortunate that Hadoop still doesn't support Java 17. We shouldn't
> let it hold us back. Jetty, pac4j are dropping Java 11 support and we would
> want to upgrade to newer versions of these dependencies soon. There are
> also nice language features in Java 17 such as pattern matching, multiline
> strings, and a lot more that we can't use if we have to be compile
> compatible with Java 11. If you need the resource elasticity that Hadoop
> provides or want to reuse shared infrastructure in the company, MM-less
> ingestion is a good alternative.
>
> So let's deprecate it in 32. We can decide on removal later but hopefully,
> it doesn't take too many releases to do that.
>
> On Tue, Jan 7, 2025 at 4:22 PM Karan Kumar <ka...@apache.org> wrote:
>
>> Okay, from what I can gather, a few folks still need Hadoop ingestion. So
>> let's kick the can down the road regarding removal of that support, but
>> let's agree on the deprecation plan. Since Druid 32 is around the corner,
>> let's at least deprecate Hadoop ingestion so that any new users are not
>> onboarded to this way of ingestion. Deprecation also becomes a forcing
>> function in internal company channels for prioritization of getting off
>> Hadoop.
>>
>> How does this plan look?
>>
>> On Fri, Dec 13, 2024 at 1:11 AM Maytas Monsereenusorn <mayt...@apache.org>
>> wrote:
>>
>> > We at Netflix are in a similar situation to Target Corporation (Lucas C
>> > email above).
>> > We currently rely on Hadoop ingestion for all our batch ingestion jobs.
>> > The main reason for this is that we already have a large Hadoop cluster
>> > supporting our Spark workloads that we can leverage for Druid ingestion.
>> > I imagine that the closest alternative for us would be moving to K8 /
>> > MiddleManager-less ingestion job.
>> >
>> > On Thu, Dec 12, 2024 at 10:56 PM Lucas Capistrant
>> > <capistrant.lu...@gmail.com> wrote:
>> >
>> > > Apologies for the empty email… fat fingers.
>> > >
>> > > Just wanted to say that we at Target Corporation (USA), still rely
>> > > heavily on Hadoop ingest. We’d selfishly want support forever, but if
>> > > forced to pivot to a new ingestion style for our larger batch ingest
>> > > jobs that currently leverage the cheap compute on YARN, the longer the
>> > > lead time between announcement by the community to the actual release
>> > > with no support, the better. Making these types of changes can be a
>> > > slow process for the slow to maneuver corporate cruise ship.
>> > >
>> > > On Thu, Dec 12, 2024 at 9:46 AM Lucas Capistrant
>> > > <capistrant.lu...@gmail.com> wrote:
>> > >
>> > > >
>> > > >
>> > > > On Wed, Dec 11, 2024 at 9:10 PM Karan Kumar <ka...@apache.org> wrote:
>> > > >
>> > > >> +1 for removal of Hadoop based ingestion. It's a maintenance
>> > > >> overhead and stops us from moving to java 17.
>> > > >> I am not aware of any gaps in sql based ingestion which limits users
>> > > >> to move off from hadoop. If there are any, please feel free to reach
>> > > >> out via slack/github.
>> > > >>
>> > > >> On Thu, Dec 12, 2024 at 3:22 AM Clint Wylie <cwy...@apache.org>
>> > > >> wrote:
>> > > >>
>> > > >> > Hey everyone,
>> > > >> >
>> > > >> > It is about that time again to take a pulse on how commonly Hadoop
>> > > >> > based ingestion is used with Druid in order to determine if we
>> > > >> > should keep supporting it or not going forward.
>> > > >> >
>> > > >> > In my view, Hadoop based ingestion has unofficially been on life
>> > > >> > support for quite some time as we do not really go out of our way
>> > > >> > to add new features to it, and we perform very minimal testing to
>> > > >> > ensure everything keeps working. The most recent changes to it I
>> > > >> > am aware of was to bump versions and require Hadoop 3, but that
>> > > >> > was primarily motivated by selfish reasons of wanting to use its
>> > > >> > contained client library and better isolation so that we could
>> > > >> > free up our own dependencies to be updated. This thread is
>> > > >> > motivated by a similar reason I guess, see the other thread I
>> > > >> > started recently discussing dropping support for Java 11 where
>> > > >> > Hadoop does not yet support Java 17 runtime, and so the outcome of
>> > > >> > this discussion is involved in those plans.
>> > > >> >
>> > > >> > I think SQL based ingestion with the multi-stage query engine is
>> > > >> > the future of batch ingestion, and the Kubernetes based task
>> > > >> > runner provides an alternative for task auto scaling capabilities.
>> > > >> > Because of this, I don't personally see a lot of compelling
>> > > >> > reasons to keep supporting Hadoop, so I would be in favor of just
>> > > >> > dropping support for it completely, though I see no harm in
>> > > >> > keeping HDFS deep storage around. In past discussions I think we
>> > > >> > had tied Hadoop removal to adding something like Spark to replace
>> > > >> > it, but I wonder if this still needs to be the case.
>> > > >> >
>> > > >> > I do know that classically there have been quite a lot of large
>> > > >> > Druid clusters in the wild still relying on Hadoop in previous dev
>> > > >> > list discussions about this topic, so I wanted to check to see if
>> > > >> > this is still true and if so if any of these clusters have plans
>> > > >> > to transition to newer ways of ingesting data like SQL based
>> > > >> > ingestion. While from a dev/maintenance perspective it would be
>> > > >> > best to just drop it completely, if there is still a large user
>> > > >> > base I think we need to be open to keeping it around for a while
>> > > >> > longer. If we do need to keep it, maybe it would be worth it to
>> > > >> > invest some time in moving it into a contrib extension so that it
>> > > >> > isn't bundled by default with Druid releases to discourage new
>> > > >> > adoption and more accurately reflect its current status in Druid.
>> > > >> >
>> > > >> > ---------------------------------------------------------------------
>> > > >> > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
>> > > >> > For additional commands, e-mail: dev-h...@druid.apache.org