Re: [DISCUSS] Hadoop ingestion support

Karan Kumar Tue, 07 Jan 2025 02:52:40 -0800

Okay, from what I can gather, a few folks still need Hadoop ingestion. So let's
kick the can down the road regarding removal of that support, but let's
agree on the deprecation plan. Since Druid 32 is around the corner, let's
at least deprecate Hadoop ingestion so that new users are not onboarded
to this way of ingestion. Deprecation also becomes a forcing function in
internal company channels for prioritizing getting off Hadoop.


How does this plan look?

On Fri, Dec 13, 2024 at 1:11 AM Maytas Monsereenusorn <[email protected]>
wrote:

> We at Netflix are in a similar situation to Target Corporation (Lucas C
> email above).
> We currently rely on Hadoop ingestion for all our batch ingestion jobs. The
> main reason for this is that we already have a large Hadoop cluster
> supporting our Spark workloads that we can leverage for Druid ingestion. I
> imagine that the closest alternative for us would be moving to K8 /
> MiddleManager-less ingestion job.
>
> On Thu, Dec 12, 2024 at 10:56 PM Lucas Capistrant <
> [email protected]> wrote:
>
> > Apologies for the empty email… fat fingers.
> >
> > Just wanted to say that we at Target Corporation (USA) still rely heavily
> > on Hadoop ingest. We’d selfishly want support forever, but if forced to
> > pivot to a new ingestion style for our larger batch ingest jobs that
> > currently leverage the cheap compute on YARN, the longer the lead time
> > between announcement by the community to the actual release with no
> > support, the better. Making these types of changes can be a slow process
> > for the slow to maneuver corporate cruise ship.
> >
> > On Thu, Dec 12, 2024 at 9:46 AM Lucas Capistrant <
> > [email protected]>
> > wrote:
> >
> > >
> > >
> > > On Wed, Dec 11, 2024 at 9:10 PM Karan Kumar <[email protected]> wrote:
> > >
> > >> +1 for removal of Hadoop based ingestion. It's a maintenance overhead and
> > >> stops us from moving to Java 17.
> > >> I am not aware of any gaps in SQL based ingestion that prevent users from
> > >> moving off Hadoop. If there are any, please feel free to reach out via
> > >> Slack/GitHub.
> > >>
> > >> On Thu, Dec 12, 2024 at 3:22 AM Clint Wylie <[email protected]>
> wrote:
> > >>
> > >> > Hey everyone,
> > >> >
> > >> > It is about that time again to take a pulse on how commonly Hadoop
> > >> > based ingestion is used with Druid in order to determine if we
> should
> > >> > keep supporting it or not going forward.
> > >> >
> > >> > In my view, Hadoop based ingestion has unofficially been on life
> > >> > support for quite some time as we do not really go out of our way to
> > >> > add new features to it, and we perform very minimal testing to ensure
> > >> > everything keeps working. The most recent changes to it I am aware of
> > >> > were to bump versions and require Hadoop 3, but that was primarily
> > >> > motivated by selfish reasons of wanting to use its contained client
> > >> > library and better isolation so that we could free up our own
> > >> > dependencies to be updated. This thread is motivated by a similar
> > >> > reason I guess, see the other thread I started recently discussing
> > >> > dropping support for Java 11 where Hadoop does not yet support Java
> 17
> > >> > runtime, and so the outcome of this discussion is involved in those
> > >> > plans.
> > >> >
> > >> > I think SQL based ingestion with the multi-stage query engine is the
> > >> > future of batch ingestion, and the Kubernetes based task runner
> > >> > provides an alternative for task auto scaling capabilities. Because
> of
> > >> > this, I don't personally see a lot of compelling reasons to keep
> > >> > supporting Hadoop, so I would be in favor of just dropping support
> for
> > >> > it completely, though I see no harm in keeping HDFS deep storage
> > >> > around. In past discussions I think we had tied Hadoop removal to
> > >> > adding something like Spark to replace it, but I wonder if this
> still
> > >> > needs to be the case.
> > >> >
> > >> > I do know that classically there have been quite a lot of large
> Druid
> > >> > clusters in the wild still relying on Hadoop in previous dev list
> > >> > discussions about this topic, so I wanted to check to see if this is
> > >> > still true and if so if any of these clusters have plans to
> transition
> > >> > to newer ways of ingesting data like SQL based ingestion. While
> from a
> > >> > dev/maintenance perspective it would be best to just drop it
> > >> > completely, if there is still a large user base I think we need to
> be
> > >> > open to keeping it around for a while longer. If we do need to keep
> > >> > it, maybe it would be worth it to invest some time in moving it
> into a
> > >> > contrib extension so that it isn't bundled by default with Druid
> > >> > releases to discourage new adoption and more accurately reflect its
> > >> > current status in Druid.
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> > >
> >
>
