Thank you so much Nishith. I understand now how it’s going to work.

On Wed, 16 Sep 2020 at 11:15 PM, nishith agarwal <n3.nas...@gmail.com> wrote:
> Tanu,
>
> I'm assuming you're talking about multiple Kafka partitions read by a
> single Spark Streaming job. In that case, your job can read from multiple
> partitions, but at the end this data is written to a single table. The
> dataset/RDD resulting from reading multiple partitions is passed as a
> whole to the Hudi writer, and Spark parallelism ensures you don't lose the
> Kafka partition parallelism. In this case, there are no "multi-writers" to
> Hudi tables. Is your setup different from the one I described?
>
> -Nishith
>
> On Wed, Sep 16, 2020 at 9:50 AM tanu dua <tanu.dua...@gmail.com> wrote:
>
> > Hi,
> > I need to try this myself, but how does Hudi concurrent ingestion work
> > with Spark Streaming?
> > We have multiple Kafka partitions that Spark is listening on, so at any
> > given point in time multiple executors may be reading the Kafka
> > partitions and ingesting data. What behaviour can I expect from Hudi?
> > It's possible that they may be writing to the same Hudi partition.
> >
> > Would both writes be successful? Would one overwrite the other if both
> > have the same primary key?
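For anyone following along, here is a minimal sketch of the single-writer
pattern Nishith describes: one Structured Streaming source reads all Kafka
partitions, and each micro-batch is upserted as a whole into one Hudi table
via foreachBatch. The broker address, topic, table name, paths, and field
names (key, ts, dt) are hypothetical placeholders, not anything from the
thread.

import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object KafkaToHudi {

  // Assumed record schema carried as JSON in the Kafka message value.
  val schema: StructType = StructType(Seq(
    StructField("key", StringType),  // record key (primary key)
    StructField("ts", LongType),     // precombine field; larger value wins
    StructField("dt", StringType)    // partition path field
  ))

  // One writer per micro-batch: the whole batch DataFrame goes to a single
  // Hudi table, so there are no concurrent writers to reason about.
  def writeBatch(batch: DataFrame, batchId: Long): Unit = {
    batch.write.format("hudi")
      .option("hoodie.table.name", "events_hudi")
      .option("hoodie.datasource.write.recordkey.field", "key")
      .option("hoodie.datasource.write.precombine.field", "ts")
      .option("hoodie.datasource.write.partitionpath.field", "dt")
      .option("hoodie.datasource.write.operation", "upsert")
      .mode(SaveMode.Append)
      .save("/tmp/hudi/events_hudi")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hudi")
      .getOrCreate()

    // A single streaming source subscribes to the topic; Spark reads all of
    // its Kafka partitions in parallel across executors.
    val source = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Parse the Kafka value bytes into typed columns.
    val parsed = source
      .select(from_json(col("value").cast("string"), schema).as("r"))
      .select("r.*")

    val query = parsed.writeStream
      .foreachBatch(writeBatch _)
      .option("checkpointLocation", "/tmp/checkpoints/events_hudi")
      .start()

    query.awaitTermination()
  }
}

On the primary-key question: with the upsert operation, records sharing the
same record key within a batch are deduplicated using the precombine field
(here the row with the larger ts wins), and across batches an upsert updates
the existing row in the table rather than leaving a duplicate.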