Great! -Nishith
On Thu, Sep 17, 2020 at 10:28 AM tanu dua <tanu.dua...@gmail.com> wrote:

> Thank you so much Nishith. I understand now how it's going to work.
>
> On Wed, 16 Sep 2020 at 11:15 PM, nishith agarwal <n3.nas...@gmail.com> wrote:
>
> > Tanu,
> >
> > I'm assuming you're talking about reading multiple Kafka partitions from
> > a single Spark Streaming job. In this case, your job can read from
> > multiple partitions, but at the end this data should be written to a
> > single table. The dataset/RDD resulting from reading multiple partitions
> > is passed as a whole to the Hudi writer, and Spark parallelism ensures
> > you don't lose the Kafka partition parallelism. In this case, there are
> > no "multi-writers" to Hudi tables. Is your setup different from the one
> > I described?
> >
> > -Nishith
> >
> > On Wed, Sep 16, 2020 at 9:50 AM tanu dua <tanu.dua...@gmail.com> wrote:
> >
> > > Hi,
> > > I need to try this myself, but how does Hudi handle concurrent
> > > ingestion with Spark Streaming? We have multiple Kafka partitions that
> > > Spark is listening on, so at any given point in time multiple executors
> > > may be reading the Kafka partitions and ingesting data. What behaviour
> > > can I expect from Hudi? It's possible that they may be writing to the
> > > same Hudi partition.
> > >
> > > Would both writes be successful? Would one overwrite the other if both
> > > have the same primary key?
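For concreteness, below is a minimal Scala sketch of the single-writer pattern Nishith describes: one Structured Streaming query subscribes to the whole Kafka topic (so executors still read its partitions in parallel), and foreachBatch hands each micro-batch, already containing data from all partitions, to a single Hudi upsert. The broker address, topic, table name, record-key/precombine fields, and paths are hypothetical placeholders, not anything from this thread.

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.Trigger

object KafkaToHudi {

  // One Hudi write per micro-batch: the batch already holds records from
  // ALL Kafka partitions, so there is a single writer per Hudi commit.
  def writeBatch(batch: DataFrame, batchId: Long): Unit = {
    batch.write
      .format("hudi")
      .option("hoodie.table.name", "events_table")                     // hypothetical table
      .option("hoodie.datasource.write.recordkey.field", "_row_key")
      .option("hoodie.datasource.write.precombine.field", "ts")
      .option("hoodie.datasource.write.operation", "upsert")
      .mode("append")
      .save("/data/hudi/events_table")                                 // hypothetical path
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-hudi").getOrCreate()

    // A single streaming source reads the topic; Spark spreads the topic's
    // partitions across executors, preserving the Kafka read parallelism.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")   // hypothetical broker
      .option("subscribe", "events")                      // hypothetical topic
      .load()
      .selectExpr(
        "CAST(key AS STRING) AS _row_key",
        "CAST(value AS STRING) AS payload",
        "timestamp AS ts")

    events.writeStream
      .foreachBatch(writeBatch _)
      .option("checkpointLocation", "/tmp/checkpoints/events")  // hypothetical
      .trigger(Trigger.ProcessingTime("1 minute"))
      .start()
      .awaitTermination()
  }
}

This also covers the "same primary key" question from the original mail: because each micro-batch goes through one upsert, Hudi uses the precombine field to keep one record per key within the batch, so duplicate keys are resolved inside a single commit rather than by two writers racing.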