Tanu, I'm assuming you're talking about multiple Kafka partitions consumed by a single Spark Streaming job. In that case, your job can read from multiple partitions, but in the end the data is written to a single table. The dataset/RDD resulting from reading the multiple partitions is passed as a whole to the Hudi writer, and Spark's parallelism ensures you don't lose the Kafka partition-level parallelism. In this case, there are no "multi-writers" to Hudi tables. Is your setup different from the one I described?
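
For what it's worth, here is a minimal sketch of that setup, assuming Spark Structured Streaming with the spark-sql-kafka and hudi-spark bundles on the classpath; the broker, topic, table name, and paths below are hypothetical placeholders:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.Trigger

object KafkaToHudi {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hudi")
      .getOrCreate()

    // One streaming source subscribes to ALL partitions of the topic;
    // Spark assigns Kafka partitions across tasks, so the partition-level
    // read parallelism is preserved inside this single job.
    val kafkaDf = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
      .option("subscribe", "events")                    // hypothetical topic
      .load()

    // Each micro-batch is handed to the Hudi writer as one dataset, so
    // there is a single Hudi writer per commit, not one per executor.
    val query = kafkaDf
      .selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "timestamp")
      .writeStream
      .foreachBatch { (batchDf: DataFrame, batchId: Long) =>
        batchDf.write
          .format("hudi")
          .option("hoodie.table.name", "events_table")            // hypothetical
          .option("hoodie.datasource.write.recordkey.field", "key")
          .option("hoodie.datasource.write.precombine.field", "timestamp")
          .option("hoodie.datasource.write.operation", "upsert")
          .mode("append")
          .save("s3://bucket/events_table")                       // hypothetical
      }
      .trigger(Trigger.ProcessingTime("1 minute"))
      .option("checkpointLocation", "s3://bucket/checkpoints/events")
      .start()

    query.awaitTermination()
  }
}

Since the whole micro-batch goes through one Hudi write, records sharing a record key within a batch are deduplicated via the precombine field, and upsert semantics mean a later commit with the same key updates the existing row rather than producing a duplicate.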
-Nishith

On Wed, Sep 16, 2020 at 9:50 AM tanu dua <tanu.dua...@gmail.com> wrote:
> Hi,
> I need to try this myself more, but how does Hudi concurrent ingestion
> work with Spark Streaming?
> We have multiple Kafka partitions that Spark is listening on, so it is
> possible that at any given point in time multiple executors will be
> reading the Kafka partitions and ingesting data. What behaviour can I
> expect from Hudi? It's possible that they may be writing to the same
> Hudi partition.
>
> Would both writes be successful? Would one overwrite another if both
> have the same primary key?