[GitHub] [hudi] nsivabalan commented on issue #6606: Observing data duplication with Single Writer

GitBox Tue, 06 Sep 2022 21:05:55 -0700


nsivabalan commented on issue #6606:
URL: https://github.com/apache/hudi/issues/6606#issuecomment-1238882651


   Oh, I thought both jobs were running concurrently. Is that not the case?
Can you shed some light on the exact steps? Is it as follows?
   step1: Start job1 in EMR cluster1, which consumes from source X and writes
to Hudi table Y.
   step2: Stop job1. It's essentially a batch job.
   step3: Start job2 in EMR cluster2, which again consumes from source X and
writes to Hudi table Y.
   Now, if you query Hudi, you see duplicate data?
   
   Is my understanding right?
   
   Also, can you share the write configs you used?
   
   


