nsivabalan commented on issue #6606: URL: https://github.com/apache/hudi/issues/6606#issuecomment-1238882651
Oh, I thought both jobs were running concurrently. Is that not the case? Can you shed some light on the exact steps? Is it:

Step 1: Start job1 in EMR cluster1, which consumes from source X and writes to Hudi table Y.
Step 2: Stop job1. It's essentially a batch job.
Step 3: Start job2 in EMR cluster2, which again consumes from source X and writes to Hudi table Y.

Now, if you query Hudi, you see duplicate data? Is my understanding right? Also, can you share the write configs you used?
