RajasekarSribalan edited a comment on issue #2214: URL: https://github.com/apache/hudi/issues/2214#issuecomment-719155536
Thanks, Balaji, for the quick response. Please find my answers below.

> Do you have hoodie.combine.before.upsert set to true ?

We don't set this flag, so it should be true by default.

> You can also check if the duplicates have the same _hoodie_commit_time
> value to see if this is the pattern ?

Yes, the duplicates have the same _hoodie_commit_time, the same parquet files, and the same hoodie record key, but a different commit seq no for each duplicate entry.

> It is also possible that you have more than one writer ingesting data to
> the same dataset concurrently. This will not work as expected.

We have one Hudi pipeline per table, and I suppose Hudi doesn't support concurrent writes/upserts. We consume messages from Kafka, transform them, and then upsert into Hudi. So I still don't follow what you mean by ingesting the same dataset concurrently. Can you provide some more information on this scenario?

Thanks,
Raj

On Fri, Oct 30, 2020, 3:04 AM Balaji Varadarajan <[email protected]> wrote:

> @RajasekarSribalan <https://github.com/RajasekarSribalan> : Do you have
> hoodie.combine.before.upsert set to true ? By default, this is true, so
> unless you have set to false, this should not be a problem ? You can also
> check if the duplicates have the same _hoodie_commit_time value to see if
> this is the pattern ?
>
> Another question, when you say duplicate record - Do they have same
> _hoodie_record_key value ?
>
> It is also possible that you have more than one writer ingesting data to
> the same dataset concurrently. This will not work as expected.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
