Hi Nick, I’m not trying to meet a specific use case. I’m trying to understand what the Hudi community’s plan and current recommendations are around concurrency, and I’d be very interested to hear what others have done in this regard. Also, I think allowing multiple Avro log files could help solve part of this problem, as you describe, but we would still run into the issue that Hudi only supports upserts to log files, not inserts.
-Brandon

From: Semantic Beeng <[email protected]>
Date: Tuesday, April 14, 2020 at 1:28 PM
To: "[email protected]" <[email protected]>, "Scheller, Brandon" <[email protected]>
Subject: RE: [EXTERNAL] Hudi concurrent writes

Hi Brandon,

Can you please elaborate on your use case?

Mine is about concurrent feature extraction processes that would need to write to the same target table and it could be addressed if Hudi allowed multiple MOR timelines https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=135860485 per table (now there is the concept of MOR table). With multiple/concurrent MOR timelines we could merge the timelines as units of work and get some form of logical concurrency.

Hope this makes sense. Would this work for you?

Please advise
Nick

On April 14, 2020 at 12:23 PM "Scheller, Brandon" <[email protected]> wrote:

Hi all,

If I understand correctly, Hudi is not currently recommended for the concurrent writer use cases. I was wondering what the community’s official stance on concurrency is, and what the recommended workarounds/solutions are for Hudi to help prevent data corruption/duplication (For example we’ve heard of environments using an external table lock).

Thanks,
Brandon
