Hi Nick,

I’m not trying to meet a specific use case so much as to understand what
the Hudi community’s plan and current recommendations are surrounding
concurrency. I’m very interested to hear what others have done in this regard.
Also, I think allowing multiple Avro log files could help solve part of this
problem, as you describe, but we would still run into the issue that Hudi only
supports upserts to log files, not inserts.

-Brandon

From: Semantic Beeng <[email protected]>
Date: Tuesday, April 14, 2020 at 1:28 PM
To: "[email protected]" <[email protected]>, "Scheller, Brandon" 
<[email protected]>
Subject: RE: [EXTERNAL] Hudi concurrent writes


Hi Brandon,

Can you please elaborate on your use case?

Mine is about concurrent feature extraction processes that need to write
to the same target table. It could be addressed if Hudi allowed multiple MOR
timelines per table (there is now the concept of a MOR table):
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=135860485

With multiple/concurrent MOR timelines we could merge the timelines as units of 
work and get some form of logical concurrency.

Hope this makes sense.
Would this work for you?

Please advise
Nick
On April 14, 2020 at 12:23 PM "Scheller, Brandon" <[email protected]> wrote:


Hi all,

If I understand correctly, Hudi is not currently recommended for
concurrent-writer use cases. I was wondering what the community’s official
stance on concurrency is, and what the recommended workarounds/solutions are
for Hudi to help prevent data corruption/duplication (for example, we’ve heard
of environments using an external table lock).

Thanks,
Brandon
