bibhu107 opened a new issue, #11213:
URL: https://github.com/apache/hudi/issues/11213

   Hi Community,
   
   I am seeking guidance on idempotent updates to nested fields in a 
large-scale data scenario involving contracts with third-party vendors for 
transferring items. Each contract (identified by contractId) has around 100,000 
items (identified by itemId), and I receive about 6 million contracts per month, 
growing by 50% yearly. I want to use contractId as the record key 
(_hoodie_record_key) and store the list of items as a nested array field. All 
items within a contract share the same contract-related attributes. In the 
future, items may be added, deleted, or updated for a given contractId, which 
requires me to fetch the item array and update only the affected items. I 
understand that Hudi doesn't natively support deduplicating items within an 
array, so I'm looking for a configuration-driven approach, which might be useful 
for many projects. I also acknowledge that updating nested fields could have 
performance implications as the number of items per contract grows. 
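
To make the semantics I have in mind concrete, here is a minimal application-side sketch of the item-level merge, keyed by itemId. This is plain Python, not a Hudi API: the function name `merge_items` and the `op` field with its `"delete"` value are hypothetical names I made up for illustration.

```python
from typing import Dict, List


def merge_items(existing: List[dict], changes: List[dict]) -> List[dict]:
    """Apply item-level upserts and deletes to one contract's item array.

    Items are matched on "itemId". A change carrying op == "delete"
    (a hypothetical marker, not a Hudi field) removes the item; any other
    change replaces or adds it. Re-applying the same changes gives the
    same result, so the merge is idempotent.
    """
    # Index the current items by itemId; dicts keep insertion order.
    by_id: Dict[str, dict] = {item["itemId"]: item for item in existing}

    for change in changes:
        item_id = change["itemId"]
        if change.get("op") == "delete":
            # Deleting an absent item is a no-op.
            by_id.pop(item_id, None)
        else:
            # Upsert: store the change without the bookkeeping "op" field.
            by_id[item_id] = {k: v for k, v in change.items() if k != "op"}

    return list(by_id.values())


existing = [{"itemId": "i1", "qty": 1}, {"itemId": "i2", "qty": 2}]
changes = [
    {"itemId": "i2", "qty": 5},         # update
    {"itemId": "i3", "qty": 7},         # add
    {"itemId": "i1", "op": "delete"},   # delete
]
once = merge_items(existing, changes)
twice = merge_items(once, changes)
```

Today I would have to run something like this myself before each upsert of the contract record, which means reading and rewriting the full item array even when only a few items change.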
   
   Is support for this planned as part of Hudi's future goals?
   
   Thanks.
   

