bibhu107 opened a new issue, #11213: URL: https://github.com/apache/hudi/issues/11213
Hi Community,

I am seeking guidance on supporting idempotent updates to nested data in a large-scale scenario: contracts with third-party vendors for transferring items. Each contract (identified by contractId) contains around 100,000 items (identified by itemId). I have about 6 million contracts per month, and that volume grows by roughly 50% per year.

I want to use contractId as the hoodie_record_key and store the list of items as a nested array field. All items within a contract share the same contract-level attributes. Later, for a given contractId, items may be added, deleted, or updated, which requires fetching the stored item array and updating only the affected items.

I understand that Hudi does not natively deduplicate elements within an array field, but I am looking for a configuration-driven approach, which could be useful for many projects. I also acknowledge that updating nested fields could have performance implications as the number of items per contract grows.

Is this kind of nested-field support on Hudi's future roadmap? Thanks.
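To make the requested semantics concrete, here is a minimal Python sketch of the item-level merge described above: the stored item array for one contractId is deduplicated by itemId, and a batch of changes (add, update, delete) is applied to it. This is not a Hudi API. The names Item, ItemChange, and merge_items and the "upsert"/"delete" op codes are hypothetical. In Hudi, logic like this would presumably have to live in a custom record payload or record merger implementation on the write path.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Item:
    # Hypothetical shape of one element in the nested item array.
    item_id: str
    attrs: Dict[str, object]


@dataclass
class ItemChange:
    # Hypothetical change record: op is "upsert" or "delete".
    item_id: str
    op: str
    attrs: Optional[Dict[str, object]] = None


def merge_items(stored: List[Item], changes: List[ItemChange]) -> List[Item]:
    """Apply item-level changes to a contract's stored item array, keyed by item_id."""
    # Index stored items by item_id. For duplicate ids the last occurrence wins,
    # while the dict keeps the position of the first occurrence.
    by_id: Dict[str, Item] = {}
    for item in stored:
        by_id[item.item_id] = item

    for change in changes:
        if change.op == "delete":
            # Deleting an unknown item is a no-op, so replays stay idempotent.
            by_id.pop(change.item_id, None)
        elif change.op == "upsert":
            # Partial update: overlay the new attributes on the existing ones.
            existing = by_id.get(change.item_id)
            merged = dict(existing.attrs) if existing else {}
            merged.update(change.attrs or {})
            by_id[change.item_id] = Item(change.item_id, merged)
        else:
            raise ValueError(f"unknown op: {change.op}")

    # Surviving items keep their order; newly added items are appended.
    return list(by_id.values())


stored = [Item("i1", {"qty": 1}), Item("i2", {"qty": 2}), Item("i1", {"qty": 5})]
changes = [
    ItemChange("i2", "delete"),
    ItemChange("i1", "upsert", {"status": "shipped"}),
    ItemChange("i3", "upsert", {"qty": 7}),
]
print(merge_items(stored, changes))
```

Note that every call walks the whole stored array, so even a one-item change costs O(n) in the number of items per contract (around 100,000 here). Because the record key is contractId, the entire nested array is rewritten on each update. That is the performance concern raised above.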
