bibhu107 commented on issue #11213: URL: https://github.com/apache/hudi/issues/11213#issuecomment-2112055908
Thanks for your suggestions, @ad1happy2go.

1. We are already doing a groupBy with collect_list today, and it fails when the array size exceeds 2GB.
2. As you can see, all items carry similar data because they all belong to the same contract. Denormalising item_ids might therefore lead to a lot of duplicate data between two items, or between two datasets that share common contract details.

The reason I am turning to Hudi is that a simple groupBy with collect_list is not working. If we could instead use the index to locate the item that needs to be updated, that might be more useful.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
