ad1happy2go commented on issue #11213: URL: https://github.com/apache/hudi/issues/11213#issuecomment-2109419958
@bibhu107 Why can't we achieve this with the current functionality? You can preprocess your DataFrame with something like groupBy and collect_list, and then save it to Hudi. You can also implement a custom payload to merge the previous and current lists however you need.

However, since each contract id has 100,000 items, a nested structure would make a single record payload huge, and performance would be very bad. I'm not sure the JVM could even accommodate a list that big; the write may simply fail. Why not use a denormalised structure with contract_id and item_id as the record key?
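To make the two options concrete, here is a rough PySpark sketch, not a tested recipe. It assumes an input DataFrame with `contract_id` and `item_id` columns; the `price`, `updated_at`, and `region` columns, the table name, and the helper function names are hypothetical placeholders. The explicit `ComplexKeyGenerator` setting is also an assumption that may be unnecessary on recent Hudi versions.

```python
def hudi_write_options(table_name, record_keys, precombine_field, partition_field):
    """Build Hudi datasource write options for an upsert with a composite record key."""
    return {
        "hoodie.table.name": table_name,
        # Several key columns are passed as one comma-separated string.
        "hoodie.datasource.write.recordkey.field": ",".join(record_keys),
        # Assumption: set the composite key generator explicitly.
        "hoodie.datasource.write.keygenerator.class":
            "org.apache.hudi.keygen.ComplexKeyGenerator",
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": "upsert",
    }


def nest_items(df):
    """Option 1: one row per contract with all items collected into a list.

    With ~100,000 items per contract, each row becomes very large.
    """
    from pyspark.sql import functions as F  # imported lazily; needs pyspark

    return df.groupBy("contract_id").agg(
        # "price" is a hypothetical item attribute.
        F.collect_list(F.struct("item_id", "price")).alias("items")
    )


def write_denormalised(df, base_path):
    """Option 2: keep one row per (contract_id, item_id) and upsert as-is."""
    opts = hudi_write_options(
        table_name="contract_items",      # hypothetical table name
        record_keys=["contract_id", "item_id"],
        precombine_field="updated_at",    # hypothetical ordering column
        partition_field="region",         # hypothetical partition column
    )
    df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

With the denormalised layout, an update to one item rewrites only that small record, so no custom payload is needed to merge lists.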
