bvaradar commented on issue #1582: URL: https://github.com/apache/incubator-hudi/issues/1582#issuecomment-623022854
Thanks for the details. One of the primary contracts within Hudi is the uniqueness of the record key within a partition/dataset. Instead, can you materialize the grouping within the record? To elaborate: can you add a nested array-of-struct field, "audit_log", to your schema, with the inner struct having the same structure as the top-level struct minus audit_log? It would hold the list of record images at each ingest time, and your custom payload would append all previous images to it as part of combineAndGetUpdateValue and preCombine. This way, if you want the latest image, you simply skip projecting "audit_log" in your query and don't have to deal with reduce-by.
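A rough sketch of that append logic, in plain Python rather than against the Hudi payload API. The function names only mirror preCombine and combineAndGetUpdateValue; the `ts` ordering field, the helper names, and the sample records are assumptions made for illustration, and a real payload would operate on Avro records instead of dicts.

```python
from copy import deepcopy

AUDIT_FIELD = "audit_log"  # field name taken from the suggestion above


def strip_audit(record):
    """Return a copy of a record image without its audit_log field."""
    return {k: deepcopy(v) for k, v in record.items() if k != AUDIT_FIELD}


def merge_with_history(older, newer):
    """Keep `newer` as the top-level image and push `older` into its audit_log.

    Resulting audit_log order: older's history, older's own image,
    then any history that newer already carried.
    """
    history = (
        deepcopy(older.get(AUDIT_FIELD, []))
        + [strip_audit(older)]
        + deepcopy(newer.get(AUDIT_FIELD, []))
    )
    merged = strip_audit(newer)
    merged[AUDIT_FIELD] = history
    return merged


def pre_combine(a, b, ordering_field="ts"):
    """Stand-in for preCombine: merge two incoming records sharing a key.

    `ordering_field` is a hypothetical column used to decide which is newer.
    """
    older, newer = sorted((a, b), key=lambda r: r[ordering_field])
    return merge_with_history(older, newer)


def combine_and_get_update_value(stored, incoming):
    """Stand-in for combineAndGetUpdateValue: merge incoming onto storage."""
    if stored is None:
        # First write for this key: start with whatever history came along.
        first = strip_audit(incoming)
        first[AUDIT_FIELD] = deepcopy(incoming.get(AUDIT_FIELD, []))
        return first
    return merge_with_history(stored, incoming)


# Hypothetical record images for a single key, ingested over time.
v1 = {"id": 1, "status": "new", "ts": 1}
v2 = {"id": 1, "status": "paid", "ts": 2}
v3 = {"id": 1, "status": "shipped", "ts": 3}

stored = combine_and_get_update_value(None, v1)        # first ingest
batch = pre_combine(v3, v2)                            # two updates in one batch
final = combine_and_get_update_value(stored, batch)    # second ingest

# Latest image: skip the audit_log field when projecting.
latest = {k: v for k, v in final.items() if k != AUDIT_FIELD}
```

Here `latest` is the v3 image, and `final["audit_log"]` holds the v1 and v2 images in ingest order, so the key stays unique while the history travels inside the record.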
