bvaradar commented on issue #1979: URL: https://github.com/apache/hudi/issues/1979#issuecomment-679522323
@hughfdjackson: In general, getting an incremental read to discard duplicates is not possible for MOR table types, because we defer the merging of records to compaction. I was thinking about alternative ways to achieve your use case for a COW table by using an application-level boolean flag. Let me know if this makes sense:

1. Introduce an additional boolean column "changed". The default value is false.
2. Plug in your own implementation of HoodieRecordPayload.
3a. In HoodieRecordPayload.getInsertValue(), return an Avro record with changed = true. This function is called the first time a new record is inserted.
3b. In HoodieRecordPayload.combineAndGetUpdateValue(), if you determine that there is no material change, set changed = false; otherwise set it to true.

Then, in your incremental query, add the filter changed = true to exclude the records without material changes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
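The flag logic proposed in steps 2 and 3 of the comment can be sketched roughly as follows. This is a plain-Python model of the idea only, not Hudi code: records are dicts, and the function names `get_insert_value`, `combine_and_get_update_value`, and `incremental_read`, along with the `IGNORED_FIELDS` set and the `updated_at` column, are hypothetical stand-ins chosen for illustration. A real implementation would presumably live in a Java class implementing HoodieRecordPayload and work on Avro records.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]

# Hypothetical: columns whose differences do not count as a material change.
IGNORED_FIELDS = frozenset({"changed", "updated_at"})


def get_insert_value(incoming: Record) -> Record:
    """Models the insert hook: a first-time insert is always flagged as changed."""
    return {**incoming, "changed": True}


def combine_and_get_update_value(current: Record, incoming: Record) -> Record:
    """Models the update hook: flag the record only if a non-ignored field differs."""
    keys = (set(current) | set(incoming)) - IGNORED_FIELDS
    material = any(current.get(k) != incoming.get(k) for k in keys)
    return {**incoming, "changed": material}


def incremental_read(records: Iterable[Record]) -> List[Record]:
    """Models the incremental query with the filter changed = true applied."""
    return [r for r in records if r["changed"]]


# Usage: one insert, one no-op update, one real update of the same key.
stored = get_insert_value({"id": 1, "name": "a", "updated_at": 100})
same = combine_and_get_update_value(stored, {"id": 1, "name": "a", "updated_at": 200})
diff = combine_and_get_update_value(same, {"id": 1, "name": "b", "updated_at": 300})
visible = incremental_read([stored, same, diff])
```

In this sketch the no-op update still produces a new version of the record, but its flag is false, so the filter drops it from the downstream read. What counts as "material" is an application decision; here it is assumed to mean any difference outside the ignored columns.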
