YannByron commented on PR #5436: URL: https://github.com/apache/hudi/pull/5436#issuecomment-1120610841
@xushiyan I have one question about the design without CDF: if we don't have CDF to record the changes, how can we get the change information by joining file slices after a compaction or clustering operation has replaced the file group?

Another key point, which I'm not sure came across: when I want the changes between Version M and Version N, I actually want the changes for each commit, that is, the changes of Version M, Version M+1, ..., Version N-1, and Version N. In this case a single join between M and N is not enough; we need (N - M + 1) join operations. I don't think CDF is the more complex design; I think it is the clearer one. It adds some work at write time (which COW also needs to do, as @vinothchandar mentioned above), but it greatly simplifies the query logic.
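To make the per-commit point concrete, here is a minimal toy sketch in plain Python. It is not Hudi code: the `Snapshot` dicts, `diff_snapshots`, and `changes_per_commit` are hypothetical names made up for illustration, and each "join" is just a full outer join of two in-memory snapshots on the record key.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model: a snapshot maps record key -> value at one version.
Snapshot = Dict[str, int]
# (operation, record key, value before, value after)
Change = Tuple[str, str, Optional[int], Optional[int]]


def diff_snapshots(before: Snapshot, after: Snapshot) -> List[Change]:
    """Full outer join of two snapshots on the record key."""
    changes: List[Change] = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old is None:
            changes.append(("insert", key, None, new))
        elif new is None:
            changes.append(("delete", key, old, None))
        elif old != new:
            changes.append(("update", key, old, new))
    return changes


def changes_per_commit(
    snapshots: Dict[int, Snapshot], m: int, n: int
) -> Dict[int, List[Change]]:
    """One join per commit in [m, n]: version v is compared with v - 1."""
    return {
        v: diff_snapshots(snapshots[v - 1], snapshots[v])
        for v in range(m, n + 1)
    }


# Made-up table history, versions 0..3.
snapshots = {
    0: {"a": 1},
    1: {"a": 1, "b": 2},  # commit 1 inserts b
    2: {"a": 5, "b": 2},  # commit 2 updates a
    3: {"b": 2},          # commit 3 deletes a
}

per_commit = changes_per_commit(snapshots, 1, 3)       # N - M + 1 = 3 joins
single_join = diff_snapshots(snapshots[0], snapshots[3])  # 1 join
```

In this toy history the single join between the two endpoints only reports that `a` was deleted and `b` was inserted; the intermediate update of `a` in commit 2 is lost, and it only shows up when every commit in the range is diffed against its predecessor.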
