sleapfish commented on issue #2284: URL: https://github.com/apache/hudi/issues/2284#issuecomment-789167720
> > > IIUC SCD2 requires all versions of a given record be maintained inside the table? Hudi does allow you to keep history of changes to the table, up to a certain time in the past (configured via the cleaner settings). If we never cleaned the table, then all changes from time 0 to now would be available. I need to think through what exactly the problems would be if we did that. I can think of the file listing time growing over time, but with 0.7.0+ we have the metadata table to alleviate that. Also, if these dimension tables are typically much smaller, it may not be an issue per se.
> >
> > The bones are there for this to work. We will have to spend some time to fully declare that a table can store an infinite amount of changes without ever cleaning (i.e., getting rid of) older versions.

You don't necessarily need to keep older versions, since each change to an SCD2 table results in updating (upserting) the older version. That older version therefore becomes part of the new commit, with updated effectiveTo and isActive fields.

Whenever a change happens you will have:
- 1 INSERT (new version with effectiveTo = null & isActive = True)
- 1 UPDATE (of an older version, but part of the new commit)
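The INSERT + UPDATE pair can be sketched in plain Python to show why history survives without retaining old file versions. This is not Hudi's API: `DimRow`, `scd2_change`, and `upsert` are hypothetical names, the dict stands in for a keyed upsert, and the record key (business key plus effective-from date) is an assumption chosen so that each version is its own row.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    # Hypothetical SCD2 row; effective_to / is_active mirror the
    # effectiveTo / isActive fields from the comment.
    customer_id: str
    effective_from: date
    effective_to: Optional[date]
    is_active: bool
    city: str

    @property
    def record_key(self) -> tuple:
        # Assumed record key: business key + version start date, so every
        # version is a distinct row that can be upserted on its own.
        return (self.customer_id, self.effective_from)


def scd2_change(current: DimRow, new_city: str, change_date: date) -> List[DimRow]:
    """Return the two rows a single change writes in one upsert batch."""
    if not current.is_active:
        raise ValueError("only the active version can be superseded")
    # 1 UPDATE: close out the previously active version (same record key).
    closed = replace(current, effective_to=change_date, is_active=False)
    # 1 INSERT: open a new version with effectiveTo = null, isActive = True.
    opened = DimRow(current.customer_id, change_date, None, True, new_city)
    return [closed, opened]


def upsert(table: dict, batch: List[DimRow]) -> None:
    # Stand-in for a keyed upsert: existing keys are overwritten, new keys added.
    for row in batch:
        table[row.record_key] = row


if __name__ == "__main__":
    table: dict = {}
    v1 = DimRow("c1", date(2021, 1, 1), None, True, "Paris")
    upsert(table, [v1])
    upsert(table, scd2_change(v1, "Berlin", date(2021, 3, 1)))
    for row in table.values():
        print(row)
```

Because the closed-out version is rewritten as a row in the same batch as the new version, the full history lives in the latest snapshot itself, which is why, under these assumptions, older file versions would not need to be retained by the cleaner.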
