sleapfish edited a comment on issue #2284:
URL: https://github.com/apache/hudi/issues/2284#issuecomment-789167720


   > IIUC SCD2 requires that all versions of a given record be maintained inside the 
table? Hudi does allow you to keep a history of changes to the table, up to a 
certain time in the past (configured via cleaner settings). If we never cleaned 
the table, then all changes from time 0 to now would be available. I need to 
think through what exactly the problems would be if we did that. I can think of 
the file listing time growing over time, but with 0.7.0+ we have the metadata 
table to alleviate that. Also, if these dimension tables are typically much 
smaller, it may not be an issue per se.
   > 
   > The bones are there for this to work. We will have to spend some time to 
fully declare that a table can store an infinite amount of changes without ever 
cleaning (i.e. getting rid of) older versions.
   
   @vinothchandar You don't necessarily need to keep older versions, since each 
change to an SCD2 table results in updating (upserting) the older version. 
Hence, the older version becomes part of the new commit, with updated 
effectiveTo and isActive fields.
   
   Whenever a change happens you will have:
   - 1 UPDATE (of an older version, but part of new commit - SET effectiveTo 
and isActive fields)
     - You can get this record by primary key + isActive = True (or effectiveTo 
= null)
   - 1 INSERT (new version with effectiveTo = null & isActive = True)
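
A minimal sketch of the two writes above, in plain Python with no Hudi dependency. The `effectiveTo` and `isActive` fields come from this comment; the `id` and `effectiveFrom` fields, the `(id, effectiveFrom)` record key, and both function names are assumptions made only for illustration.

```python
from datetime import date


def scd2_upsert_batch(rows, key, new_values, change_date):
    """Build the records produced by one change: 1 UPDATE + 1 INSERT."""
    batch = []
    for row in rows:
        # Find the current version: primary key + isActive = True.
        if row["id"] == key and row["isActive"]:
            # UPDATE: close the older version (SET effectiveTo and isActive).
            batch.append(dict(row, effectiveTo=change_date, isActive=False))
    # INSERT: the new version, with effectiveTo = null and isActive = True.
    batch.append(dict(new_values, id=key, effectiveFrom=change_date,
                      effectiveTo=None, isActive=True))
    return batch


def apply_batch(rows, batch):
    """Upsert by (id, effectiveFrom), a hypothetical record key."""
    by_key = {(r["id"], r["effectiveFrom"]): r for r in rows}
    for rec in batch:
        by_key[(rec["id"], rec["effectiveFrom"])] = rec
    return list(by_key.values())


# Example: one active row, then a change of the (hypothetical) "city" attribute.
rows = [{"id": 1, "city": "Berlin", "effectiveFrom": date(2020, 1, 1),
         "effectiveTo": None, "isActive": True}]
batch = scd2_upsert_batch(rows, 1, {"city": "Munich"}, date(2021, 3, 1))
rows = apply_batch(rows, batch)
```

Both records land in the same batch, so the closed older version is rewritten as part of the new commit rather than being reachable only through older file versions.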


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

