john-bodley opened a new issue #8639: [SIP-26] Paranoid Deletes
URL: https://github.com/apache/incubator-superset/issues/8639
 
 
   ## [SIP] Proposal for Paranoid Deletes
   
   ### Motivation
   
   Currently at Airbnb we have a vast number of entities in Superset. Our 
deployment is approaching 200k  charts (both manually and procedurally 
generated), 10k dashboards, 80k registered tables (both physical and virtual), 
and 2.5k Druid datasources. 
   
   In a recent analysis of a specific Druid NoSQL (native) cluster, from a 
sample of ~ 5k charts only 34% of charts rendered, i.e., returned a 200 status 
code from the `/supserset/slice_json` route.  The following chart the 
renderability of charts as a function of last saved, which shows that a chart's 
viability often decays over time due to creep in the datasource metadata and 
the saved chart parameters.  
   
   ![Screen Shot 2019-11-22 at 5 50 14 
PM](https://user-images.githubusercontent.com/4567245/69471248-193b0880-0d52-11ea-920a-01bea6511628.png)
   
   Ideally we would like to have a mechanism to clean up obsolete resources 
(charts, dashboards, or datasources) in a somewhat paranoid manner, i.e., using 
soft deletes. This should help improve the perceived reliability and quality of 
Superset assets.  
   
   ### Proposed Change
   
   The proposed solution was originally mentioned by @etr2460 but I thought it 
was worthwhile formalizing this as a SIP. This borrows an idea from 
[Ruby](https://github.com/rubysherpas/paranoia) where we first soft delete 
records my marking them as deleted (with an associated timestamp) before 
performing a hard delete (deleting the record _n_-days later). Users could be 
prompted that their charts were being deleted and they can take action to 
undelete it if they take corrective action to fix the any errors or mark as 
non-deletable (or similar).
   
   There's actually a Python package 
[sqla-paranoid](https://github.com/jeanphix/sqla-paranoid) which brings 
transparent soft deletes to SQLAlchemy which we could use or replicate. The 
TL;DR is this would add a `deleted_at` (or `deleted_on` for consistency) column 
which would track soft deleted records. Records which are soft deleted wouldn't 
show up in the CRUD views by default unless the filter was enabled (not unlike 
how SQL Lab Views are ignored by default in the `tablemodelview`).   
   
   Records could be marked using a hook, trigger, or cron as deletable based on 
various criterion using cascading context:
   
   #### Charts 
   
   - Returns an error. 
   - Has not been viewed for _n_-days.
   
   #### Dashboards
   
   - Contains no charts. 
   
   ##### Tables/Datasources 
   
   - Not referenced by any charts.
   
   ### New or Changed Public Interfaces
   
   We would need to updated the data model and leverage `sql-paranoid` (or 
similar) for enabling the soft-deletes. We would also need to update the FAB 
views to handle filtering/exclusion of soft deleted records. Finally we would 
need to implement triggers or similar to i) soft delete records, and ii) hard 
delete records. 
   
   ### New dependencies
   
   The only new dependency would be `sqla-paranoid` (no public license) if we 
decided not to write this ourself. Note the package only contains several 
hundred lines of codes. 
   
   ### Migration Plan and Compatibility
   
   We would need to update the schema to include the `deleted_at` (or 
`deleted_on`) column for certain tables. Note I think we only need this for 
charts, dashboards, and datasources (the cascade deletes should handle the 
cleanup of columns and metrics). 
   
   ### Rejected Alternatives
   
   None. 
   
   to: @etr2460 @mistercrunch @villebro @willbarrett 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to