anthonyhungnguyen commented on issue #39492:
URL: https://github.com/apache/superset/issues/39492#issuecomment-4433463570

   Hey @mikebridge — really thorough SIP, thanks for putting this together. I 
wanted to share an alternative we've been running in production at Geotab for 
several months. It's adjacent to the YAML import/export reuse you rejected, but 
with a twist that I think changes the trade-off space enough to be worth 
folding into the discussion.
   
   **Short version:** we don't store snapshots in Superset's database at all. 
We push them to an external Git repo (GitLab in our case) and let Git be the 
version store.
   
   The flow is roughly: on save, we run the existing `Export*Command` to get the 
JSON bundle, then POST `repository/commits` to a configured repo. There's a 
default org-wide repo, but users can also point at their own via `X-VC-Repo-Id` / 
`X-VC-Token` request headers, which we call BYOR (Bring Your Own Repo). Snapshots 
land at `{type}s/{id}/{epoch_ms}-{user_id}.json`, which sidesteps the filename 
collision problem without any pre-allocation. Restore re-imports the JSON via 
the existing `import_*` v1 commands after auto-snapshotting the current state as 
a recovery point. Total surface area is ~650 LOC of Python and five endpoints 
(save, list, restore, preview, compare). Owners and roles are preserved on the 
live row across restores, for the same reason you exclude them in 
`__versioned__`.
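   A minimal sketch of that save path, assuming GitLab's Commits API (`POST /projects/:id/repository/commits` with an `actions` list). The host, the default repo/token/branch, and the helper names (`resolve_repo`, `snapshot_path`, `parse_snapshot_path`, `build_commit_request`, `restore`) are hypothetical, not the BYOR code itself. The exported bundle is stood in for by a plain dict, the request is only built (any HTTP client could send it), and `restore` only models the ordering and owner/role handling, not the real `import_*` re-import.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Tuple
from urllib.parse import quote

# Hypothetical settings standing in for the org-wide default repo.
GITLAB_API = "https://gitlab.example.com/api/v4"
DEFAULT_REPO_ID = "analytics/superset-snapshots"
DEFAULT_TOKEN = "org-wide-token"
DEFAULT_BRANCH = "main"


@dataclass(frozen=True)
class RepoTarget:
    repo_id: str  # GitLab project ID or "namespace/project" path
    token: str


def resolve_repo(headers: Mapping[str, str]) -> RepoTarget:
    """BYOR: use the caller's repo only when both headers are present, else the default.

    A plain dict is used here; Flask's request.headers does case-insensitive lookups.
    """
    repo_id, token = headers.get("X-VC-Repo-Id"), headers.get("X-VC-Token")
    if repo_id and token:
        return RepoTarget(repo_id, token)
    return RepoTarget(DEFAULT_REPO_ID, DEFAULT_TOKEN)


def snapshot_path(asset_type: str, asset_id: int, user_id: int,
                  epoch_ms: Optional[int] = None) -> str:
    """{type}s/{id}/{epoch_ms}-{user_id}.json, derived at save time, never pre-allocated.

    Two users saving in the same millisecond still get distinct names.
    """
    if epoch_ms is None:
        epoch_ms = time.time_ns() // 1_000_000
    return f"{asset_type}s/{asset_id}/{epoch_ms}-{user_id}.json"


def parse_snapshot_path(path: str) -> Tuple[int, int]:
    """Recover (epoch_ms, user_id) from a snapshot path, e.g. to sort a history list."""
    stem = path.rsplit("/", 1)[-1].removesuffix(".json")  # Python 3.9+
    epoch_ms, user_id = stem.split("-", 1)
    return int(epoch_ms), int(user_id)


def build_commit_request(target: RepoTarget, asset_type: str, asset_id: int,
                         user_id: int, bundle: dict,
                         epoch_ms: Optional[int] = None) -> Tuple[str, dict, dict]:
    """Return (url, headers, json_body) for POST /projects/:id/repository/commits."""
    project = quote(target.repo_id, safe="")  # "group/project" must be URL-encoded
    url = f"{GITLAB_API}/projects/{project}/repository/commits"
    body = {
        "branch": DEFAULT_BRANCH,
        "commit_message": f"Snapshot {asset_type} {asset_id} (user {user_id})",
        "actions": [{
            "action": "create",  # a new file per save; existing snapshots are never rewritten
            "file_path": snapshot_path(asset_type, asset_id, user_id, epoch_ms),
            "content": json.dumps(bundle, indent=2, sort_keys=True),
        }],
    }
    return url, {"PRIVATE-TOKEN": target.token}, body


def restore(live: dict, snapshot: dict, save_snapshot: Callable[[dict], None]) -> dict:
    """Record the live state as a recovery point first, then return the snapshot
    with the live row's owners and roles carried over."""
    save_snapshot(live)  # so a bad restore can itself be undone
    restored = {k: v for k, v in snapshot.items() if k not in ("owners", "roles")}
    for key in ("owners", "roles"):
        if key in live:
            restored[key] = live[key]
    return restored


if __name__ == "__main__":
    target = resolve_repo({"X-VC-Repo-Id": "team/dash-history", "X-VC-Token": "t0k"})
    url, headers, body = build_commit_request(
        target, "dashboard", 42, 7, {"dashboard_title": "Sales"}, epoch_ms=1700000000000)
    print(url)
    print(body["actions"][0]["file_path"])
```

Because every save is a `create` of a fresh path, the list endpoint only needs to enumerate `{type}s/{id}/` and sort by the parsed timestamp; Git's own commit log carries the rest of the audit trail.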
   
   The reason I think it's worth raising rather than leaving it as a footnote: it 
changes who owns the audit trail. With BYOR, the history lives in Git, a system 
that security and legal teams already watch, instead of in a fourth place inside 
Superset's DB alongside backups and audit logs. That framing turned out to 
matter a lot for our compliance review, and I'd guess it would for other 
enterprise deployments too. The two approaches genuinely answer different 
questions: SIP-210 wins if version history needs to be a first-class queryable 
surface inside Superset (per-field diffs, ETag concurrency tokens, multi-entity 
transaction grouping). BYOR wins if the goal is "recovery path with zero 
operational surface area" and the audit trail belongs in Git.
   
   Honest weak spots: snapshot-grain history is genuinely worse for the "what 
changed" UI you're planning in V2, and we have nothing to offer the V3 locking 
SIP — the ETag backbone is a real advantage of your design. Storage cost on 
heavy-edit assets is also worse for us; we lean on Git's repo retention rather 
than being clever about deltas.
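   To make the snapshot-grain limitation concrete, here is a minimal sketch of what a `compare` endpoint can do when all it has is two whole JSON bundles: flatten both and diff the leaves at read time. `flatten` and `compare_snapshots` are hypothetical helpers, not the BYOR code, and the bundle shapes in the example are made up.

```python
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b[0]": leaf}; empty containers yield no keys."""
    out: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            out.update(flatten(value, f"{prefix}.{key}" if prefix else str(key)))
    elif isinstance(obj, list):
        # List elements are keyed by position, so an insertion shifts every later index.
        for index, value in enumerate(obj):
            out.update(flatten(value, f"{prefix}[{index}]"))
    else:
        out[prefix] = obj
    return out


def compare_snapshots(old: dict, new: dict) -> Dict[str, Dict[str, Any]]:
    """Field-level diff reconstructed from two whole snapshots."""
    before, after = flatten(old), flatten(new)
    return {
        "added": {k: after[k] for k in sorted(after.keys() - before.keys())},
        "removed": {k: before[k] for k in sorted(before.keys() - after.keys())},
        "changed": {
            k: (before[k], after[k])
            for k in sorted(before.keys() & after.keys())
            if before[k] != after[k]
        },
    }


if __name__ == "__main__":
    old = {"dashboard_title": "Sales", "position": {"charts": [1, 2]}}
    new = {"dashboard_title": "Sales v2", "position": {"charts": [3, 1, 2]}}
    print(compare_snapshots(old, new))
```

In the example, prepending one chart surfaces as two "changed" positions plus one "added" entry, and nothing says which save or which user introduced each difference between two non-adjacent snapshots. That is the gap per-field diffs recorded at write time, as SIP-210 proposes, would close.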
   
   Happy to open a draft PR with the BYOR code as a concrete reference if 
there's interest — not as a replacement for SIP-210 (the two solve overlapping 
but distinct problems) but so the trade-offs are something people can read 
rather than imagine. No worries if not.
   
   cc @rusackas 
   
   <img width="749" height="819" alt="Image" src="https://github.com/user-attachments/assets/0c347b60-ff08-4642-a3e2-1792066137f1" />
   
   <img width="1284" height="879" alt="Image" src="https://github.com/user-attachments/assets/1143c9ae-08ce-47e4-812b-136fb98fda7a" />
   
   <img width="599" height="577" alt="Image" src="https://github.com/user-attachments/assets/e17e5aaa-19d6-4051-a054-749dd83ab209" />

