anthonyhungnguyen commented on issue #39492:
URL: https://github.com/apache/superset/issues/39492#issuecomment-4433463570
Hey @mikebridge — really thorough SIP, thanks for putting this together. I
wanted to share an alternative we've been running in production at Geotab for
several months. It's adjacent to the YAML import/export reuse you rejected, but
with a twist that I think changes the trade-off space enough to be worth
folding into the discussion.

**Short version:** we don't store snapshots in Superset's database at all.
We push them to an external Git repo (GitLab in our case) and let Git be the
version store.

The flow is roughly: on save, we run the existing `Export*Command` to get the
JSON bundle, then POST it to GitLab's `repository/commits` endpoint for a
configured repo. There's a default org-wide repo, but users can also point at
their own via the `X-VC-Repo-Id` / `X-VC-Token` request headers; we call that
BYOR (Bring Your Own Repo). Snapshots land at
`{type}s/{id}/{epoch_ms}-{user_id}.json`, which sidesteps the filename
collision problem without any pre-allocation. Restore first auto-snapshots the
current state as a recovery point, then re-imports the JSON via the existing
`import_*` v1 commands. Total surface area is ~650 LOC of Python and five
endpoints (save, list, restore, preview, compare). Owners and roles are
preserved on the live row across restores, for the same reason you exclude
them in `__versioned__`.

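
To make the save path concrete, here is a minimal sketch of how the snapshot path and the commit request could be assembled. It is not the actual BYOR code: the function names, the `GITLAB_API` host, the default repo ID and token, and the `main` branch are all hypothetical placeholders. The only parts taken from the description above are the path layout and the two `X-VC-*` headers; the request shape follows GitLab's commits API (`POST /projects/:id/repository/commits` with an `actions` list).

```python
import json
import time
from typing import Mapping, Optional, Tuple
from urllib.parse import quote

# Hypothetical configuration values, for illustration only.
GITLAB_API = "https://gitlab.example.com/api/v4"
DEFAULT_REPO_ID = "1234"
DEFAULT_TOKEN = "org-wide-token"


def snapshot_path(asset_type: str, asset_id: int, user_id: int,
                  epoch_ms: Optional[int] = None) -> str:
    """Build the `{type}s/{id}/{epoch_ms}-{user_id}.json` path."""
    if epoch_ms is None:
        epoch_ms = time.time_ns() // 1_000_000
    return f"{asset_type}s/{asset_id}/{epoch_ms}-{user_id}.json"


def resolve_repo(headers: Mapping[str, str]) -> Tuple[str, str]:
    """Use the caller's repo (BYOR) if both headers are set, else the default.

    Assumes a plain mapping with exact-case keys; a real framework's header
    object would normally handle case-insensitive lookup.
    """
    repo_id = headers.get("X-VC-Repo-Id")
    token = headers.get("X-VC-Token")
    if repo_id and token:
        return repo_id, token
    return DEFAULT_REPO_ID, DEFAULT_TOKEN


def build_commit_request(bundle: dict, asset_type: str, asset_id: int,
                         user_id: int, headers: Mapping[str, str],
                         branch: str = "main",
                         epoch_ms: Optional[int] = None) -> dict:
    """Return the URL, headers, and JSON body for a single-file commit."""
    repo_id, token = resolve_repo(headers)
    path = snapshot_path(asset_type, asset_id, user_id, epoch_ms)
    return {
        # Project IDs given as namespace paths must be URL-encoded.
        "url": f"{GITLAB_API}/projects/{quote(repo_id, safe='')}/repository/commits",
        "headers": {"PRIVATE-TOKEN": token},
        "json": {
            "branch": branch,
            "commit_message": f"Snapshot {asset_type} {asset_id} by user {user_id}",
            "actions": [{
                "action": "create",
                "file_path": path,
                "content": json.dumps(bundle, indent=2, sort_keys=True),
            }],
        },
    }
```

The returned dict can be handed to any HTTP client. Keeping request construction separate from sending makes the path and fallback logic easy to test without a live repo.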
The reason I think it's worth raising rather than just being a footnote: it
changes who owns the audit trail. With BYOR the history lives in Git — the
system security and legal already watch — instead of a fourth place inside
Superset's DB alongside backups and audit logs. That framing turned out to
matter a lot for our compliance review, and I'd guess it would for other
enterprise deployments too. The two approaches genuinely answer different
questions: SIP-210 wins if version history needs to be a first-class queryable
surface inside Superset (per-field diffs, ETag concurrency tokens, multi-entity
transaction grouping). BYOR wins if the goal is "recovery path with zero
operational surface area" and the audit trail belongs in Git.

Honest weak spots: snapshot-grain history is genuinely worse for the "what
changed" UI you're planning in V2, and we have nothing to offer the V3 locking
SIP — the ETag backbone is a real advantage of your design. Storage cost on
heavy-edit assets is also worse for us; we lean on Git's repo retention rather
than being clever about deltas.

Happy to open a draft PR with the BYOR code as a concrete reference if
there's interest — not as a replacement for SIP-210 (the two solve overlapping
but distinct problems) but so the trade-offs are something people can read
rather than imagine. No worries if not.

cc @rusackas
<img width="749" height="819" alt="Image"
src="https://github.com/user-attachments/assets/0c347b60-ff08-4642-a3e2-1792066137f1"
/>
<img width="1284" height="879" alt="Image"
src="https://github.com/user-attachments/assets/1143c9ae-08ce-47e4-812b-136fb98fda7a"
/>
<img width="599" height="577" alt="Image"
src="https://github.com/user-attachments/assets/e17e5aaa-19d6-4051-a054-749dd83ab209"
/>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]