Re: [I] [SIP] Dataset Relationship Model — declare and manage relationships between datasets [superset]

via GitHub Sun, 28 Jun 2026 17:33:18 -0700


rusackas commented on issue #41496:
URL: https://github.com/apache/superset/issues/41496#issuecomment-4828017576


   Thanks @hjadm, there's a lot to like here, and a relationship graph is 
something folks have wanted for ages.
   
   I do have some concerns though, so here are my thoughts so far on all this:
   
   The cross-database merge engine is the big one. Pulling both sides into 
Superset and joining them in-process is a line we've pretty deliberately not 
crossed... it basically turns Superset into a federated query engine, with all 
the memory and scale questions that come with it. I'd be a lot more comfortable 
carving that out (your Open Question #1) and shipping the relationship model + 
canvas on their own first. The Superset Meta Database toys with this idea a 
bit, but flags all the risks that it's NOT intended for large datasets and will 
cause problems.
   
   Related, and maybe the part that worries me most: how do RLS and perms 
travel across a relationship? If I can see dataset A but not B, a declared A→B 
join is a fairly direct way to read B's rows. That feels like it needs 
designing in up front rather than a later security pass if and when people 
discover problems.
   
   I also want to be sure we're not building a second version of things we 
already have. Virtual datasets already express JOINs, and dashboards already 
have native filters with cross-filtering. How does this sit alongside those 
instead of competing with them? And on the model itself, I'm not clear how 
many-to-many  join types avoid metric fan-out / double-counting, which is 
usually where this gets gnarly.
   
   The bigger one: this overlaps a fair bit with SIP-182 (#35003). That one is 
basically trying to loosen Superset's coupling to the dataset model (the 
Explorable idea), and relationships are arguably a semantic-layer concern. 
External semantic layers define their own joins, so we'd risk two sources of 
truth. Hanging relationships off datasets pushes the opposite direction from 
where #35003 wants to go, so I don't think we want to consider them in 
isolation. Would love to see the two reconciled, or at least for this to say 
how it fits the Explorable direction. CC @betodealmeida / @michael-s-molina, 
who'll likely have opinions here.
   
   Last thing, more process than substance: this might be an easier yes if it 
were split into a smaller SIPs (model & UI, the cross-DB engine, the 
cross-filtering). Each is kind of its own thing, and bundling them means we 
can't approve the safe parts without taking on the risky ones too
   
   Thoughts? Happy to help navigate the dev@ side if it's useful.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] [SIP] Dataset Relationship Model — declare and manage relationships between datasets [superset]

Reply via email to