rusackas commented on issue #41496: URL: https://github.com/apache/superset/issues/41496#issuecomment-4828017576
Thanks @hjadm, there's a lot to like here, and a relationship graph is something folks have wanted for ages. I do have some concerns though, so here are my thoughts so far on all this: The cross-database merge engine is the big one. Pulling both sides into Superset and joining them in-process is a line we've pretty deliberately not crossed... it basically turns Superset into a federated query engine, with all the memory and scale questions that come with it. I'd be a lot more comfortable carving that out (your Open Question #1) and shipping the relationship model + canvas on their own first. The Superset Meta Database toys with this idea a bit, but flags all the risks that it's NOT intended for large datasets and will cause problems. Related, and maybe the part that worries me most: how do RLS and perms travel across a relationship? If I can see dataset A but not B, a declared A→B join is a fairly direct way to read B's rows. That feels like it needs designing in up front rather than a later security pass if and when people discover problems. I also want to be sure we're not building a second version of things we already have. Virtual datasets already express JOINs, and dashboards already have native filters with cross-filtering. How does this sit alongside those instead of competing with them? And on the model itself, I'm not clear how many-to-many join types avoid metric fan-out / double-counting, which is usually where this gets gnarly. The bigger one: this overlaps a fair bit with SIP-182 (#35003). That one is basically trying to loosen Superset's coupling to the dataset model (the Explorable idea), and relationships are arguably a semantic-layer concern. External semantic layers define their own joins, so we'd risk two sources of truth. Hanging relationships off datasets pushes the opposite direction from where #35003 wants to go, so I don't think we want to consider them in isolation. Would love to see the two reconciled, or at least for this to say how it fits the Explorable direction. CC @betodealmeida / @michael-s-molina, who'll likely have opinions here. Last thing, more process than substance: this might be an easier yes if it were split into a smaller SIPs (model & UI, the cross-DB engine, the cross-filtering). Each is kind of its own thing, and bundling them means we can't approve the safe parts without taking on the risky ones too Thoughts? Happy to help navigate the dev@ side if it's useful. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
