betodealmeida opened a new pull request, #36245: URL: https://github.com/apache/superset/pull/36245
<!--- Please write the PR title following the conventions at https://www.conventionalcommits.org/en/v1.0.0/ Example: fix(dashboard): load charts correctly --> ### SUMMARY <!--- Describe the change below, including rationale and design decisions --> This PR introduces a clean, type-safe protocol for data sources in Superset that can be "explored" (visualized). While the immediate implementation focuses on SQL datasources (dataserts and queries), the protocol lays the groundwork for first-class [semantic layer support](https://github.com/apache/superset/issues/35003). #### 🎯 Core Protocol (`superset/explorables/base.py`) Introduced the `Explorable` Protocol defining the interface for any data source that can be visualized in Superset: ```python @runtime_checkable class Explorable(Protocol): def get_query_result(self, query_object: QueryObject) -> QueryResult: ... def get_query_str(self, query_obj: QueryObjectDict) -> str: ... def has_drill_by_columns(self, column_names: list[str]) -> bool: ... @property def uid(self) -> str: ... @property def type(self) -> str: ... @property def columns(self) -> list[Any]: ... @property def cache_timeout(self) -> int | None: ... @property def is_rls_supported(self) -> bool: ... # ... other properties ``` **Key design decisions:** - **Minimal interface**: Only includes methods/properties needed for exploration - **No SQL-specific concepts**: `database`, `schema`, etc. are not in the protocol - **Runtime checkable**: Enables protocol checks without explicit inheritance - **Required `is_rls_supported`**: Makes RLS support explicit (can return `False`)i #### 🔧 QueryContext Integration Updated `QueryContext` to work with `Explorable` instead of just `BaseDatasource`: ```python # Before datasource: BaseDatasource # After datasource: Explorable ``` This enables polymorphic handling of different data source types while maintaining full backward compatibility. #### 📊 Drill-By Abstraction **Before**: Drill-by logic was SQL-specific and lived in the security manager ```python # Security manager had SQL queries like: drillable_columns = { row[0] for row in session.query(TableColumn.column_name) .filter(TableColumn.table_id == datasource_id) .filter(TableColumn.groupby).all() } ``` **After**: Drill-by is datasource-driven through the protocol ```python # Security manager (clean and datasource-agnostic) def has_drill_by_access(...): # ... and datasource.has_drill_by_columns(dimensions) # BaseDatasource implements the SQL-specific logic def has_drill_by_columns(self, column_names: list[str]) -> bool: drillable_columns = { row[0] for row in db.session.query(TableColumn.column_name) .filter(TableColumn.table_id == self.id) .filter(TableColumn.groupby).all() } return set(column_names).issubset(drillable_columns) ``` ### Benefits (even without semantic layers) #### ✅ Immediate Value 1. **Clearer boundaries**: SQL-specific code is now clearly scoped and doesn't leak into generic components 2. **Better maintainability**: Security manager is simpler and focuses on authorization logic 4. **Documentation**: The protocol serves as living documentation of what it means to be "explorable" 5. **Testability**: Easier to mock and test with a defined protocol interface #### ✅ Architecture Improvements 1. **Single responsibility**: Each component does one thing well 2. **Dependency inversion**: High-level modules (security, query context) depend on abstractions 3. **Open/closed principle**: New data source types (like semantic views) can be added without modifying existing code ### Future Semantic Layer Support When semantic layers are implemented, they'll simply implement the `Explorable` protocol: ```python class SemanticView: @property def uid(self) -> str: return f"semantic_view_{self.id}" def has_drill_by_columns(self, column_names: list[str]) -> bool: # Check against semantic layer dimension definitions return all(col in self.dimensions for col in column_names) @property def is_rls_supported(self) -> bool: return True # Semantic layer handles RLS internally # ... implement other protocol methods ``` **Zero changes needed** to: - QueryContext - Security manager - Chart data API - Any other component that works with `Explorable` ### Testing All existing tests pass with the new protocol. The test suite validates: - SQL datasource functionality remains unchanged - Type checking passes (`mypy`) - Security manager correctly handles both SQL and future non-SQL datasources - Drill-by logic works through the new abstraction ### Backward Compatibility This is a 100% backward-compatible refactor: - All existing SQL datasources work exactly as before - No API changes for charts, dashboards, or SQL Lab - No database migrations needed - No configuration changes required The protocol is purely additive - it formalizes what was already there, making it explicit and extensible. ### ADDITIONAL INFORMATION <!--- Check any relevant boxes with "x" --> <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue --> - [ ] Has associated issue: - [ ] Required feature flags: - [ ] Changes UI - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351)) - [ ] Migration is atomic, supports rollback & is backwards-compatible - [ ] Confirm DB migration upgrade and downgrade tested - [ ] Runtime estimates and downtime expectations provided - [ ] Introduces new feature or API - [ ] Removes existing feature or API -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
