YuriyKrasilnikov opened a new pull request, #37395:
URL: https://github.com/apache/superset/pull/37395

   ## Summary
   
   Fixes #37359: Guest users in embedded dashboards can now use virtual 
datasets without SQL errors.
   
   **Before:** Guest RLS applied twice → SQL errors like `Unknown expression or 
function identifier 'my_table.tenant_id'`
   **After:** Guest RLS applied once → Virtual datasets work correctly
   
   ## Problem Analysis
   
   ### Root Cause (Verified by Code Trace)
   
   `get_sqla_row_level_filters()` includes guest RLS for **every** call, but 
for virtual datasets it's called **twice**:
   
   ```
   1. get_sqla_query() for virtual dataset
      ↓
   2. get_from_clause() → apply_rls() → get_predicates_for_table()
      ↓
   3. dataset.get_sqla_row_level_filters() ← INCLUDES GUEST RLS 
(models.py:758-763)
      ↓
   4. Guest RLS applied to underlying tables in virtual dataset SQL
      ↓
   5. Return to get_sqla_query()
      ↓
   6. self.get_sqla_row_level_filters() ← INCLUDES GUEST RLS AGAIN 
(helpers.py:3198)
      ↓
   7. Guest RLS applied to outer WHERE clause
      ↓
   8. DOUBLE APPLICATION!
   ```
   
   ### Key Code Locations
   
   | File | Lines | Description |
   |------|-------|-------------|
   | `superset/models/helpers.py` | 2052-2057 | `apply_rls()` call in 
`get_from_clause()` |
   | `superset/utils/rls.py` | 108 | `dataset.get_sqla_row_level_filters()` 
call |
   | `superset/connectors/sqla/models.py` | 758-763 | Guest RLS addition |
   | `superset/models/helpers.py` | 3198 | Outer query RLS call |
   
   ### Why This Happens
   
   Global guest RLS rules (without dataset ID) match **both**:
   - The underlying physical table dataset
   - The virtual dataset itself
   
   Both get guest RLS added → double application.
   
   ## Solution: Separate Method Pattern
   
   ### Architecture Decision
   
   Evaluated 3 options:
   
   | Option | Approach | Risk | Chosen |
   |--------|----------|------|--------|
   | A | Add parameter to public API | Breaks plugins, 36 test mocks | ❌ |
   | B | Tracking mechanism (like PR #35890) | Complex, many files | ❌ |
   | **C** | **Separate internal method** | **Safe, backwards compatible** | ✅ |
   
   ### Implementation
   
   1. **Extract internal method with `include_guest_rls` parameter:**
   ```python
   def get_sqla_row_level_filters(self, template_processor=None):
       """Public API - always includes guest RLS (backwards compatible)"""
       return self._get_sqla_row_level_filters_internal(
           template_processor, include_guest_rls=True
       )
   
   def _get_sqla_row_level_filters_internal(
       self, template_processor=None, include_guest_rls=True
   ):
       """Internal method with optional guest RLS exclusion"""
       # ... implementation with include_guest_rls check ...
   ```
   
   2. **Call internal method from `get_predicates_for_table()`:**
   ```python
   # Exclude guest RLS for underlying tables to prevent double application
   for predicate in dataset._get_sqla_row_level_filters_internal(
       include_guest_rls=False
   )
   ```
   
   ### Why Separate Method Pattern?
   
   | Risk | Separate Method | Parameter in Public API |
   |------|-----------------|------------------------|
   | Breaking plugins | ✅ None | ❌ Breaks |
   | Accidental RLS bypass | ✅ Protected | ❌ Possible |
   | API backwards compatibility | ✅ 100% | ⚠️ Partial |
   | Test mock updates | ✅ Minimal | ❌ 36 locations |
   
   ## Security Analysis
   
   | Aspect | Status |
   |--------|--------|
   | Regular RLS on underlying tables | ✅ Still applied |
   | Guest RLS on outer query | ✅ Still applied |
   | Guest RLS bypass possible | ❌ No - internal method protected |
   
   Mathematical model unchanged: `RLS(query) = RLS(underlying) AND RLS(outer)`
   
   ## Related Issues/PRs
   
   - Regression from: PR #36061 (fix: RLS in virtual datasets)
   - Similar fix pattern: PR #35890 (double time filter)
   - SQL Lab: Not affected (guest users cannot access SQL Lab)
   
   ## Testing
   
   ### New Tests (6 tests, all pass)
   - `test_public_api_includes_guest_rls` - Backwards compatibility
   - `test_internal_api_excludes_guest_rls_when_requested` - Core fix
   - `test_internal_api_includes_guest_rls_by_default` - Default behavior
   - `test_regular_rls_always_included` - Security
   - `test_guest_rls_skipped_when_feature_disabled` - Feature flag
   - `test_filter_grouping_preserved` - Edge case
   
   ### Files Changed
   - `superset/connectors/sqla/models.py` - Refactor method
   - `superset/utils/rls.py` - Call internal method
   - `tests/unit_tests/models/test_double_rls_virtual_dataset.py` - New tests
   
   ## How To Test Manually
   
   1. Create virtual dataset: `SELECT * FROM physical_table`
   2. Create dashboard with chart using virtual dataset
   3. Get guest token with global RLS rule: `{"clause": "tenant_id = 123"}`
   4. Open embedded dashboard
   5. **Before fix:** SQL error with alias mismatch
   6. **After fix:** Chart renders correctly
   
   ## Checklist
   
   - [x] Refactored to Separate Method pattern (safe, backwards compatible)
   - [x] 6 unit tests covering all scenarios
   - [x] All tests pass
   - [x] MyPy type checking passes
   - [x] No changes to public API


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to