YuriyKrasilnikov opened a new pull request, #37371:
URL: https://github.com/apache/superset/pull/37371

   ## Summary
   
   Fixes #37061: Guest users in embedded dashboards can now sort table columns.
   
   **Before:** Clicking any column header → `"Guest user cannot modify chart 
payload"`
   **After:** Sorting by visible columns works; hidden columns and SQL 
injection still blocked.
   
   ---
   
   ## Problem Analysis
   
   ### Root Cause
   
   `query_context_modified()` treated orderby identically to metrics/columns:
   
   ```python
   for key, equivalent in [
       ("metrics", ["metrics"]),
       ("columns", ["columns", "groupby"]),
       ("orderby", ["orderby"]),  # ← Same logic as metrics
   ]:
       if not requested.issubset(stored):
           return True  # BLOCKED
   ```
   
   **Issue:** User clicks column → frontend sends new orderby → `orderby ≠ 
stored_orderby` → blocked.
   
   ### Why This Is Wrong
   
   | Field | Expected Behavior |
   |-------|-------------------|
   | metrics | Strict subset of stored (security) |
   | columns | Strict subset of stored (security) |
   | **orderby** | Can be ANY visible column (UX requirement) |
   
   Sorting is **navigation**, not **data access expansion**.
   
   ---
   
   ## Security Research
   
   Before implementing, we analyzed attack vectors:
   
   ### Attack Vector Analysis
   
   | Vector | Risk | Example | Our Mitigation |
   |--------|------|---------|----------------|
   | **SQL Injection** | HIGH | `ORDER BY random()`, `sleep(5)` | Block 
`expressionType: "SQL"` |
   | **Data Exfiltration** | HIGH | `ORDER BY credit_card_number` reveals 
ranking | Whitelist visible columns only |
   | **RLS Bypass** | MEDIUM | Sort by filtered column reveals existence | 
Whitelist prevents column probing |
   | **DoS** | LOW | Expensive sort on unindexed column | Existing query 
timeouts apply |
   
   ### Industry Standard
   
   | Platform | Approach |
   |----------|----------|
   | Looker | Whitelist via LookML |
   | Metabase | Visible fields only |
   | Tableau | Pre-configured sorts |
   | **Our solution** | Whitelist by visible columns |
   
   ### Mathematical Model
   
   ```
   V = visible columns (columns ∪ groupby ∪ metrics)
   S = requested sort columns
   
   Rule: S ⊆ V (sort only by what user can see)
   ```
   
   ---
   
   ## Architecture Decision
   
   ### Why Not Just Remove orderby Check?
   
   ```python
   # Option A: Remove orderby from loop (REJECTED)
   for key in ["metrics", "columns", "groupby"]:  # no orderby
   ```
   
   **Rejected because:** Opens SQL injection and data exfiltration vectors.
   
   ### Why Not Frontend-Only Sorting?
   
   **Rejected because:** Client-side sort on 1000 rows ≠ database sort on 
100000 rows. Wrong results.
   
   ### Why Comparator Pattern?
   
   During implementation, we discovered a **bug in existing code**:
   
   ```python
   # EXISTING BUG (lines 260-264):
   for key, equivalent in [...]:
       for key in equivalent:  # ← key OVERWRITTEN!
           stored_values.update(...)
   # After inner loop: key = last value of equivalent, not original key
   ```
   
   This bug affects iterations 2-4 of the outer loop.
   
   **Solution:** Refactor to explicit variables + custom comparator for orderby:
   
   ```python
   for mapping in _FIELD_MAPPINGS:
       field_name = mapping["field"]      # explicit, not reused
       equiv_fields = mapping["equivalent"]
       comparator = mapping.get("comparator", default_compare)
   
       for equiv_field in equiv_fields:   # different variable name
           ...
   ```
   
   ### Why TypedDict?
   
   ```python
   class FieldMapping(TypedDict):
       field: str
       equivalent: list[str]
       comparator: NotRequired[Callable[...]]
   ```
   
   1. **Type safety:** MyPy catches errors
   2. **Extensibility:** Add new comparators without changing loop
   3. **Documentation:** Structure is self-documenting
   
   ---
   
   ## Implementation Details
   
   ### New Functions
   
   | Function | Purpose | Returns |
   |----------|---------|---------|
   | `_get_visible_columns(chart)` | Extract columns ∪ groupby ∪ metrics | 
`set[str]` |
   | `_extract_orderby_column_name(item)` | Normalize orderby format; None = 
unsafe SQL | `str \| None` |
   | `_orderby_whitelist_compare(ctx, chart, visible)` | Whitelist validation 
for orderby | `bool` (True = block) |
   | `_default_field_compare(...)` | Default subset comparison (extracted for 
complexity) | `bool` |
   
   ### Why `_extract_orderby_column_name` Returns None for SQL?
   
   ```python
   if orderby_item.get("expressionType") == "SQL":
       return None  # Triggers block
   ```
   
   **Reason:** SQL expressions can contain:
   - `random()` — non-deterministic, DoS potential
   - `pg_sleep(5)` — timing attack
   - `(SELECT password FROM users LIMIT 1)` — data exfiltration
   
   ### Why Check Both form_data AND queries?
   
   ```python
   # Check form_data.orderby
   for orderby_tuple in form_data.get("orderby") or []:
       ...
   
   # Check query_context.queries[].orderby
   for query in query_context.queries:
       for orderby_tuple in getattr(query, "orderby", None) or []:
           ...
   ```
   
   **Reason:** Attacker could send valid `form_data.orderby` but inject 
malicious `query.orderby`. Both must be validated.
   
   ### Why `equiv_field` Instead of `key`?
   
   **Reason:** Python loop variables leak into outer scope. Reusing `key` 
corrupted subsequent iterations.
   
   ---
   
   ## Changes Summary
   
   | File | Change |
   |------|--------|
   | `superset/security/manager.py` | +FieldMapping TypedDict, +4 functions, 
refactor loop |
   | `tests/unit_tests/security/manager_test.py` | +3 tests for whitelist 
behavior |
   
   ### New Tests
   
   | Test | Scenario | Expected |
   |------|----------|----------|
   | `test_..._visible_column_allowed` | Sort by column in chart | ✅ Allowed |
   | `test_..._hidden_column_blocked` | Sort by `credit_card_number` | ❌ 
Blocked |
   | `test_..._direction_change_allowed` | Change ASC↔DESC | ✅ Allowed |
   
   Existing `test_query_context_modified_orderby` (SQL injection) continues to 
pass.
   
   ---
   
   ## Backward Compatibility
   
   | Aspect | Impact |
   |--------|--------|
   | API | No change |
   | Behavior for non-guest users | No change |
   | Behavior for guest users | **Sorting now works** |
   | Existing tests | All pass (28/28) |
   
   ---
   
   ## How To Test
   
   1. Create embedded dashboard with Table chart
   2. Get guest token
   3. Click column header to sort
   4. **Before:** Error "Guest user cannot modify chart payload"
   5. **After:** Table sorts correctly
   
   ---
   
   ## Checklist
   
   - [x] Security research completed (4 attack vectors analyzed)
   - [x] Architecture decision documented
   - [x] Bug fix: variable shadowing in loop
   - [x] Unit tests for whitelist behavior
   - [x] Existing tests pass
   - [x] MyPy types added (TypedDict, type hints)
   - [x] ruff/pylint pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to