YuriyKrasilnikov opened a new pull request, #37371:
URL: https://github.com/apache/superset/pull/37371
## Summary
Fixes #37061: Guest users in embedded dashboards can now sort table columns.
**Before:** Clicking any column header → `"Guest user cannot modify chart
payload"`
**After:** Sorting by visible columns works; hidden columns and SQL
injection still blocked.
---
## Problem Analysis
### Root Cause
`query_context_modified()` treated orderby identically to metrics/columns:
```python
for key, equivalent in [
("metrics", ["metrics"]),
("columns", ["columns", "groupby"]),
("orderby", ["orderby"]), # ← Same logic as metrics
]:
if not requested.issubset(stored):
return True # BLOCKED
```
**Issue:** User clicks column → frontend sends new orderby → `orderby ≠
stored_orderby` → blocked.
### Why This Is Wrong
| Field | Expected Behavior |
|-------|-------------------|
| metrics | Strict subset of stored (security) |
| columns | Strict subset of stored (security) |
| **orderby** | Can be ANY visible column (UX requirement) |
Sorting is **navigation**, not **data access expansion**.
---
## Security Research
Before implementing, we analyzed attack vectors:
### Attack Vector Analysis
| Vector | Risk | Example | Our Mitigation |
|--------|------|---------|----------------|
| **SQL Injection** | HIGH | `ORDER BY random()`, `sleep(5)` | Block
`expressionType: "SQL"` |
| **Data Exfiltration** | HIGH | `ORDER BY credit_card_number` reveals
ranking | Whitelist visible columns only |
| **RLS Bypass** | MEDIUM | Sort by filtered column reveals existence |
Whitelist prevents column probing |
| **DoS** | LOW | Expensive sort on unindexed column | Existing query
timeouts apply |
### Industry Standard
| Platform | Approach |
|----------|----------|
| Looker | Whitelist via LookML |
| Metabase | Visible fields only |
| Tableau | Pre-configured sorts |
| **Our solution** | Whitelist by visible columns |
### Mathematical Model
```
V = visible columns (columns ∪ groupby ∪ metrics)
S = requested sort columns
Rule: S ⊆ V (sort only by what user can see)
```
---
## Architecture Decision
### Why Not Just Remove orderby Check?
```python
# Option A: Remove orderby from loop (REJECTED)
for key in ["metrics", "columns", "groupby"]: # no orderby
```
**Rejected because:** Opens SQL injection and data exfiltration vectors.
### Why Not Frontend-Only Sorting?
**Rejected because:** Client-side sort on 1000 rows ≠ database sort on
100000 rows. Wrong results.
### Why Comparator Pattern?
During implementation, we discovered a **bug in existing code**:
```python
# EXISTING BUG (lines 260-264):
for key, equivalent in [...]:
for key in equivalent: # ← key OVERWRITTEN!
stored_values.update(...)
# After inner loop: key = last value of equivalent, not original key
```
This bug affects iterations 2-4 of the outer loop.
**Solution:** Refactor to explicit variables + custom comparator for orderby:
```python
for mapping in _FIELD_MAPPINGS:
field_name = mapping["field"] # explicit, not reused
equiv_fields = mapping["equivalent"]
comparator = mapping.get("comparator", default_compare)
for equiv_field in equiv_fields: # different variable name
...
```
### Why TypedDict?
```python
class FieldMapping(TypedDict):
field: str
equivalent: list[str]
comparator: NotRequired[Callable[...]]
```
1. **Type safety:** MyPy catches errors
2. **Extensibility:** Add new comparators without changing loop
3. **Documentation:** Structure is self-documenting
---
## Implementation Details
### New Functions
| Function | Purpose | Returns |
|----------|---------|---------|
| `_get_visible_columns(chart)` | Extract columns ∪ groupby ∪ metrics |
`set[str]` |
| `_extract_orderby_column_name(item)` | Normalize orderby format; None =
unsafe SQL | `str \| None` |
| `_orderby_whitelist_compare(ctx, chart, visible)` | Whitelist validation
for orderby | `bool` (True = block) |
| `_default_field_compare(...)` | Default subset comparison (extracted for
complexity) | `bool` |
### Why `_extract_orderby_column_name` Returns None for SQL?
```python
if orderby_item.get("expressionType") == "SQL":
return None # Triggers block
```
**Reason:** SQL expressions can contain:
- `random()` — non-deterministic, DoS potential
- `pg_sleep(5)` — timing attack
- `(SELECT password FROM users LIMIT 1)` — data exfiltration
### Why Check Both form_data AND queries?
```python
# Check form_data.orderby
for orderby_tuple in form_data.get("orderby") or []:
...
# Check query_context.queries[].orderby
for query in query_context.queries:
for orderby_tuple in getattr(query, "orderby", None) or []:
...
```
**Reason:** Attacker could send valid `form_data.orderby` but inject
malicious `query.orderby`. Both must be validated.
### Why `equiv_field` Instead of `key`?
**Reason:** Python loop variables leak into outer scope. Reusing `key`
corrupted subsequent iterations.
---
## Changes Summary
| File | Change |
|------|--------|
| `superset/security/manager.py` | +FieldMapping TypedDict, +4 functions,
refactor loop |
| `tests/unit_tests/security/manager_test.py` | +3 tests for whitelist
behavior |
### New Tests
| Test | Scenario | Expected |
|------|----------|----------|
| `test_..._visible_column_allowed` | Sort by column in chart | ✅ Allowed |
| `test_..._hidden_column_blocked` | Sort by `credit_card_number` | ❌
Blocked |
| `test_..._direction_change_allowed` | Change ASC↔DESC | ✅ Allowed |
Existing `test_query_context_modified_orderby` (SQL injection) continues to
pass.
---
## Backward Compatibility
| Aspect | Impact |
|--------|--------|
| API | No change |
| Behavior for non-guest users | No change |
| Behavior for guest users | **Sorting now works** |
| Existing tests | All pass (28/28) |
---
## How To Test
1. Create embedded dashboard with Table chart
2. Get guest token
3. Click column header to sort
4. **Before:** Error "Guest user cannot modify chart payload"
5. **After:** Table sorts correctly
---
## Checklist
- [x] Security research completed (4 attack vectors analyzed)
- [x] Architecture decision documented
- [x] Bug fix: variable shadowing in loop
- [x] Unit tests for whitelist behavior
- [x] Existing tests pass
- [x] MyPy types added (TypedDict, type hints)
- [x] ruff/pylint pass
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]