thedeceptio opened a new pull request, #41279:
URL: https://github.com/apache/superset/pull/41279
## Summary
Adds first-class support for **array-typed (multi-value) columns** in
Explore, designed to be **dialect-agnostic** via an engine-spec capability
layer. **ClickHouse** is the first concrete implementation; other array
dialects can opt in by setting a capability flag and implementing three methods.
Three operations are supported on array columns:
- **Contains**: filter rows whose array includes a given value → `has(col, v)`
- **Length**: numeric dimension holding the element count → `length(col)`
- **Explode**: group by individual array elements → `arrayJoin(col)`
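In row terms, the three operations behave roughly as sketched below. This is a plain-Python illustration of the semantics only: lists stand in for ClickHouse arrays, and the helper names (`contains`, `length`, `explode`, `explode_counts`) are hypothetical. The actual feature emits SQL and lets ClickHouse evaluate it.

```python
from collections import Counter

# Toy rows with an array-valued "tags" column (Python lists stand in for
# ClickHouse Array(String) values).
ROWS = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": []},
    {"id": 3, "tags": ["b", "b", "c"]},
]


def contains(rows, col, value):
    """Like WHERE has(col, value): keep rows whose array includes value."""
    return [r for r in rows if value in r[col]]


def length(rows, col):
    """Like SELECT length(col): one element count per row."""
    return [len(r[col]) for r in rows]


def explode(rows, col):
    """Like SELECT arrayJoin(col): one output row per array element.

    A row whose array is empty produces no output rows at all.
    """
    return [{**r, col: elem} for r in rows for elem in r[col]]


def explode_counts(rows, col):
    """Like GROUP BY arrayJoin(col) with COUNT(*)."""
    return Counter(r[col] for r in explode(rows, col))
```

Here `contains(ROWS, "tags", "b")` keeps rows 1 and 3, `length` yields `[2, 0, 3]`, and exploding gives counts of 1 for `a`, 3 for `b` and 1 for `c`. Row 2 disappears entirely, which is the same empty-array drop-out observed against the live instance.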
## Design
- New semantic type `GenericDataType.MULTI_VALUE` (= 4), synced across the
backend enum (`superset/utils/core.py`) and the frontend enum
(`@apache-superset/core/common`).
- Opt-in capability contract on `BaseEngineSpec`: a
`supports_multivalue_columns` flag (default `False`, so existing engines are
unaffected) plus `array_contains` / `array_length` / `array_explode`, which
return **SQLAlchemy expressions**, so values are bound and quoted by each
dialect rather than spliced into raw SQL strings.
- ClickHouse reclassifies `Array(...)` columns to `MULTI_VALUE` and
implements the three methods.
- Length/Explode are expressed as a small adhoc-column shape
(`{column, columnOperation}`) resolved through the engine spec, so the same
payload works on any array dialect that implements the capability.
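The capability contract and the `{column, columnOperation}` resolution can be sketched together as follows. This is a simplified, standalone model, not the PR's code: `Col` and `Func` are hypothetical stand-ins for SQLAlchemy column references and `func.*` expressions (so the sketch runs without SQLAlchemy), `BaseSpec`/`ClickHouseSpec` are trimmed-down stand-ins for the real engine specs, and the operation names `"length"`/`"explode"` are assumptions about the payload.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Col:
    """Stand-in for a SQLAlchemy column reference."""

    name: str


@dataclass(frozen=True)
class Func:
    """Stand-in for a SQLAlchemy func.<name>(...) expression.

    Column arguments render as identifiers; every other argument becomes a
    bound parameter, so user-supplied values never appear in the SQL text.
    """

    name: str
    args: tuple[Any, ...]

    def compile(self) -> tuple[str, list[Any]]:
        parts: list[str] = []
        params: list[Any] = []
        for arg in self.args:
            if isinstance(arg, Col):
                parts.append(arg.name)
            else:
                parts.append("?")
                params.append(arg)
        return f"{self.name}({', '.join(parts)})", params


class BaseSpec:
    """Default: no array support, so existing engines are unaffected."""

    supports_multivalue_columns = False

    @classmethod
    def array_contains(cls, col: Col, value: Any) -> Func:
        raise NotImplementedError(f"{cls.__name__}: array_contains")

    @classmethod
    def array_length(cls, col: Col) -> Func:
        raise NotImplementedError(f"{cls.__name__}: array_length")

    @classmethod
    def array_explode(cls, col: Col) -> Func:
        raise NotImplementedError(f"{cls.__name__}: array_explode")


class ClickHouseSpec(BaseSpec):
    supports_multivalue_columns = True

    @classmethod
    def array_contains(cls, col: Col, value: Any) -> Func:
        return Func("has", (col, value))

    @classmethod
    def array_length(cls, col: Col) -> Func:
        return Func("length", (col,))

    @classmethod
    def array_explode(cls, col: Col) -> Func:
        return Func("arrayJoin", (col,))


# Hypothetical mapping from columnOperation values to engine-spec methods.
_OPERATIONS = {"length": "array_length", "explode": "array_explode"}


def resolve_adhoc_column(
    payload: Mapping[str, Any], spec: type[BaseSpec], array_columns: set[str]
) -> Func:
    """Turn a {column, columnOperation} payload into an expression, with guards."""
    if not spec.supports_multivalue_columns:
        raise ValueError(f"{spec.__name__} does not support array columns")
    column = payload.get("column")
    if column not in array_columns:
        raise ValueError(f"Unknown array column: {column!r}")
    op = payload.get("columnOperation")
    method = _OPERATIONS.get(op)
    if method is None:
        raise ValueError(f"Unknown column operation: {op!r}")
    # A dialect that sets the flag but does not override a method still
    # raises NotImplementedError from BaseSpec instead of emitting bad SQL.
    return getattr(spec, method)(Col(column))
```

The design point this mirrors is that the payload never names a SQL function: the engine spec picks `arrayJoin`, `length` or `has`, so another dialect only has to supply its own three methods.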
## Validated against a live ClickHouse instance
Tested end-to-end on a real table (22.8M rows, 11 array columns):
- All `Array(UInt16|String|UInt32)` columns classify as `MULTI_VALUE`.
- `has()` / `length()` / `arrayJoin()` render and execute correctly through
the real ClickHouse dialect.
- Confirmed the `arrayJoin` empty-array drop-out: ~42% of rows have empty
arrays and, as expected, produce no rows under Explode (documented as
intended behavior).
- Live validation also caught a bug where query-context validation wrongly
flagged modifier columns as "Columns missing in dataset"; it is fixed and
covered by a regression test.
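The classification result above can be approximated by a check on the native type string. This is a sketch under an assumption: that the driver reports types as ClickHouse type strings such as `Array(UInt16)`. The PR may instead inspect the dialect's type objects, and the helper names here are hypothetical.

```python
import re

# Assumption: native types arrive as ClickHouse type strings, e.g.
# "Array(UInt16)" or "Array(Nullable(String))".
_ARRAY_TYPE = re.compile(r"Array\(.+\)")

# Stand-in for GenericDataType.MULTI_VALUE (= 4 in this PR).
MULTI_VALUE = 4


def is_clickhouse_array_type(native_type: str) -> bool:
    """True when the whole type is Array(...), whatever the element type."""
    return _ARRAY_TYPE.fullmatch(native_type.strip()) is not None


def classify(native_type: str, fallback: int) -> int:
    """Array columns become MULTI_VALUE; everything else keeps its old type."""
    return MULTI_VALUE if is_clickhouse_array_type(native_type) else fallback
```

Using `fullmatch` means a type that merely contains an array, such as `Map(String, Array(UInt32))`, is not reclassified.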
## Tests
- Backend (pytest): classification, per-capability SQL compilation,
bound-parameter (injection-safety) checks, end-to-end query generation for
Contains/Length/Explode, and guards (unsupported engine / unknown column /
unknown operation / unimplemented dialect).
- Frontend (jest): multi-value column icon, a `GenericDataType` enum-parity
test, and operator-visibility (CONTAINS shown only for array columns).
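The PR implements the enum-parity and operator-visibility tests in jest. The rules those tests check are small enough to sketch in Python. In this sketch, `NUMERIC` through `BOOLEAN` reflect Superset's existing enum members as we understand them. `FRONTEND_ENUM` is a hypothetical hand-written snapshot of the TypeScript enum, including the `MultiValue` name. The helpers `parity_mismatches` and `visible_operators` are illustrative only.

```python
from enum import IntEnum


class GenericDataType(IntEnum):
    # NUMERIC..BOOLEAN mirror the existing members; MULTI_VALUE is new.
    NUMERIC = 0
    STRING = 1
    TEMPORAL = 2
    BOOLEAN = 3
    MULTI_VALUE = 4


# Hypothetical snapshot of the frontend enum; a real parity test would
# extract it from the TypeScript source instead of hard-coding it.
FRONTEND_ENUM = {"Numeric": 0, "String": 1, "Temporal": 2, "Boolean": 3, "MultiValue": 4}


def _norm(name: str) -> str:
    # "MULTI_VALUE" and "MultiValue" both normalize to "multivalue".
    return name.replace("_", "").lower()


def parity_mismatches(frontend: dict) -> list:
    """Names whose value differs, or that exist on only one side."""
    backend = {_norm(m.name): m.value for m in GenericDataType}
    front = {_norm(k): v for k, v in frontend.items()}
    return sorted(n for n in backend.keys() | front.keys() if backend.get(n) != front.get(n))


def visible_operators(operators: list, column_type: int) -> list:
    """Hide CONTAINS unless the column is multi-value; leave others as-is."""
    return [
        op
        for op in operators
        if op != "CONTAINS" or column_type == GenericDataType.MULTI_VALUE
    ]
```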
## Scope / follow-ups (phase 2)
- **Explode is ClickHouse-only.** Set-returning `UNNEST` dialects
(PostgreSQL/Trino/BigQuery) require `CROSS JOIN UNNEST` plumbing in the query
builder; those dialects are guarded to raise a clear error instead of
emitting invalid SQL, and support for them is deferred.
- **Length/Explode modifier UI** (controls that emit the `columnOperation`
payload) is not yet built; the operations are usable via the query API today.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]