thedeceptio opened a new pull request, #41279:
URL: https://github.com/apache/superset/pull/41279

   ## Summary
   
   Adds first-class support for **array-typed (multi-value) columns** in 
Explore, designed to be **dialect-agnostic** via an engine-spec capability 
layer. **ClickHouse** is the first concrete implementation; other array 
dialects can opt in by implementing three methods.
   
   Three operations are supported on array columns:
   - **Contains**: filter rows where the array includes a value (ClickHouse: `has(col, v)`)
   - **Length**: numeric dimension of element count (ClickHouse: `length(col)`)
   - **Explode**: group by individual elements (ClickHouse: `arrayJoin(col)`)
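
As a reading aid, the intended row-level semantics of the three operations can be sketched in plain Python. This is only an illustration over in-memory rows, not code from the PR; the function names and the sample data are made up for the example.

```python
from typing import Any, Iterator

Row = dict[str, Any]


def contains(rows: list[Row], col: str, value: Any) -> list[Row]:
    """Keep only rows whose array column includes `value`."""
    return [row for row in rows if value in row[col]]


def length(row: Row, col: str) -> int:
    """Element count of the array column, usable as a numeric dimension."""
    return len(row[col])


def explode(rows: list[Row], col: str) -> Iterator[Row]:
    """Yield one output row per array element; empty arrays yield nothing."""
    for row in rows:
        for element in row[col]:
            yield {**row, col: element}


# Hypothetical sample data: row 2 has an empty array.
rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": []},
    {"id": 3, "tags": ["b"]},
]

print([r["id"] for r in contains(rows, "tags", "b")])         # [1, 3]
print([length(r, "tags") for r in rows])                      # [2, 0, 1]
print([(r["id"], r["tags"]) for r in explode(rows, "tags")])  # [(1, 'a'), (1, 'b'), (3, 'b')]
```

Note that row 2 disappears entirely under `explode`, which matches the empty-array behavior of `arrayJoin` described in the validation notes.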
   
   ## Design
   
   - New semantic type `GenericDataType.MULTI_VALUE` (= 4), synced across the 
backend enum (`superset/utils/core.py`) and the frontend enum 
(`@apache-superset/core/common`).
   - Opt-in capability contract on `BaseEngineSpec`: a 
`supports_multivalue_columns` flag (default `False`, so existing engines are 
unaffected) plus `array_contains` / `array_length` / `array_explode`, which 
return **SQLAlchemy expressions** (proper binding/quoting per dialect — no raw 
SQL strings).
   - ClickHouse reclassifies `Array(...)` columns to `MULTI_VALUE` and 
implements the three methods.
   - Length/Explode are expressed as a small adhoc-column shape (`{column, 
columnOperation}`) resolved through the engine spec, so the same payload works 
on any array dialect.
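
A minimal sketch of what the opt-in contract could look like, assuming SQLAlchemy is installed. The class names `BaseEngineSpecSketch` and `ClickHouseSpecSketch` are stand-ins invented for this example, the method signatures are guesses, and the expressions are compiled with SQLAlchemy's default dialect rather than a real ClickHouse dialect; the PR's actual code may differ.

```python
from typing import Any

from sqlalchemy import column, func
from sqlalchemy.sql.elements import ColumnElement


class BaseEngineSpecSketch:
    """Stand-in for the base contract; not Superset's actual class."""

    # Default is False, so engines that do nothing are unaffected.
    supports_multivalue_columns = False

    @classmethod
    def array_contains(cls, col: ColumnElement, value: Any) -> ColumnElement:
        raise NotImplementedError("array_contains is not implemented")

    @classmethod
    def array_length(cls, col: ColumnElement) -> ColumnElement:
        raise NotImplementedError("array_length is not implemented")

    @classmethod
    def array_explode(cls, col: ColumnElement) -> ColumnElement:
        raise NotImplementedError("array_explode is not implemented")


class ClickHouseSpecSketch(BaseEngineSpecSketch):
    """Opts in and maps each capability to the function named in the Summary."""

    supports_multivalue_columns = True

    @classmethod
    def array_contains(cls, col: ColumnElement, value: Any) -> ColumnElement:
        # The Python value becomes a bound parameter, not inline SQL text.
        return func.has(col, value)

    @classmethod
    def array_length(cls, col: ColumnElement) -> ColumnElement:
        return func.length(col)

    @classmethod
    def array_explode(cls, col: ColumnElement) -> ColumnElement:
        return func.arrayJoin(col)


tags = column("tags")  # hypothetical column name
compiled = ClickHouseSpecSketch.array_contains(tags, "news").compile()
print(compiled)         # the filter value shows up as a placeholder
print(compiled.params)  # the actual value travels separately
```

Returning expression objects instead of strings is what lets the filter value stay a bound parameter, which is the property the injection-safety tests check.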
   
   ## Validated against a live ClickHouse instance
   
   Tested end-to-end on a real table (22.8M rows, 11 array columns):
   - All `Array(UInt16|String|UInt32)` columns classify as `MULTI_VALUE`.
   - `has()` / `length()` / `arrayJoin()` render and execute correctly through 
the real ClickHouse dialect.
   - Confirmed the `arrayJoin` empty-array drop-out: ~42% of rows have empty 
arrays and therefore produce no rows under explode (documented as expected 
behavior).
   - Live validation also caught a bug (modifier columns wrongly flagged as 
"Columns missing in dataset" by query-context validation) — fixed and 
regression-tested.
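
The classification check in the first bullet can be pictured as a pattern match on the native type string. The sketch below is an assumption about the approach, not the PR's code: `classify_array_type` is a hypothetical helper, and only the `MULTI_VALUE = 4` member comes from the Design section (the other members follow Superset's existing enum).

```python
import re
from enum import IntEnum
from typing import Optional


class GenericDataType(IntEnum):
    NUMERIC = 0
    STRING = 1
    TEMPORAL = 2
    BOOLEAN = 3
    MULTI_VALUE = 4  # new member described in the Design section


# Matches ClickHouse-style native types such as "Array(UInt16)".
_ARRAY_TYPE = re.compile(r"^Array\(.*\)$", re.IGNORECASE)


def classify_array_type(native_type: str) -> Optional[GenericDataType]:
    """Return MULTI_VALUE for array types; None defers to other mappings."""
    if _ARRAY_TYPE.match(native_type.strip()):
        return GenericDataType.MULTI_VALUE
    return None


for native in ("Array(UInt16)", "Array(String)", "Array(UInt32)", "String"):
    print(native, classify_array_type(native))
```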
   
   ## Tests
   
   - Backend (pytest): classification, per-capability SQL compilation, 
bound-parameter (injection-safety) checks, end-to-end query generation for 
Contains/Length/Explode, and guards (unsupported engine / unknown column / 
unknown operation / unimplemented dialect).
   - Frontend (jest): multi-value column icon, a `GenericDataType` enum-parity 
test, and operator-visibility (CONTAINS shown only for array columns).
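
The four guards listed under the backend tests could be exercised against a resolver shaped roughly like the one below. Everything here is a hedged illustration: `resolve_column_operation`, `MultiValueError`, the stub engines, and the operation names `"length"` / `"explode"` are assumptions; only the payload keys `column` and `columnOperation` come from the Design section. Tuples stand in for the SQLAlchemy expressions a real engine spec would return.

```python
class MultiValueError(ValueError):
    """Hypothetical error type for rejected multi-value requests."""


class EngineStub:
    supports_multivalue_columns = False

    def array_length(self, col):
        raise NotImplementedError

    def array_explode(self, col):
        raise NotImplementedError


class ArrayEngineStub(EngineStub):
    """Opts in, but leaves explode unimplemented (like an UNNEST dialect)."""

    supports_multivalue_columns = True

    def array_length(self, col):
        return ("length", col)  # stand-in for a SQLAlchemy expression


# Assumed mapping from payload operation names to engine methods.
OPERATIONS = {"length": "array_length", "explode": "array_explode"}


def resolve_column_operation(payload, dataset_columns, engine):
    column = payload["column"]
    operation = payload["columnOperation"]
    if not engine.supports_multivalue_columns:
        raise MultiValueError("engine does not support multi-value columns")
    if column not in dataset_columns:
        raise MultiValueError(f"unknown column: {column}")
    method_name = OPERATIONS.get(operation)
    if method_name is None:
        raise MultiValueError(f"unknown operation: {operation}")
    try:
        return getattr(engine, method_name)(column)
    except NotImplementedError as ex:
        raise MultiValueError(
            f"{operation} is not implemented for this dialect"
        ) from ex


payload = {"column": "tags", "columnOperation": "length"}
print(resolve_column_operation(payload, {"tags"}, ArrayEngineStub()))
```

Failing early with a specific message, rather than letting the database reject the query, mirrors the stated goal of raising a clear error instead of emitting invalid SQL.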
   
   ## Scope / follow-ups (phase 2)
   
   - **Explode is ClickHouse-only.** Set-returning `UNNEST` dialects 
(PostgreSQL/Trino/BigQuery) require `CROSS JOIN UNNEST` plumbing in the query 
builder. Until that exists, they raise a clear error instead of emitting 
invalid SQL; support is deferred.
   - **Length/Explode modifier UI** (controls that emit the `columnOperation` 
payload) is not yet built; the operations are usable via the query API today.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

