Hi everyone,
I'd like to start a discussion on a storage optimization for `MAP<STRING, T>`
columns targeting time-series, IoT, observability, and similar scenarios.
### Problem
In these workloads, data is **"globally heterogeneous but locally
homogeneous"**: the global key union across all rows can reach tens of
thousands, but each row carries only 5 to 50 keys, and those keys are highly
repetitive within groups (e.g., the same reporter always reports
`{usage, load, iowait}`).
Current options all fall short:
- **Default MAP storage** (KV arrays): no per-key predicate pushdown, no
per-key column pruning, no per-key statistics.
- **VARIANT**: unshredded fields (>90% in these scenarios) fall into a binary
blob, losing all columnar advantages.
- **Wide table**: flattening 50,000+ fields into columns results in >99% NULL,
with metadata explosion and unbounded schema churn.
### Proposed Solution: Columnar-Extend
We propose an **opt-in storage optimization** for MAP columns, enabled via
per-column table options:
```sql
CREATE TABLE metrics (
    ts TIMESTAMP,
    metric STRING,
    -- a hyphenated column name must be quoted to parse as one identifier
    `ext-map` MAP<STRING, DOUBLE>
) WITH (
    'ext-map.map-storage-layout' = 'extend',
    'ext-map.columnar-extend.num-columns' = '16'
);
```
The key idea: instead of storing MAP entries as KV arrays, physically rewrite
them into a **Struct with `K` typed reusable columns** plus a lightweight
`__field_mapping`. This gives every key full columnar treatment — predicate
pushdown, column pruning, native statistics — while keeping the column count
bounded at `K` (tens, not tens of thousands). Rows exceeding `K` keys spill
into a small overflow map, so correctness never depends on `K` being large
enough. `K` adapts across files based on the actual data width.
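To make the layout concrete, here is a minimal Python sketch of the rewrite for a single row. It is only an illustration under stated assumptions, not the format defined in the PIP: the function names `encode_row` and `decode_row` are hypothetical, the mapping is kept per row, and keys are assigned to slots in sorted order so that rows with the same key set produce the same mapping.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, float]
Encoded = Tuple[Tuple[str, ...], List[Optional[float]], Row]


def encode_row(row: Row, k: int) -> Encoded:
    """Split one logical MAP row into (field_mapping, slots, overflow).

    Assumption: slot assignment follows sorted key order. The real
    assignment policy is whatever the PIP specifies.
    """
    keys = sorted(row)
    mapped, spilled = keys[:k], keys[k:]
    # The first len(mapped) slots hold values; unused slots stay NULL.
    slots: List[Optional[float]] = [row[key] for key in mapped]
    slots += [None] * (k - len(mapped))
    # Keys beyond the K available slots go to the overflow map.
    overflow = {key: row[key] for key in spilled}
    return tuple(mapped), slots, overflow


def decode_row(mapping: Tuple[str, ...],
               slots: List[Optional[float]],
               overflow: Row) -> Row:
    """Rebuild the logical MAP value from its physical parts."""
    row = {key: slots[i] for i, key in enumerate(mapping)}
    row.update(overflow)
    return row


if __name__ == "__main__":
    row = {"usage": 42.0, "load": 1.5, "iowait": 0.2}
    print(encode_row(row, 4))  # all keys fit into the K slots
    print(encode_row(row, 2))  # one key spills into overflow
```

In this sketch, a row round-trips through `encode_row` and `decode_row` for any `k`, which mirrors the point that correctness does not depend on `K` being large enough; a smaller `k` only moves more keys into the overflow map.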
The logical type stays `MAP<STRING, T>`, so the optimization is transparent to
users. Existing queries like `` `ext-map`['usage'] > 30 `` work unchanged; the
engine translates them into physical sub-column predicates internally.
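As a rough illustration of that translation, the Python sketch below resolves a key to its physical slot and evaluates a `> threshold` filter against already-encoded rows. The row representation (a tuple of field mapping, slot values, and overflow map) and the names `lookup` and `filter_greater_than` are assumptions made for this example; the actual query path is described in the PIP.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical physical row: (field mapping, K slot values, overflow map).
EncodedRow = Tuple[Tuple[str, ...], List[Optional[float]], Dict[str, float]]


def lookup(encoded: EncodedRow, key: str) -> Optional[float]:
    """Resolve map[key] against the physical layout; None means NULL."""
    mapping, slots, overflow = encoded
    if key in mapping:
        # The key owns a slot in this row: read the typed sub-column.
        return slots[mapping.index(key)]
    # Otherwise fall back to the overflow map (absent key -> NULL).
    return overflow.get(key)


def filter_greater_than(rows: List[EncodedRow], key: str,
                        threshold: float) -> List[int]:
    """Return indexes of rows where map[key] > threshold (NULL never matches)."""
    matches = []
    for i, encoded in enumerate(rows):
        value = lookup(encoded, key)
        if value is not None and value > threshold:
            matches.append(i)
    return matches


if __name__ == "__main__":
    rows: List[EncodedRow] = [
        (("iowait", "load", "usage"), [0.2, 1.5, 42.0, None], {}),
        (("temp",), [21.0, None, None, None], {}),
        (("a", "b", "c", "d"), [1.0, 2.0, 3.0, 4.0], {"usage": 31.0}),
        (("usage",), [12.0, None, None, None], {}),
    ]
    print(filter_greater_than(rows, "usage", 30.0))  # [0, 2]
```

A real reader would go further than this row-at-a-time loop: once the key is resolved to a sub-column, the predicate can be checked against that column's statistics to skip data without decoding it.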
### PIP Document
The full proposal — including physical layout, query path, public interface
changes, and rejected alternatives — is available here:
https://cwiki.apache.org/confluence/display/PAIMON/PIP-43%3A+Columnar-Extend+Storage+Optimization+for+MAP+Type+in+Paimon
### Looking for Feedback
I'd appreciate community feedback on:
1. The overall approach, e.g., handling rows with more than `K` keys via
`__overflow` vs. other strategies.
2. The configuration design (`map-storage-layout` enum, `num-columns`).
3. Any concerns about compatibility.
4. Additional use cases — beyond time-series/IoT/observability, are there other
scenarios in your workloads where MAP columns have high-cardinality,
locally-repetitive keys that would benefit from this optimization?
Looking forward to the discussion!
Best regards,
Xinyu