Hi everyone,

I'd like to start a discussion on a storage optimization for `MAP<STRING, T>` 
columns targeting time-series, IoT, observability, and similar scenarios.

### Problem

In these workloads, data is **"globally heterogeneous but locally 
homogeneous"**: the union of keys across all rows can reach tens of 
thousands, but each row carries only 5 to 50 keys, and those keys are highly 
repetitive within groups (e.g., the same reporter always reports 
`{usage, load, iowait}`).

Current options all fall short:

- **Default MAP storage** (KV arrays): no per-key predicate pushdown, no 
per-key column pruning, no per-key statistics.

- **VARIANT**: unshredded fields (>90% in these scenarios) fall into a binary 
blob, losing all columnar advantages.

- **Wide table**: flattening 50,000+ fields into columns results in >99% NULL, 
with metadata explosion and unbounded schema churn.
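
As a back-of-the-envelope check on the wide-table figure, here is a minimal sketch that uses only the numbers quoted above (50,000 flattened columns, 5 to 50 populated keys per row). The helper name `null_fraction` is just for illustration.

```python
def null_fraction(keys_per_row: int, total_columns: int) -> float:
    """Share of cells in one row that are NULL when only `keys_per_row` are set."""
    return 1 - keys_per_row / total_columns


TOTAL_COLUMNS = 50_000  # flattened wide-table width quoted above

for keys in (5, 50):    # per-row key count range quoted above
    print(f"{keys:>2} keys per row: {null_fraction(keys, TOTAL_COLUMNS):.2%} NULL cells")
```

Even the densest rows in this range leave all but a tiny fraction of the flattened columns empty.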

### Proposed Solution: Columnar-Extend

We propose an **opt-in storage optimization** for MAP columns — enabled via a 
table option:

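```sql
CREATE TABLE metrics (
    ts        TIMESTAMP,
    metric    STRING,
    -- A hyphenated identifier must be backtick-quoted to parse as one name.
    `ext-map` MAP<STRING, DOUBLE>
) WITH (
    -- Opt in to the columnar-extend layout for this MAP column.
    'ext-map.map-storage-layout' = 'extend',
    'ext-map.columnar-extend.num-columns' = '16'
);
```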

The key idea: instead of storing MAP entries as KV arrays, physically rewrite 
them into a **Struct with `K` typed reusable columns** plus a lightweight 
`__field_mapping`. This gives every key full columnar treatment — predicate 
pushdown, column pruning, native statistics — while keeping the column count 
bounded at `K` (tens, not tens of thousands). Rows exceeding `K` keys spill 
into a small overflow map, so correctness never depends on `K` being large 
enough. `K` adapts across files based on the actual data width.
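
To make the idea concrete, here is a rough sketch of one way the rewrite could work. It is an illustration under my own assumptions, not the layout defined in the PIP: I assume each distinct set of slot-resident keys becomes one "layout", that `__field_mapping` stores a per-row layout id, and that keys are assigned to slots in sorted order. The names `encode_rows`, `decode_row`, `layouts`, and `slots` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

K = 3  # hypothetical slot count for the demo; the DDL example above uses 16


def encode_rows(rows: List[Dict[str, float]], k: int = K):
    """Rewrite logical maps into k reusable slots, a layout id, and an overflow map.

    Assumed scheme (not taken from the PIP): every distinct tuple of
    slot-resident keys is stored once in `layouts`, and each row records
    the index of its layout in `__field_mapping`.
    """
    layouts: List[Tuple[str, ...]] = []        # layout id -> key held by each slot
    layout_ids: Dict[Tuple[str, ...], int] = {}
    encoded = []
    for row in rows:
        keys = sorted(row)                     # deterministic slot assignment
        slot_keys = tuple(keys[:k])            # the first k keys get a slot
        spill_keys = keys[k:]                  # the rest spill into the overflow map
        layout_id = layout_ids.setdefault(slot_keys, len(layouts))
        if layout_id == len(layouts):          # first time this key set is seen
            layouts.append(slot_keys)
        slots: List[Optional[float]] = [row[key] for key in slot_keys]
        slots += [None] * (k - len(slots))     # pad unused slots with NULL
        encoded.append({
            "__field_mapping": layout_id,
            "slots": slots,
            "__overflow": {key: row[key] for key in spill_keys},
        })
    return layouts, encoded


def decode_row(layouts, enc) -> Dict[str, float]:
    """Rebuild the logical MAP<STRING, DOUBLE> value from one encoded row."""
    keys = layouts[enc["__field_mapping"]]
    row = {key: enc["slots"][i] for i, key in enumerate(keys)}
    row.update(enc["__overflow"])
    return row


rows = [
    {"usage": 31.5, "load": 0.7, "iowait": 0.1},
    {"usage": 12.0, "load": 0.2, "iowait": 0.0},
    {"rx": 1.0, "tx": 2.0, "drops": 0.0, "errs": 3.0},  # 4 keys > K, so one spills
]
layouts, encoded = encode_rows(rows)
```

In this toy run the first two rows share a single layout, and the third row keeps three keys in slots while `tx` lands in `__overflow`. Decoding every row returns the original map, which is the property that keeps the logical type unchanged.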

The logical type stays `MAP<STRING, T>` — the optimization is transparent to 
users. Existing queries like `ext-map['usage'] > 30` work unchanged; the engine 
translates them into physical sub-column predicates internally.
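
The translation step can be sketched roughly as follows. Again this is only an illustration under assumptions of mine: a per-file list of layouts, a per-row layout id in `__field_mapping`, and a fallback to `__overflow`. The names `LAYOUTS`, `ROWS`, `plan_key_lookup`, and `filter_rows` are hypothetical, and a real reader would evaluate the predicate inside the file format using column statistics rather than looping over rows.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical physical data for one file: two layouts over K = 3 slots.
LAYOUTS: List[Tuple[str, ...]] = [
    ("iowait", "load", "usage"),  # layout 0
    ("drops", "errs", "rx"),      # layout 1
]
ROWS = [
    {"__field_mapping": 0, "slots": [0.1, 0.7, 31.5], "__overflow": {}},
    {"__field_mapping": 0, "slots": [0.0, 0.2, 12.0], "__overflow": {}},
    {"__field_mapping": 1, "slots": [0.0, 3.0, 1.0], "__overflow": {"usage": 45.0}},
    {"__field_mapping": 1, "slots": [0.0, 0.0, 2.0], "__overflow": {}},
]


def plan_key_lookup(key: str) -> Dict[int, int]:
    """Map each layout id to the slot index that holds `key`, if any."""
    return {lid: keys.index(key) for lid, keys in enumerate(LAYOUTS) if key in keys}


def filter_rows(key: str, predicate: Callable[[float], bool]) -> List[int]:
    """Return indexes of rows whose logical map[key] satisfies `predicate`."""
    slot_of = plan_key_lookup(key)
    hits = []
    for i, row in enumerate(ROWS):
        slot = slot_of.get(row["__field_mapping"])
        if slot is not None:
            value = row["slots"][slot]          # read one physical sub-column
        else:
            value = row["__overflow"].get(key)  # fall back to the overflow map
        if value is not None and predicate(value):
            hits.append(i)
    return hits


# Logical predicate: map['usage'] > 30
matches = filter_rows("usage", lambda v: v > 30)
```

Here `usage` resolves to slot 2 for layout 0, while rows using layout 1 are checked against `__overflow`. A missing key behaves like NULL and never matches.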

### PIP Document

The full proposal — including physical layout, query path, public interface 
changes, and rejected alternatives — is available here: 
https://cwiki.apache.org/confluence/display/PAIMON/PIP-43%3A+Columnar-Extend+Storage+Optimization+for+MAP+Type+in+Paimon

### Looking for Feedback

I'd appreciate community feedback on:

1. The overall approach, e.g., spilling rows with more than `K` keys into 
`__overflow` vs. other strategies.

2. The configuration design (`map-storage-layout` enum, `num-columns`).

3. Any concerns about compatibility.

4. Additional use cases — beyond time-series/IoT/observability, are there other 
scenarios in your workloads where MAP columns have high-cardinality, 
locally-repetitive keys that would benefit from this optimization?

Looking forward to the discussion!

Best regards,

Xinyu
