Proposal: Phase 1 implementation for UNPIVOT support in IoTDB table model

Bryan Yang Thu, 04 Jun 2026 01:00:41 -0700

*Hi Apache IoTDB community,*

*issue: *https://github.com/apache/iotdb/issues/17799


I would like to propose implementing* #17799 *in phases.

This first PR will address only the *UNPIVOT part of #17799*. *PIVOT and
advanced UNPIVOT* features will be handled in follow-up PRs *after the
Phase 1 PR has been reviewed and merged*.

The first *PR* should be marked as: *Part of #17799*

It should not use Fixes #17799 or Closes #17799, *because this PR will not
complete the whole issue.*
Feature Definition

UNPIVOT converts multiple measurement columns in the IoTDB table model from
a wide-table format into a long-table format.

Before UNPIVOT:
time, device_id, region, temperature, humidity

After UNPIVOT:
time, device_id, region, measurement, value

Here:

   - measurement stores the original source column name, such as temperature
    or humidity.
   - value stores the corresponding source column value.

This is useful for IoT scenarios where measurements are stored as separate
columns, but downstream analytics, export, or visualization prefers a
normalized (measurement, value) format.
Syntax

UNPIVOT v1 will support a subset of DuckDB simplified UNPIVOT syntax,
with INCLUDE
NULLS added to satisfy the NULL handling requirement in #17799.
UNPIVOT [INCLUDE NULLS] source_relation ON value_column [, value_column ...]
INTO NAME name_column VALUE value_column_alias;

Example:
UNPIVOT device_metrics ON temperature, humidity INTO NAME measurement VALUE
value;

It can also be used in subqueries:
SELECT * FROM ( UNPIVOT device_metrics ON temperature, humidity INTO NAME
measurement VALUE value );

And with CTEs:
WITH long_metrics AS ( UNPIVOT device_metrics ON temperature, humidity INTO
NAME measurement VALUE value ) SELECT * FROM long_metrics;
Syntax ComponentsUNPIVOT

Performs the wide-to-long transformation.
INCLUDE NULLS

Optional.

By default, rows whose unpivoted value is NULL are dropped. If INCLUDE NULLS is
specified, those rows are kept.
source_relation

The input relation.

In v1, I propose to support:

   - normal table names
   - CTE names
   - parenthesized subqueries

For more complex inputs such as joins, users can wrap them in a subquery or
CTE first.
ON value_column [, ...]

Specifies the source columns to unpivot.

In v1, only explicit column names are supported. The following are not
included in this phase:

   - expressions
   - CAST
   - wildcards
   - dynamic column selection
   - COLUMNS(* EXCLUDE (...))

INTO NAME name_column

Specifies the generated column that stores the original source column name.

The generated name column should use the existing text type in the IoTDB
table model, either STRING or TEXT depending on the current internal type
system.
VALUE value_column_alias

Specifies the generated column that stores the original source column value.

Its type is the same as the strictly identical type of all columns listed
in the ON clause.
Output Columns

The output columns are:

   1. visible columns from source_relation that are not listed in ON
   2. name_column
   3. value_column_alias

For example, given source columns:
time, device_id, region, temperature, humidity

The query:
UNPIVOT device_metrics ON temperature, humidity INTO NAME measurement VALUE
value;

Produces:
time, device_id, region, measurement, value

temperature and humidity are no longer output as separate columns.
NULL Semantics

By default, UNPIVOT drops NULL values.
UNPIVOT device_metrics ON temperature, humidity INTO NAME measurement VALUE
value;

Is semantically equivalent to:
SELECT time, device_id, region, 'temperature' AS measurement, temperature AS
value FROM device_metrics WHERE temperature IS NOT NULL UNION ALL SELECT
time, device_id, region, 'humidity' AS measurement, humidity AS value FROM
device_metrics WHERE humidity IS NOT NULL;

With INCLUDE NULLS, the IS NOT NULL filters are not applied.

UNPIVOT should not guarantee output row order. Users should use ORDER BY if
they need stable ordering. Tests should also use explicit ordering.
Type Rules

In v1, all columns listed in the ON clause must have exactly the same type.

Supported types should follow the existing IoTDB table model scalar types,
such as:
BOOLEAN INT32 INT64 FLOAT DOUBLE TEXT / STRING TIMESTAMP DATE BLOB

There should be no automatic type coercion in v1. For example:

   - INT32 + INT64 should not be automatically coerced.
   - FLOAT + DOUBLE should not be automatically coerced.
   - TEXT + STRING should follow the existing IoTDB type system.

This strict type check should be performed in the UNPIVOT analyzer before
planning. We should not rely only on existing UNION ALL analysis, because
set operations may find a common super type, which would conflict with the
strict v1 semantics.
Validation Rules

UNPIVOT v1 should validate that:

   - ON columns exist
   - ON columns are not duplicated
   - ON columns are visible columns
   - name_column does not conflict with retained output columns
   - value_column_alias does not conflict with retained output columns
   - name_column and value_column_alias do not conflict with each other
   - quoted identifier and case-insensitive conflicts follow existing IoTDB
   identifier rules

Implementation Plan

I propose not to rewrite UNPIVOT into UNION ALL in the pure parser or
pre-analyzer rewrite phase, because source visible columns, hidden columns,
types, and CTE/subquery schemas are only known after analysis.

A safer implementation flow is:

   1. Add UNPIVOT as a queryPrimary / query body alternative in the
   relational SQL grammar.
   2. Parser builds an Unpivot AST node.
   3. Analyzer analyzes source_relation first, then validates ON columns,
   visibility, duplicates, output-name conflicts, and strict same-type rules.
   4. Analyzer determines the static output schema for the Unpivot node.
   5. Planner expands Unpivot into an equivalent UNION ALL plan, or reuses
   existing set operation planning where possible.

Placing UNPIVOT at the queryPrimary / query body level makes it naturally
support top-level usage, subqueries, and CTEs, which matches the issue
requirement and DuckDB simplified syntax.
Test Plan

The first phase should cover at least:

   - top-level UNPIVOT
   - CTE input
   - subquery input
   - default NULL dropping
   - INCLUDE NULLS
   - missing ON column
   - duplicate ON column
   - name / value output column conflict
   - mismatched value column types
   - INT32 + INT64 not automatically coerced
   - quoted identifier / case-insensitive conflicts

Phased Plan

I plan to implement this issue in phases.

Phase 1 will focus only on UNPIVOT v1. It will support:

   - explicit value columns
   - static output schema
   - default NULL dropping
   - optional INCLUDE NULLS
   - strict same-type validation
   - usage as a top-level query, subquery, or CTE

After the Phase 1 PR is reviewed and merged, I will start Phase 2 in
follow-up PRs.

Phase 2 will cover PIVOT and advanced UNPIVOT features, such as:

   - dynamic column selection
   - COLUMNS(* EXCLUDE (...))
   - expressions
   - CAST
   - automatic type coercion
   - multiple VALUE columns

*The reason for waiting until Phase 1 is merged before starting Phase 2 is
that UNPIVOT v1 introduces the basic grammar, AST, analyzer, planner, and
test structure for this family of syntax. It is better to first get
agreement from reviewers on these foundations before adding more complex
features on top of them.*

This split also keeps the first PR small and reviewable. UNPIVOT v1 has a
static output schema, so it can be analyzed and planned safely. PIVOT,
especially dynamic PIVOT without an explicit IN list, may require runtime
discovery of distinct values, which makes output schema determination more
complex.

Doing PIVOT before the UNPIVOT foundation is reviewed could increase
implementation risk and make the PR harder to review.

Best regards,

Bryan Yang(杨易达)

Proposal: Phase 1 implementation for UNPIVOT support in IoTDB table model

Reply via email to