Re: Proposal: Phase 1 implementation for UNPIVOT support in IoTDB table model

Yuan Tian Mon, 15 Jun 2026 17:35:08 -0700

Hi Bryan,

Sorry for the late reply, this issue has been assigned to
another contributor.



Best regards,
----------------------
Yuan Tian


On Thu, Jun 4, 2026 at 3:59 PM Bryan Yang <[email protected]> wrote:

> *Hi Apache IoTDB community,*
>
> *issue: *https://github.com/apache/iotdb/issues/17799
>
> I would like to propose implementing* #17799 *in phases.
>
> This first PR will address only the *UNPIVOT part of #17799*. *PIVOT and
> advanced UNPIVOT* features will be handled in follow-up PRs *after the
> Phase 1 PR has been reviewed and merged*.
>
> The first *PR* should be marked as: *Part of #17799*
>
> It should not use Fixes #17799 or Closes #17799, *because this PR will not
> complete the whole issue.*
> Feature Definition
>
> UNPIVOT converts multiple measurement columns in the IoTDB table model from
> a wide-table format into a long-table format.
>
> Before UNPIVOT:
> time, device_id, region, temperature, humidity
>
> After UNPIVOT:
> time, device_id, region, measurement, value
>
> Here:
>
>    - measurement stores the original source column name, such as
> temperature
>     or humidity.
>    - value stores the corresponding source column value.
>
> This is useful for IoT scenarios where measurements are stored as separate
> columns, but downstream analytics, export, or visualization prefers a
> normalized (measurement, value) format.
> Syntax
>
> UNPIVOT v1 will support a subset of DuckDB simplified UNPIVOT syntax,
> with INCLUDE
> NULLS added to satisfy the NULL handling requirement in #17799.
> UNPIVOT [INCLUDE NULLS] source_relation ON value_column [, value_column
> ...]
> INTO NAME name_column VALUE value_column_alias;
>
> Example:
> UNPIVOT device_metrics ON temperature, humidity INTO NAME measurement VALUE
> value;
>
> It can also be used in subqueries:
> SELECT * FROM ( UNPIVOT device_metrics ON temperature, humidity INTO NAME
> measurement VALUE value );
>
> And with CTEs:
> WITH long_metrics AS ( UNPIVOT device_metrics ON temperature, humidity INTO
> NAME measurement VALUE value ) SELECT * FROM long_metrics;
> Syntax ComponentsUNPIVOT
>
> Performs the wide-to-long transformation.
> INCLUDE NULLS
>
> Optional.
>
> By default, rows whose unpivoted value is NULL are dropped. If INCLUDE
> NULLS is
> specified, those rows are kept.
> source_relation
>
> The input relation.
>
> In v1, I propose to support:
>
>    - normal table names
>    - CTE names
>    - parenthesized subqueries
>
> For more complex inputs such as joins, users can wrap them in a subquery or
> CTE first.
> ON value_column [, ...]
>
> Specifies the source columns to unpivot.
>
> In v1, only explicit column names are supported. The following are not
> included in this phase:
>
>    - expressions
>    - CAST
>    - wildcards
>    - dynamic column selection
>    - COLUMNS(* EXCLUDE (...))
>
> INTO NAME name_column
>
> Specifies the generated column that stores the original source column name.
>
> The generated name column should use the existing text type in the IoTDB
> table model, either STRING or TEXT depending on the current internal type
> system.
> VALUE value_column_alias
>
> Specifies the generated column that stores the original source column
> value.
>
> Its type is the same as the strictly identical type of all columns listed
> in the ON clause.
> Output Columns
>
> The output columns are:
>
>    1. visible columns from source_relation that are not listed in ON
>    2. name_column
>    3. value_column_alias
>
> For example, given source columns:
> time, device_id, region, temperature, humidity
>
> The query:
> UNPIVOT device_metrics ON temperature, humidity INTO NAME measurement VALUE
> value;
>
> Produces:
> time, device_id, region, measurement, value
>
> temperature and humidity are no longer output as separate columns.
> NULL Semantics
>
> By default, UNPIVOT drops NULL values.
> UNPIVOT device_metrics ON temperature, humidity INTO NAME measurement VALUE
> value;
>
> Is semantically equivalent to:
> SELECT time, device_id, region, 'temperature' AS measurement, temperature
> AS
> value FROM device_metrics WHERE temperature IS NOT NULL UNION ALL SELECT
> time, device_id, region, 'humidity' AS measurement, humidity AS value FROM
> device_metrics WHERE humidity IS NOT NULL;
>
> With INCLUDE NULLS, the IS NOT NULL filters are not applied.
>
> UNPIVOT should not guarantee output row order. Users should use ORDER BY if
> they need stable ordering. Tests should also use explicit ordering.
> Type Rules
>
> In v1, all columns listed in the ON clause must have exactly the same type.
>
> Supported types should follow the existing IoTDB table model scalar types,
> such as:
> BOOLEAN INT32 INT64 FLOAT DOUBLE TEXT / STRING TIMESTAMP DATE BLOB
>
> There should be no automatic type coercion in v1. For example:
>
>    - INT32 + INT64 should not be automatically coerced.
>    - FLOAT + DOUBLE should not be automatically coerced.
>    - TEXT + STRING should follow the existing IoTDB type system.
>
> This strict type check should be performed in the UNPIVOT analyzer before
> planning. We should not rely only on existing UNION ALL analysis, because
> set operations may find a common super type, which would conflict with the
> strict v1 semantics.
> Validation Rules
>
> UNPIVOT v1 should validate that:
>
>    - ON columns exist
>    - ON columns are not duplicated
>    - ON columns are visible columns
>    - name_column does not conflict with retained output columns
>    - value_column_alias does not conflict with retained output columns
>    - name_column and value_column_alias do not conflict with each other
>    - quoted identifier and case-insensitive conflicts follow existing IoTDB
>    identifier rules
>
> Implementation Plan
>
> I propose not to rewrite UNPIVOT into UNION ALL in the pure parser or
> pre-analyzer rewrite phase, because source visible columns, hidden columns,
> types, and CTE/subquery schemas are only known after analysis.
>
> A safer implementation flow is:
>
>    1. Add UNPIVOT as a queryPrimary / query body alternative in the
>    relational SQL grammar.
>    2. Parser builds an Unpivot AST node.
>    3. Analyzer analyzes source_relation first, then validates ON columns,
>    visibility, duplicates, output-name conflicts, and strict same-type
> rules.
>    4. Analyzer determines the static output schema for the Unpivot node.
>    5. Planner expands Unpivot into an equivalent UNION ALL plan, or reuses
>    existing set operation planning where possible.
>
> Placing UNPIVOT at the queryPrimary / query body level makes it naturally
> support top-level usage, subqueries, and CTEs, which matches the issue
> requirement and DuckDB simplified syntax.
> Test Plan
>
> The first phase should cover at least:
>
>    - top-level UNPIVOT
>    - CTE input
>    - subquery input
>    - default NULL dropping
>    - INCLUDE NULLS
>    - missing ON column
>    - duplicate ON column
>    - name / value output column conflict
>    - mismatched value column types
>    - INT32 + INT64 not automatically coerced
>    - quoted identifier / case-insensitive conflicts
>
> Phased Plan
>
> I plan to implement this issue in phases.
>
> Phase 1 will focus only on UNPIVOT v1. It will support:
>
>    - explicit value columns
>    - static output schema
>    - default NULL dropping
>    - optional INCLUDE NULLS
>    - strict same-type validation
>    - usage as a top-level query, subquery, or CTE
>
> After the Phase 1 PR is reviewed and merged, I will start Phase 2 in
> follow-up PRs.
>
> Phase 2 will cover PIVOT and advanced UNPIVOT features, such as:
>
>    - dynamic column selection
>    - COLUMNS(* EXCLUDE (...))
>    - expressions
>    - CAST
>    - automatic type coercion
>    - multiple VALUE columns
>
> *The reason for waiting until Phase 1 is merged before starting Phase 2 is
> that UNPIVOT v1 introduces the basic grammar, AST, analyzer, planner, and
> test structure for this family of syntax. It is better to first get
> agreement from reviewers on these foundations before adding more complex
> features on top of them.*
>
> This split also keeps the first PR small and reviewable. UNPIVOT v1 has a
> static output schema, so it can be analyzed and planned safely. PIVOT,
> especially dynamic PIVOT without an explicit IN list, may require runtime
> discovery of distinct values, which makes output schema determination more
> complex.
>
> Doing PIVOT before the UNPIVOT foundation is reviewed could increase
> implementation risk and make the PR harder to review.
>
> Best regards,
>
> Bryan Yang(杨易达)
>

Re: Proposal: Phase 1 implementation for UNPIVOT support in IoTDB table model

Reply via email to