Re: [DISCUSS] Part 2 of apache/iotdb#17797: Support lateral column aliases in table-model SELECT list

Yuan Tian Mon, 15 Jun 2026 19:08:25 -0700

Hi Bryan,


Sorry for the late reply.


Thanks for the further research. I agree with the general direction
that FFT fits better as a built-in TVF in the table model.

I have a few additional thoughts on the v1 design:

1. I think we can start with FFT only in the first version.

DFT does not need to be exposed as a separate TVF initially. We can
treat FFT as the practical implementation for frequency-domain
analysis, and add a separate DFT function later only if there is a
clear use case.

2. I think we may not need a VALUE parameter.

For the input table, all numeric columns except the time column and
the PARTITION BY columns can be treated as value columns to transform.
If users only want to transform a subset of columns, they can select
only those columns in the DATA subquery.

For example:

SELECT *
FROM FFT(
  DATA => (
    SELECT time, device_id, temperature, speed
    FROM sensor
  ) PARTITION BY device_id ORDER BY time
);

Here, temperature and speed would both be transformed.

3. About time and sample interval.

NumPy's fft itself does not take a time column or timestamps as input.
It only takes the value array. The sample interval is only needed when
computing the physical frequency axis, for example through
numpy.fft.fftfreq(n, d=sample_interval).

For IoTDB, since the table model has a time column, we can use the
time column to define the input order and infer the sample interval
when the user does not provide one.

I suggest making SAMPLE_INTERVAL an optional parameter, represented as
a duration literal, such as:

SAMPLE_INTERVAL => 1ms
SAMPLE_INTERVAL => 1s

If SAMPLE_INTERVAL is provided, it should override the interval
inferred from the time column. If it is not provided, the function can
infer it as:

(last_time - first_time) / (row_count - 1)

I do not think we need to validate whether every adjacent timestamp
interval is exactly the same in v1. We can assume the input represents
a uniformly sampled sequence.

However, we should still validate that the time column is ascending
within each partition. If the timestamps are not ascending in a
partition, the function should throw an exception.

4. About SPECTRUM.

SPECTRUM mainly controls whether the function outputs the full FFT
spectrum or only the one-sided spectrum.

For v1, I think we can keep this simple and output the full spectrum,
which is closer to numpy.fft.fft. One-sided output, similar to
numpy.fft.rfft, can be discussed later.

5. About the output schema.

NumPy's fft returns complex values. It does not directly output
amplitude or phase. Amplitude and phase are derived values, for
example abs(result) and angle(result).

So for v1, I suggest the core output schema should include real and
imaginary parts only. For multiple value columns, the output columns
should be prefixed with the original column names, for example:

partition columns...,
frequency_index,
frequency,
temperature_real,
temperature_imag,
speed_real,
speed_imag

Here, frequency_index should mean the FFT output index / frequency bin
index, not just a generated row number. It is useful for preserving
the original FFT output order and aligning with the position in the
FFT result array.

The frequency column is not the same for every row in a partition.
Each output row corresponds to one frequency bin. For the same
partition, multiple transformed value columns share the same
frequency_index and frequency axis.

Amplitude and phase can be added later if we think they are useful
convenience columns, but I would prefer not to include them in the
minimal v1 schema.

Best regards,
-----------------
Yuan Tian


On Wed, Jun 10, 2026 at 4:50 PM Bryan Yang <[email protected]> wrote:

> *Hi IoTDB community,*
>
> I would like to propose the implementation plan for Part 2 of
> apache/iotdb#17797: supporting Lateral Column Alias (LCA) in the table
> model SELECT list. issue: https://github.com/apache/iotdb/issues/17797
>
> LCA allows a later SELECT item to reference an explicit alias defined by an
> earlier SELECT item.
>
> SELECT s1 AS x, x + 1 AS y
> FROM table1;
>
> This should be analyzed as:
>
> SELECT s1 AS x, s1 + 1 AS y
> FROM table1;
>
> The implementation does not require new keywords or grammar changes. The
> existing SELECT syntax can be reused, and LCA can be resolved during
> analysis by rewriting expressions before type analysis, aggregation
> analysis, output scope computation, and planning.
> Proposed semantics
>
> LCA is resolved from left to right inside the SELECT list.
>
> SELECT s1 AS x, x + 1 AS y, y * 2 AS z
> FROM table1;
>
> is equivalent to:
>
> SELECT s1 AS x, s1 + 1 AS y, (s1 + 1) * 2 AS z
> FROM table1;
>
> Forward references are not supported:
>
> SELECT y + 1 AS x, s1 AS y
> FROM table1;
>
> The alias of the current SELECT item is not visible to its own expression:
>
> SELECT x + 1 AS x
> FROM table1;
>
> Only unqualified identifiers are considered alias references. Qualified
> expressions such as t.x, table1.x, or x.y should continue to use the
> existing qualified column resolution rules.
>
> The recommended resolution priority is:
>
> local source column > visible previous aliases > existing analyzer
> resolution
>
> This means local input columns should take precedence over previous
> aliases, while outer query columns should not block visible aliases from
> the current SELECT list.
>
> Duplicate aliases should not overwrite each other. If multiple previous
> aliases have the same canonical name and there is no same-name local source
> column, the analyzer should report:
>
> Column alias 'x' is ambiguous
>
> Non-goals
>
> This change should not alter WHERE or HAVING semantics.
>
> SELECT s1 AS x FROM table1 WHERE x > 1;
> SELECT avg(s1) AS a FROM table1 HAVING a > 1;
>
> These should still resolve x or a only through the input scope, not through
> SELECT aliases.
>
> LCA should also not enter subquery scopes:
>
> SELECT s1 AS x, (SELECT x FROM table1) AS y
> FROM table1;
>
> The x inside the subquery should not be rewritten using the outer SELECT
> alias.
> Analyzer changes
>
> The entry point should be StatementAnalyzer.analyzeSelect.
>
> The SELECT list can be processed left to right. For each normal
> SingleColumn
> :
>
> 1. Rewrite the original expression using visible previous aliases.
> 2. Register window metadata for any newly copied window functions.
> 3. Analyze the rewritten expression.
> 4. Record the rewritten expression in SelectAnalysis output expressions.
> 5. Record SingleColumn -> rewritten expression mapping.
> 6. Add the current explicit alias to visible aliases only after its
> expression is rewritten and analyzed.
>
> Pseudo-code:
>
> List<SelectAlias> visibleAliases = new ArrayList<>();
>
> for (SelectItem item : node.getSelect().getSelectItems()) {
>   if (item instanceof SingleColumn) {
>     SingleColumn singleColumn = (SingleColumn) item;
>
>     Expression originalExpression = singleColumn.getExpression();
>     Expression rewrittenExpression =
>         rewriteLateralColumnAlias(originalExpression, scope,
> visibleAliases);
>
>     resolveWindowFunctionsInExpression(node, rewrittenExpression);
>
>     analyzeSelectSingleColumn(
>         rewrittenExpression,
>         node,
>         scope,
>         outputExpressionBuilder,
>         selectExpressionBuilder);
>
>     singleColumnOutputExpressions.put(
>         NodeRef.of(singleColumn),
>         ImmutableList.of(rewrittenExpression));
>
>     if (singleColumn.getAlias().isPresent() &&
> !containsColumnsFunction(singleColumn)) {
>       Identifier alias = singleColumn.getAlias().get();
>       visibleAliases.add(
>           new SelectAlias(
>               alias.getCanonicalValue(),
>               outputPosition,
>               rewrittenExpression));
>     }
>
>     outputPosition++;
>   }
> }
>
> SelectAlias can be extended to keep:
>
> canonicalName
> position
> rewrittenExpression
>
> SelectAnalysis should keep semantic output expressions per SingleColumn,
> for example:
>
> NodeRef<SingleColumn> -> List<Expression>
>
> The original AST should remain unchanged because SingleColumn.expression is
> final.
> Expression rewriting
>
> LCA rewriting should replace only unqualified Identifiers.
>
> When an identifier is encountered:
>
> 1. If it resolves to a local input column, keep it unchanged.
> 2. Otherwise, look up visible previous aliases.
> 3. If exactly one alias matches, replace the identifier with a deep
> copy of that alias's rewritten expression.
> 4. If multiple aliases match, report alias ambiguity.
> 5. If no alias matches, keep the identifier unchanged and let existing
> analysis handle it.
>
> The rewriter should not traverse into SubqueryExpression.
>
> It should also handle DereferenceExpression carefully so that expressions
> such as t.x, table1.x, and x.y are not rewritten as alias references.
>
> Each alias replacement must create a deep copy of the defining expression.
> Reusing the same expression instance is unsafe because IoTDB uses NodeRef
> as identity-based keys in analysis maps.
> AllColumns and COLUMNS(...)
>
> AllColumns should not register aliases for LCA.
>
> A SingleColumn containing COLUMNS(...) should be expanded and analyzed as
> today, but its alias should not be registered as a reusable LCA alias
> because it may expand to multiple output columns.
>
> SELECT COLUMNS('s.*') AS x, x + 1 AS y
> FROM table1;
>
> Here, x should not be treated as a unique alias definition from
> COLUMNS(...)
> .
> Aggregation and window functions
>
> LCA may reference previous aggregate expressions:
>
> SELECT avg(s1) AS a, a + 1 AS b
> FROM table1;
>
> This should be rewritten to:
>
> SELECT avg(s1) AS a, avg(s1) + 1 AS b
> FROM table1;
>
> Existing aggregation validity checks should still apply after rewriting.
>
> For window functions, supporting alias references is preferred:
>
> SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2
> FROM table1;
>
> Since LCA deep copy creates new FunctionCall nodes, any copied window
> functions need their resolved window metadata registered again in Analysis.
> If this is not supported in the first implementation, we should reject such
> aliases explicitly with a clear error instead of failing silently.
> GROUP BY and ORDER BY compatibility
>
> This should remain compatible with Part 1 alias reuse behavior.
>
> SELECT date_bin(1h, time) AS hour_time, avg(s1)
> FROM table1
> GROUP BY hour_time
> ORDER BY hour_time;
>
> GROUP BY <alias> should continue to resolve to the corresponding select
> expression, while ORDER BY <alias> can continue to resolve to the output
> field reference.
>
> Because SELECT output expressions are already rewritten, Part 1 and Part 2
> should compose naturally:
>
> SELECT s1 AS x, x + 1 AS y, count(*)
> FROM table1
> GROUP BY y;
>
> GROUP BY y should resolve to:
>
> s1 + 1
>
> Suggested tests
>
> I plan to add or update tests in SelectAliasReuseTest for:
>
> SELECT s1 AS x, x + 1 AS y FROM table1;
> SELECT s1 AS x, x + 1 AS y, y * 2 AS z FROM table1;
> SELECT s1 AS x, x + x AS y FROM table1;
> SELECT y + 1 AS x, s1 AS y FROM table1;
> SELECT x + 1 AS x FROM table1;
> SELECT s1 AS x, x + 1 AS y FROM table_with_x;
> SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table1;
> SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table_with_x;
> SELECT s1 AS x, table1.x + 1 AS y FROM table1;
> SELECT s1 AS x, (SELECT x FROM table1) AS y FROM table1;
> SELECT avg(s1) AS a, a + 1 AS b FROM table1;
> SELECT s1 AS x, avg(s2) + x AS y FROM table1;
> SELECT avg(s1) AS a FROM table1 HAVING a > 1;
> SELECT s1 AS x FROM table1 WHERE x > 1;
> SELECT COLUMNS('s.*') AS x, x + 1 AS y FROM table1;
> SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2 FROM table1;
> SELECT s1 AS x, x FROM table1;
>
> Please let me know whether this direction looks reasonable, especially the
> resolution priority, duplicate alias handling, and the preferred behavior
> for window function aliases.
>
> *Best regards, Bryan Yang(杨易达)*
>

Re: [DISCUSS] Part 2 of apache/iotdb#17797: Support lateral column aliases in table-model SELECT list

Reply via email to