Re: [DISCUSS] Part 2 of apache/iotdb#17797: Support lateral column aliases in table-model SELECT list

Yuan Tian Mon, 15 Jun 2026 19:20:28 -0700

Hi all,

sorry for replying to the wrong email...


On Tue, Jun 16, 2026 at 10:07 AM Yuan Tian <[email protected]> wrote:

> Hi Bryan,
>
>
> Sorry for the late reply.
>
>
> Thanks for the further research. I agree with the general direction that FFT 
> fits better as a built-in TVF in the table model.
>
> I have a few additional thoughts on the v1 design:
>
> 1. I think we can start with FFT only in the first version.
>
> DFT does not need to be exposed as a separate TVF initially. We can treat FFT 
> as the practical implementation for frequency-domain analysis, and add a 
> separate DFT function later only if there is a clear use case.
>
> 2. I think we may not need a VALUE parameter.
>
> For the input table, all numeric columns except the time column and the 
> PARTITION BY columns can be treated as value columns to transform. If users 
> only want to transform a subset of columns, they can select only those 
> columns in the DATA subquery.
>
> For example:
>
> SELECT *
> FROM FFT(
>   DATA => (
>     SELECT time, device_id, temperature, speed
>     FROM sensor
>   ) PARTITION BY device_id ORDER BY time
> );
>
> Here, temperature and speed would both be transformed.
>
> 3. About time and sample interval.
>
> NumPy's fft itself does not take a time column or timestamps as input. It 
> only takes the value array. The sample interval is only needed when computing 
> the physical frequency axis, for example through numpy.fft.fftfreq(n, 
> d=sample_interval).
>
> For IoTDB, since the table model has a time column, we can use the time 
> column to define the input order and infer the sample interval when the user 
> does not provide one.
>
> I suggest making SAMPLE_INTERVAL an optional parameter, represented as a 
> duration literal, such as:
>
> SAMPLE_INTERVAL => 1ms
> SAMPLE_INTERVAL => 1s
>
> If SAMPLE_INTERVAL is provided, it should override the interval inferred from 
> the time column. If it is not provided, the function can infer it as:
>
> (last_time - first_time) / (row_count - 1)
>
> I do not think we need to validate whether every adjacent timestamp interval 
> is exactly the same in v1. We can assume the input represents a uniformly 
> sampled sequence.
>
> However, we should still validate that the time column is ascending within 
> each partition. If the timestamps are not ascending in a partition, the 
> function should throw an exception.
>
> 4. About SPECTRUM.
>
> SPECTRUM mainly controls whether the function outputs the full FFT spectrum 
> or only the one-sided spectrum.
>
> For v1, I think we can keep this simple and output the full spectrum, which 
> is closer to numpy.fft.fft. One-sided output, similar to numpy.fft.rfft, can 
> be discussed later.
>
> 5. About the output schema.
>
> NumPy's fft returns complex values. It does not directly output amplitude or 
> phase. Amplitude and phase are derived values, for example abs(result) and 
> angle(result).
>
> So for v1, I suggest the core output schema should include real and imaginary 
> parts only. For multiple value columns, the output columns should be prefixed 
> with the original column names, for example:
>
> partition columns...,
> frequency_index,
> frequency,
> temperature_real,
> temperature_imag,
> speed_real,
> speed_imag
>
> Here, frequency_index should mean the FFT output index / frequency bin index, 
> not just a generated row number. It is useful for preserving the original FFT 
> output order and aligning with the position in the FFT result array.
>
> The frequency column is not the same for every row in a partition. Each 
> output row corresponds to one frequency bin. For the same partition, multiple 
> transformed value columns share the same frequency_index and frequency axis.
>
> Amplitude and phase can be added later if we think they are useful 
> convenience columns, but I would prefer not to include them in the minimal v1 
> schema.
>
> Best regards,
> -----------------
> Yuan Tian
>
>
> On Wed, Jun 10, 2026 at 4:50 PM Bryan Yang <[email protected]> wrote:
>
>> *Hi IoTDB community,*
>>
>> I would like to propose the implementation plan for Part 2 of
>> apache/iotdb#17797: supporting Lateral Column Alias (LCA) in the table
>> model SELECT list. issue: https://github.com/apache/iotdb/issues/17797
>>
>> LCA allows a later SELECT item to reference an explicit alias defined by
>> an
>> earlier SELECT item.
>>
>> SELECT s1 AS x, x + 1 AS y
>> FROM table1;
>>
>> This should be analyzed as:
>>
>> SELECT s1 AS x, s1 + 1 AS y
>> FROM table1;
>>
>> The implementation does not require new keywords or grammar changes. The
>> existing SELECT syntax can be reused, and LCA can be resolved during
>> analysis by rewriting expressions before type analysis, aggregation
>> analysis, output scope computation, and planning.
>> Proposed semantics
>>
>> LCA is resolved from left to right inside the SELECT list.
>>
>> SELECT s1 AS x, x + 1 AS y, y * 2 AS z
>> FROM table1;
>>
>> is equivalent to:
>>
>> SELECT s1 AS x, s1 + 1 AS y, (s1 + 1) * 2 AS z
>> FROM table1;
>>
>> Forward references are not supported:
>>
>> SELECT y + 1 AS x, s1 AS y
>> FROM table1;
>>
>> The alias of the current SELECT item is not visible to its own expression:
>>
>> SELECT x + 1 AS x
>> FROM table1;
>>
>> Only unqualified identifiers are considered alias references. Qualified
>> expressions such as t.x, table1.x, or x.y should continue to use the
>> existing qualified column resolution rules.
>>
>> The recommended resolution priority is:
>>
>> local source column > visible previous aliases > existing analyzer
>> resolution
>>
>> This means local input columns should take precedence over previous
>> aliases, while outer query columns should not block visible aliases from
>> the current SELECT list.
>>
>> Duplicate aliases should not overwrite each other. If multiple previous
>> aliases have the same canonical name and there is no same-name local
>> source
>> column, the analyzer should report:
>>
>> Column alias 'x' is ambiguous
>>
>> Non-goals
>>
>> This change should not alter WHERE or HAVING semantics.
>>
>> SELECT s1 AS x FROM table1 WHERE x > 1;
>> SELECT avg(s1) AS a FROM table1 HAVING a > 1;
>>
>> These should still resolve x or a only through the input scope, not
>> through
>> SELECT aliases.
>>
>> LCA should also not enter subquery scopes:
>>
>> SELECT s1 AS x, (SELECT x FROM table1) AS y
>> FROM table1;
>>
>> The x inside the subquery should not be rewritten using the outer SELECT
>> alias.
>> Analyzer changes
>>
>> The entry point should be StatementAnalyzer.analyzeSelect.
>>
>> The SELECT list can be processed left to right. For each normal
>> SingleColumn
>> :
>>
>> 1. Rewrite the original expression using visible previous aliases.
>> 2. Register window metadata for any newly copied window functions.
>> 3. Analyze the rewritten expression.
>> 4. Record the rewritten expression in SelectAnalysis output expressions.
>> 5. Record SingleColumn -> rewritten expression mapping.
>> 6. Add the current explicit alias to visible aliases only after its
>> expression is rewritten and analyzed.
>>
>> Pseudo-code:
>>
>> List<SelectAlias> visibleAliases = new ArrayList<>();
>>
>> for (SelectItem item : node.getSelect().getSelectItems()) {
>>   if (item instanceof SingleColumn) {
>>     SingleColumn singleColumn = (SingleColumn) item;
>>
>>     Expression originalExpression = singleColumn.getExpression();
>>     Expression rewrittenExpression =
>>         rewriteLateralColumnAlias(originalExpression, scope,
>> visibleAliases);
>>
>>     resolveWindowFunctionsInExpression(node, rewrittenExpression);
>>
>>     analyzeSelectSingleColumn(
>>         rewrittenExpression,
>>         node,
>>         scope,
>>         outputExpressionBuilder,
>>         selectExpressionBuilder);
>>
>>     singleColumnOutputExpressions.put(
>>         NodeRef.of(singleColumn),
>>         ImmutableList.of(rewrittenExpression));
>>
>>     if (singleColumn.getAlias().isPresent() &&
>> !containsColumnsFunction(singleColumn)) {
>>       Identifier alias = singleColumn.getAlias().get();
>>       visibleAliases.add(
>>           new SelectAlias(
>>               alias.getCanonicalValue(),
>>               outputPosition,
>>               rewrittenExpression));
>>     }
>>
>>     outputPosition++;
>>   }
>> }
>>
>> SelectAlias can be extended to keep:
>>
>> canonicalName
>> position
>> rewrittenExpression
>>
>> SelectAnalysis should keep semantic output expressions per SingleColumn,
>> for example:
>>
>> NodeRef<SingleColumn> -> List<Expression>
>>
>> The original AST should remain unchanged because SingleColumn.expression
>> is
>> final.
>> Expression rewriting
>>
>> LCA rewriting should replace only unqualified Identifiers.
>>
>> When an identifier is encountered:
>>
>> 1. If it resolves to a local input column, keep it unchanged.
>> 2. Otherwise, look up visible previous aliases.
>> 3. If exactly one alias matches, replace the identifier with a deep
>> copy of that alias's rewritten expression.
>> 4. If multiple aliases match, report alias ambiguity.
>> 5. If no alias matches, keep the identifier unchanged and let existing
>> analysis handle it.
>>
>> The rewriter should not traverse into SubqueryExpression.
>>
>> It should also handle DereferenceExpression carefully so that expressions
>> such as t.x, table1.x, and x.y are not rewritten as alias references.
>>
>> Each alias replacement must create a deep copy of the defining expression.
>> Reusing the same expression instance is unsafe because IoTDB uses NodeRef
>> as identity-based keys in analysis maps.
>> AllColumns and COLUMNS(...)
>>
>> AllColumns should not register aliases for LCA.
>>
>> A SingleColumn containing COLUMNS(...) should be expanded and analyzed as
>> today, but its alias should not be registered as a reusable LCA alias
>> because it may expand to multiple output columns.
>>
>> SELECT COLUMNS('s.*') AS x, x + 1 AS y
>> FROM table1;
>>
>> Here, x should not be treated as a unique alias definition from
>> COLUMNS(...)
>> .
>> Aggregation and window functions
>>
>> LCA may reference previous aggregate expressions:
>>
>> SELECT avg(s1) AS a, a + 1 AS b
>> FROM table1;
>>
>> This should be rewritten to:
>>
>> SELECT avg(s1) AS a, avg(s1) + 1 AS b
>> FROM table1;
>>
>> Existing aggregation validity checks should still apply after rewriting.
>>
>> For window functions, supporting alias references is preferred:
>>
>> SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2
>> FROM table1;
>>
>> Since LCA deep copy creates new FunctionCall nodes, any copied window
>> functions need their resolved window metadata registered again in
>> Analysis.
>> If this is not supported in the first implementation, we should reject
>> such
>> aliases explicitly with a clear error instead of failing silently.
>> GROUP BY and ORDER BY compatibility
>>
>> This should remain compatible with Part 1 alias reuse behavior.
>>
>> SELECT date_bin(1h, time) AS hour_time, avg(s1)
>> FROM table1
>> GROUP BY hour_time
>> ORDER BY hour_time;
>>
>> GROUP BY <alias> should continue to resolve to the corresponding select
>> expression, while ORDER BY <alias> can continue to resolve to the output
>> field reference.
>>
>> Because SELECT output expressions are already rewritten, Part 1 and Part 2
>> should compose naturally:
>>
>> SELECT s1 AS x, x + 1 AS y, count(*)
>> FROM table1
>> GROUP BY y;
>>
>> GROUP BY y should resolve to:
>>
>> s1 + 1
>>
>> Suggested tests
>>
>> I plan to add or update tests in SelectAliasReuseTest for:
>>
>> SELECT s1 AS x, x + 1 AS y FROM table1;
>> SELECT s1 AS x, x + 1 AS y, y * 2 AS z FROM table1;
>> SELECT s1 AS x, x + x AS y FROM table1;
>> SELECT y + 1 AS x, s1 AS y FROM table1;
>> SELECT x + 1 AS x FROM table1;
>> SELECT s1 AS x, x + 1 AS y FROM table_with_x;
>> SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table1;
>> SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table_with_x;
>> SELECT s1 AS x, table1.x + 1 AS y FROM table1;
>> SELECT s1 AS x, (SELECT x FROM table1) AS y FROM table1;
>> SELECT avg(s1) AS a, a + 1 AS b FROM table1;
>> SELECT s1 AS x, avg(s2) + x AS y FROM table1;
>> SELECT avg(s1) AS a FROM table1 HAVING a > 1;
>> SELECT s1 AS x FROM table1 WHERE x > 1;
>> SELECT COLUMNS('s.*') AS x, x + 1 AS y FROM table1;
>> SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2 FROM table1;
>> SELECT s1 AS x, x FROM table1;
>>
>> Please let me know whether this direction looks reasonable, especially the
>> resolution priority, duplicate alias handling, and the preferred behavior
>> for window function aliases.
>>
>> *Best regards, Bryan Yang(杨易达)*
>>
>

Re: [DISCUSS] Part 2 of apache/iotdb#17797: Support lateral column aliases in table-model SELECT list

Reply via email to