Hi all, sorry for replying to the wrong email...
On Tue, Jun 16, 2026 at 10:07 AM Yuan Tian <[email protected]> wrote: > Hi Bryan, > > > Sorry for the late reply. > > > Thanks for the further research. I agree with the general direction that FFT > fits better as a built-in TVF in the table model. > > I have a few additional thoughts on the v1 design: > > 1. I think we can start with FFT only in the first version. > > DFT does not need to be exposed as a separate TVF initially. We can treat FFT > as the practical implementation for frequency-domain analysis, and add a > separate DFT function later only if there is a clear use case. > > 2. I think we may not need a VALUE parameter. > > For the input table, all numeric columns except the time column and the > PARTITION BY columns can be treated as value columns to transform. If users > only want to transform a subset of columns, they can select only those > columns in the DATA subquery. > > For example: > > SELECT * > FROM FFT( > DATA => ( > SELECT time, device_id, temperature, speed > FROM sensor > ) PARTITION BY device_id ORDER BY time > ); > > Here, temperature and speed would both be transformed. > > 3. About time and sample interval. > > NumPy's fft itself does not take a time column or timestamps as input. It > only takes the value array. The sample interval is only needed when computing > the physical frequency axis, for example through numpy.fft.fftfreq(n, > d=sample_interval). > > For IoTDB, since the table model has a time column, we can use the time > column to define the input order and infer the sample interval when the user > does not provide one. > > I suggest making SAMPLE_INTERVAL an optional parameter, represented as a > duration literal, such as: > > SAMPLE_INTERVAL => 1ms > SAMPLE_INTERVAL => 1s > > If SAMPLE_INTERVAL is provided, it should override the interval inferred from > the time column. If it is not provided, the function can infer it as: > > (last_time - first_time) / (row_count - 1) > > I do not think we need to validate whether every adjacent timestamp interval > is exactly the same in v1. We can assume the input represents a uniformly > sampled sequence. > > However, we should still validate that the time column is ascending within > each partition. If the timestamps are not ascending in a partition, the > function should throw an exception. > > 4. About SPECTRUM. > > SPECTRUM mainly controls whether the function outputs the full FFT spectrum > or only the one-sided spectrum. > > For v1, I think we can keep this simple and output the full spectrum, which > is closer to numpy.fft.fft. One-sided output, similar to numpy.fft.rfft, can > be discussed later. > > 5. About the output schema. > > NumPy's fft returns complex values. It does not directly output amplitude or > phase. Amplitude and phase are derived values, for example abs(result) and > angle(result). > > So for v1, I suggest the core output schema should include real and imaginary > parts only. For multiple value columns, the output columns should be prefixed > with the original column names, for example: > > partition columns..., > frequency_index, > frequency, > temperature_real, > temperature_imag, > speed_real, > speed_imag > > Here, frequency_index should mean the FFT output index / frequency bin index, > not just a generated row number. It is useful for preserving the original FFT > output order and aligning with the position in the FFT result array. > > The frequency column is not the same for every row in a partition. Each > output row corresponds to one frequency bin. For the same partition, multiple > transformed value columns share the same frequency_index and frequency axis. > > Amplitude and phase can be added later if we think they are useful > convenience columns, but I would prefer not to include them in the minimal v1 > schema. > > Best regards, > ----------------- > Yuan Tian > > > On Wed, Jun 10, 2026 at 4:50 PM Bryan Yang <[email protected]> wrote: > >> *Hi IoTDB community,* >> >> I would like to propose the implementation plan for Part 2 of >> apache/iotdb#17797: supporting Lateral Column Alias (LCA) in the table >> model SELECT list. issue: https://github.com/apache/iotdb/issues/17797 >> >> LCA allows a later SELECT item to reference an explicit alias defined by >> an >> earlier SELECT item. >> >> SELECT s1 AS x, x + 1 AS y >> FROM table1; >> >> This should be analyzed as: >> >> SELECT s1 AS x, s1 + 1 AS y >> FROM table1; >> >> The implementation does not require new keywords or grammar changes. The >> existing SELECT syntax can be reused, and LCA can be resolved during >> analysis by rewriting expressions before type analysis, aggregation >> analysis, output scope computation, and planning. >> Proposed semantics >> >> LCA is resolved from left to right inside the SELECT list. >> >> SELECT s1 AS x, x + 1 AS y, y * 2 AS z >> FROM table1; >> >> is equivalent to: >> >> SELECT s1 AS x, s1 + 1 AS y, (s1 + 1) * 2 AS z >> FROM table1; >> >> Forward references are not supported: >> >> SELECT y + 1 AS x, s1 AS y >> FROM table1; >> >> The alias of the current SELECT item is not visible to its own expression: >> >> SELECT x + 1 AS x >> FROM table1; >> >> Only unqualified identifiers are considered alias references. Qualified >> expressions such as t.x, table1.x, or x.y should continue to use the >> existing qualified column resolution rules. >> >> The recommended resolution priority is: >> >> local source column > visible previous aliases > existing analyzer >> resolution >> >> This means local input columns should take precedence over previous >> aliases, while outer query columns should not block visible aliases from >> the current SELECT list. >> >> Duplicate aliases should not overwrite each other. If multiple previous >> aliases have the same canonical name and there is no same-name local >> source >> column, the analyzer should report: >> >> Column alias 'x' is ambiguous >> >> Non-goals >> >> This change should not alter WHERE or HAVING semantics. >> >> SELECT s1 AS x FROM table1 WHERE x > 1; >> SELECT avg(s1) AS a FROM table1 HAVING a > 1; >> >> These should still resolve x or a only through the input scope, not >> through >> SELECT aliases. >> >> LCA should also not enter subquery scopes: >> >> SELECT s1 AS x, (SELECT x FROM table1) AS y >> FROM table1; >> >> The x inside the subquery should not be rewritten using the outer SELECT >> alias. >> Analyzer changes >> >> The entry point should be StatementAnalyzer.analyzeSelect. >> >> The SELECT list can be processed left to right. For each normal >> SingleColumn >> : >> >> 1. Rewrite the original expression using visible previous aliases. >> 2. Register window metadata for any newly copied window functions. >> 3. Analyze the rewritten expression. >> 4. Record the rewritten expression in SelectAnalysis output expressions. >> 5. Record SingleColumn -> rewritten expression mapping. >> 6. Add the current explicit alias to visible aliases only after its >> expression is rewritten and analyzed. >> >> Pseudo-code: >> >> List<SelectAlias> visibleAliases = new ArrayList<>(); >> >> for (SelectItem item : node.getSelect().getSelectItems()) { >> if (item instanceof SingleColumn) { >> SingleColumn singleColumn = (SingleColumn) item; >> >> Expression originalExpression = singleColumn.getExpression(); >> Expression rewrittenExpression = >> rewriteLateralColumnAlias(originalExpression, scope, >> visibleAliases); >> >> resolveWindowFunctionsInExpression(node, rewrittenExpression); >> >> analyzeSelectSingleColumn( >> rewrittenExpression, >> node, >> scope, >> outputExpressionBuilder, >> selectExpressionBuilder); >> >> singleColumnOutputExpressions.put( >> NodeRef.of(singleColumn), >> ImmutableList.of(rewrittenExpression)); >> >> if (singleColumn.getAlias().isPresent() && >> !containsColumnsFunction(singleColumn)) { >> Identifier alias = singleColumn.getAlias().get(); >> visibleAliases.add( >> new SelectAlias( >> alias.getCanonicalValue(), >> outputPosition, >> rewrittenExpression)); >> } >> >> outputPosition++; >> } >> } >> >> SelectAlias can be extended to keep: >> >> canonicalName >> position >> rewrittenExpression >> >> SelectAnalysis should keep semantic output expressions per SingleColumn, >> for example: >> >> NodeRef<SingleColumn> -> List<Expression> >> >> The original AST should remain unchanged because SingleColumn.expression >> is >> final. >> Expression rewriting >> >> LCA rewriting should replace only unqualified Identifiers. >> >> When an identifier is encountered: >> >> 1. If it resolves to a local input column, keep it unchanged. >> 2. Otherwise, look up visible previous aliases. >> 3. If exactly one alias matches, replace the identifier with a deep >> copy of that alias's rewritten expression. >> 4. If multiple aliases match, report alias ambiguity. >> 5. If no alias matches, keep the identifier unchanged and let existing >> analysis handle it. >> >> The rewriter should not traverse into SubqueryExpression. >> >> It should also handle DereferenceExpression carefully so that expressions >> such as t.x, table1.x, and x.y are not rewritten as alias references. >> >> Each alias replacement must create a deep copy of the defining expression. >> Reusing the same expression instance is unsafe because IoTDB uses NodeRef >> as identity-based keys in analysis maps. >> AllColumns and COLUMNS(...) >> >> AllColumns should not register aliases for LCA. >> >> A SingleColumn containing COLUMNS(...) should be expanded and analyzed as >> today, but its alias should not be registered as a reusable LCA alias >> because it may expand to multiple output columns. >> >> SELECT COLUMNS('s.*') AS x, x + 1 AS y >> FROM table1; >> >> Here, x should not be treated as a unique alias definition from >> COLUMNS(...) >> . >> Aggregation and window functions >> >> LCA may reference previous aggregate expressions: >> >> SELECT avg(s1) AS a, a + 1 AS b >> FROM table1; >> >> This should be rewritten to: >> >> SELECT avg(s1) AS a, avg(s1) + 1 AS b >> FROM table1; >> >> Existing aggregation validity checks should still apply after rewriting. >> >> For window functions, supporting alias references is preferred: >> >> SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2 >> FROM table1; >> >> Since LCA deep copy creates new FunctionCall nodes, any copied window >> functions need their resolved window metadata registered again in >> Analysis. >> If this is not supported in the first implementation, we should reject >> such >> aliases explicitly with a clear error instead of failing silently. >> GROUP BY and ORDER BY compatibility >> >> This should remain compatible with Part 1 alias reuse behavior. >> >> SELECT date_bin(1h, time) AS hour_time, avg(s1) >> FROM table1 >> GROUP BY hour_time >> ORDER BY hour_time; >> >> GROUP BY <alias> should continue to resolve to the corresponding select >> expression, while ORDER BY <alias> can continue to resolve to the output >> field reference. >> >> Because SELECT output expressions are already rewritten, Part 1 and Part 2 >> should compose naturally: >> >> SELECT s1 AS x, x + 1 AS y, count(*) >> FROM table1 >> GROUP BY y; >> >> GROUP BY y should resolve to: >> >> s1 + 1 >> >> Suggested tests >> >> I plan to add or update tests in SelectAliasReuseTest for: >> >> SELECT s1 AS x, x + 1 AS y FROM table1; >> SELECT s1 AS x, x + 1 AS y, y * 2 AS z FROM table1; >> SELECT s1 AS x, x + x AS y FROM table1; >> SELECT y + 1 AS x, s1 AS y FROM table1; >> SELECT x + 1 AS x FROM table1; >> SELECT s1 AS x, x + 1 AS y FROM table_with_x; >> SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table1; >> SELECT s1 AS x, s2 AS x, x + 1 AS y FROM table_with_x; >> SELECT s1 AS x, table1.x + 1 AS y FROM table1; >> SELECT s1 AS x, (SELECT x FROM table1) AS y FROM table1; >> SELECT avg(s1) AS a, a + 1 AS b FROM table1; >> SELECT s1 AS x, avg(s2) + x AS y FROM table1; >> SELECT avg(s1) AS a FROM table1 HAVING a > 1; >> SELECT s1 AS x FROM table1 WHERE x > 1; >> SELECT COLUMNS('s.*') AS x, x + 1 AS y FROM table1; >> SELECT row_number() OVER (ORDER BY s1) AS rn, rn + 1 AS rn2 FROM table1; >> SELECT s1 AS x, x FROM table1; >> >> Please let me know whether this direction looks reasonable, especially the >> resolution priority, duplicate alias handling, and the preferred behavior >> for window function aliases. >> >> *Best regards, Bryan Yang(杨易达)* >> >
