Hi Yuan Tian,

Thanks for the clarification. I agree that keeping N and NORM in the v1 signature makes sense, since they are standard FFT parameters and their semantics are clear.
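For reference, the N and NORM semantics mentioned above mirror those of numpy.fft.fft. Below is a minimal sketch in plain Python. It assumes a naive O(N^2) DFT as a stand-in for a real FFT (same results, only slower). The function name dft and the sample signal are made up for illustration and are not the planned IoTDB implementation.

```python
import cmath
import math


def dft(values, n=None, norm="backward"):
    """Naive O(N^2) DFT (hypothetical helper) with numpy.fft.fft-style n and norm."""
    n = len(values) if n is None else n
    if n < 1:
        raise ValueError("n must be a positive integer")
    scales = {"backward": 1.0, "forward": 1.0 / n, "ortho": 1.0 / math.sqrt(n)}
    if norm not in scales:
        raise ValueError("norm must be backward, forward, or ortho")
    # Truncate when n is shorter than the input, zero-pad when it is longer.
    x = list(values[:n]) + [0.0] * max(0, n - len(values))
    scale = scales[norm]
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


signal = [1.0, 0.0, -1.0, 0.0]         # illustrative input: 4 samples

full = dft(signal)                      # N omitted: 4 input samples, 4 bins
truncated = dft(signal, n=2)            # N < input length: only [1.0, 0.0] is used
padded = dft(signal, n=8)               # N > input length: four zeros appended
forward = dft(signal, norm="forward")   # forward transform scaled by 1/N
ortho = dft(signal, norm="ortho")       # forward transform scaled by 1/sqrt(N)

print(len(full), len(truncated), len(padded))
print([round(abs(c), 6) for c in full])
```

In every case the number of output bins equals N, which is the property the design relies on for the number of output rows per partition.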
I will update the v1 design accordingly.

N will be an optional integer transform length. If N is not provided, the transform length defaults to the input length of each partition. If N is smaller than the input length, the input sequence will be truncated. If N is larger than the input length, the input sequence will be zero-padded. The number of output frequency bins will be N.

NORM will be an optional string parameter. The supported values will be backward, forward, and ortho. The default value will be backward, aligned with numpy.fft.fft.

The frequency axis will be calculated based on the transform length N and SAMPLE_INTERVAL. If SAMPLE_INTERVAL is provided, it is used directly. Otherwise, it is inferred from the time column within each partition as:

(last_time - first_time) / (row_count - 1)

The frequency unit will be Hz, i.e., cycles per second.

The rest of the v1 design remains unchanged:

1. expose FFT only, not DFT;
2. no VALUE parameter;
3. transform all numeric columns except the time column and PARTITION BY columns;
4. require time to be strictly ascending within each partition;
5. assume uniformly sampled input in v1 without checking every adjacent interval;
6. output full spectrum only;
7. keep the minimal output schema with frequency_index, frequency, and real/imag columns;
8. leave amplitude, phase, and one-sided spectrum for possible future extensions.

I will also make one detail explicit in the design doc: when N is provided, the number of output rows per partition is N, and frequency_index ranges from 0 to N - 1. When N is not provided, N defaults to the input length of that partition.

Best,
Bryan Yang(杨易达)

Yuan Tian <[email protected]> wrote on Tue, Jun 16, 2026 at 15:39:

> Hi Bryan,
>
> Thanks for summarizing the revised v1 design. I agree with most of the points. The v1 scope is much clearer now.
>
> One adjustment I would suggest is to keep N and NORM in the v1 signature.
> They are standard parameters in NumPy/SciPy FFT APIs, and their semantics are relatively straightforward.
>
> For N:
>
> N can be an optional integer parameter. If it is not provided, the transform length defaults to the input length of each partition. If N is smaller than the input length, the input is truncated. If N is larger than the input length, the input is zero-padded. The number of output frequency bins should be N.
>
> For NORM:
>
> NORM can also be optional, with the same supported values as NumPy:
>
> backward
> forward
> ortho
>
> The default can be backward, which is also the default behavior of numpy.fft.fft.
>
> So the v1 parameters could be:
>
> DATA: required table argument.
> SAMPLE_INTERVAL: optional duration literal, such as 1ms or 1s.
> N: optional integer transform length.
> NORM: optional string, one of backward, forward, or ortho.
>
> Other parts of the design still look good to me:
>
> 1. Start with FFT only. DFT can be added later only if there is a concrete use case.
>
> 2. Do not require a VALUE parameter. All numeric columns except the time column and PARTITION BY columns can be transformed. Users can control the selected value columns through the DATA subquery.
>
> 3. SAMPLE_INTERVAL should be optional. If provided, it overrides the interval inferred from the time column. If it is not provided, the interval can be inferred within each partition as:
>
> (last_time - first_time) / (row_count - 1)
>
> For v1, I think it is acceptable to assume uniformly sampled input and not validate every adjacent timestamp interval.
>
> 4. We should still require the time column to be strictly ascending within each partition. If timestamps are not ascending, or if there are duplicated timestamps, the function should throw an exception.
>
> 5. If a partition has fewer than two rows and SAMPLE_INTERVAL is not provided, the function should reject the query because it cannot infer the interval.
> If SAMPLE_INTERVAL is provided, this case can still be handled.
>
> 6. For v1, outputting the full spectrum only is fine. One-sided output can be considered later.
>
> 7. The minimal output schema can remain:
>
> partition columns...,
> frequency_index,
> frequency,
> temperature_real,
> temperature_imag,
> speed_real,
> speed_imag
>
> The frequency column should be calculated from SAMPLE_INTERVAL and N, and I think its unit should be Hz, i.e., cycles per second.
>
> Amplitude and phase can be added later as convenience columns if users need them, but I agree they do not need to be part of the minimal v1 schema.
>
> With N and NORM included, a possible v1 syntax could be:
>
> SELECT *
> FROM FFT(
>     DATA => (
>         SELECT time, device_id, temperature, speed
>         FROM sensor
>     ) PARTITION BY device_id ORDER BY time,
>     SAMPLE_INTERVAL => 1ms,
>     N => 1024,
>     NORM => 'backward'
> );
>
> Best regards,
> --------------------
> Yuan Tian
>
> On Tue, Jun 16, 2026 at 2:09 PM Bryan Yang <[email protected]> wrote:
>
> > Hi Yuan Tian,
> >
> > Thanks for the detailed feedback. I agree with your suggestions, and I think they make the v1 scope cleaner.
> >
> > Based on your comments, I would revise the v1 FFT design as follows:
> >
> > 1. Start with FFT only.
> >
> > We do not need to expose DFT as a separate TVF in the first version. FFT can be treated as the practical implementation for frequency-domain analysis, and DFT can be discussed later if there is a concrete use case.
> >
> > 2. Remove the VALUE parameter.
> >
> > The FFT TVF can transform all numeric columns from the input table, excluding the time column and PARTITION BY columns. If users only want to transform a subset of columns, they can project only those columns in the DATA subquery.
> >
> > For example:
> >
> > SELECT *
> > FROM FFT(
> >     DATA => (
> >         SELECT time, device_id, temperature, speed
> >         FROM sensor
> >     ) PARTITION BY device_id ORDER BY time
> > );
> >
> > This would produce FFT results for both temperature and speed.
> >
> > 3. Use the time column for ordering and sample interval inference.
> >
> > SAMPLE_INTERVAL can be optional and represented as a duration literal, for example:
> >
> > SAMPLE_INTERVAL => 1ms
> > SAMPLE_INTERVAL => 1s
> >
> > If SAMPLE_INTERVAL is provided, it overrides the inferred interval. Otherwise, the interval can be inferred within each partition as:
> >
> > (last_time - first_time) / (row_count - 1)
> >
> > For v1, we can assume the input is uniformly sampled and only validate that timestamps are ascending within each partition.
> >
> > One small edge case is when a partition has fewer than two rows. In that case, if SAMPLE_INTERVAL is not provided, the function cannot infer the interval. I think we should either reject that partition/query or require SAMPLE_INTERVAL for such cases.
> >
> > 4. Output the full spectrum only in v1.
> >
> > This keeps the behavior close to numpy.fft.fft. One-sided output can be considered later, possibly as a separate option or function.
> >
> > 5. Keep the output schema minimal.
> >
> > For v1, the output schema can include partition columns, frequency_index, frequency, and real/imaginary output columns for each transformed value column:
> >
> > partition columns...,
> > frequency_index,
> > frequency,
> > temperature_real,
> > temperature_imag,
> > speed_real,
> > speed_imag
> >
> > Amplitude and phase can be added later as convenience columns if users need them.
> >
> > 6. Keep N and NORM out of the v1 signature.
> >
> > For the first version, I also think we can avoid exposing N and NORM.
> >
> > The transform length can default to the input length of each partition.
> > The normalization behavior can follow the default behavior of numpy.fft.fft. This keeps the first version small and avoids adding options before we have concrete user requirements.
> >
> > So a possible minimal v1 syntax would be:
> >
> > SELECT *
> > FROM FFT(
> >     DATA => (
> >         SELECT time, device_id, temperature, speed
> >         FROM sensor
> >     ) PARTITION BY device_id ORDER BY time,
> >     SAMPLE_INTERVAL => 1ms
> > );
> >
> > or, with inferred sample interval:
> >
> > SELECT *
> > FROM FFT(
> >     DATA => (
> >         SELECT time, device_id, temperature, speed
> >         FROM sensor
> >     ) PARTITION BY device_id ORDER BY time
> > );
> >
> > Best,
> > Bryan Yang
> >
> > Yuan Tian <[email protected]> wrote on Tue, Jun 16, 2026 at 10:19:
> >
> > > Hi Bryan,
> > >
> > > Sorry for the late reply.
> > >
> > > Thanks for the further research. I agree with the general direction that FFT fits better as a built-in TVF in the table model.
> > >
> > > I have a few additional thoughts on the v1 design:
> > >
> > > 1. I think we can start with FFT only in the first version.
> > >
> > > DFT does not need to be exposed as a separate TVF initially. We can treat FFT as the practical implementation for frequency-domain analysis, and add a separate DFT function later only if there is a clear use case.
> > >
> > > 2. I think we may not need a VALUE parameter.
> > >
> > > For the input table, all numeric columns except the time column and the PARTITION BY columns can be treated as value columns to transform. If users only want to transform a subset of columns, they can select only those columns in the DATA subquery.
> > >
> > > For example:
> > >
> > > SELECT *
> > > FROM FFT(
> > >     DATA => (
> > >         SELECT time, device_id, temperature, speed
> > >         FROM sensor
> > >     ) PARTITION BY device_id ORDER BY time
> > > );
> > >
> > > Here, temperature and speed would both be transformed.
> > >
> > > 3. About time and sample interval.
> > >
> > > NumPy's fft itself does not take a time column or timestamps as input. It only takes the value array. The sample interval is only needed when computing the physical frequency axis, for example through numpy.fft.fftfreq(n, d=sample_interval).
> > >
> > > For IoTDB, since the table model has a time column, we can use the time column to define the input order and infer the sample interval when the user does not provide one.
> > >
> > > I suggest making SAMPLE_INTERVAL an optional parameter, represented as a duration literal, such as:
> > >
> > > SAMPLE_INTERVAL => 1ms
> > > SAMPLE_INTERVAL => 1s
> > >
> > > If SAMPLE_INTERVAL is provided, it should override the interval inferred from the time column. If it is not provided, the function can infer it as:
> > >
> > > (last_time - first_time) / (row_count - 1)
> > >
> > > I do not think we need to validate whether every adjacent timestamp interval is exactly the same in v1. We can assume the input represents a uniformly sampled sequence.
> > >
> > > However, we should still validate that the time column is ascending within each partition. If the timestamps are not ascending in a partition, the function should throw an exception.
> > >
> > > 4. About SPECTRUM.
> > >
> > > SPECTRUM mainly controls whether the function outputs the full FFT spectrum or only the one-sided spectrum.
> > >
> > > For v1, I think we can keep this simple and output the full spectrum, which is closer to numpy.fft.fft. One-sided output, similar to numpy.fft.rfft, can be discussed later.
> > >
> > > 5. About the output schema.
> > >
> > > NumPy's fft returns complex values. It does not directly output amplitude or phase. Amplitude and phase are derived values, for example abs(result) and angle(result).
> > >
> > > So for v1, I suggest the core output schema should include real and imaginary parts only.
> > > For multiple value columns, the output columns should be prefixed with the original column names, for example:
> > >
> > > partition columns...,
> > > frequency_index,
> > > frequency,
> > > temperature_real,
> > > temperature_imag,
> > > speed_real,
> > > speed_imag
> > >
> > > Here, frequency_index should mean the FFT output index / frequency bin index, not just a generated row number. It is useful for preserving the original FFT output order and aligning with the position in the FFT result array.
> > >
> > > The frequency column is not the same for every row in a partition. Each output row corresponds to one frequency bin. For the same partition, multiple transformed value columns share the same frequency_index and frequency axis.
> > >
> > > Amplitude and phase can be added later if we think they are useful convenience columns, but I would prefer not to include them in the minimal v1 schema.
> > >
> > > Best regards,
> > > -----------------
> > > Yuan Tian
> > >
> > > On Wed, Jun 10, 2026 at 2:14 PM Bryan Yang <[email protected]> wrote:
> > >
> > > > Hi Yuan Tian,
> > > >
> > > > Thanks for the suggestion.
> > > >
> > > > I did some preliminary research on MATLAB, NumPy/SciPy, Azure Data Explorer (Kusto), and the existing IoTDB FFT UDF implementation. My current understanding is aligned with your suggestion: FFT/DFT fit IoTDB’s table model best as built-in table-valued functions.
> > > >
> > > > They consume a partitioned and time-ordered numeric sequence, and produce multiple frequency-domain result rows. Therefore, they are not scalar functions, because they do not operate on a single row. They are also different from ordinary window functions, because the output rows represent frequency bins rather than the original input rows.
> > > >
> > > > A possible first version of FFT could look like this:
> > > >
> > > > SELECT *
> > > > FROM FFT(
> > > >     DATA => (
> > > >         SELECT time, device_id, value
> > > >         FROM sensor
> > > >     ) PARTITION BY device_id ORDER BY time,
> > > >     VALUE => 'value',
> > > >     TIMECOL => 'time',
> > > >     SAMPLE_RATE => 1000,
> > > >     N => 1024,
> > > >     NORM => 'backward',
> > > >     SPECTRUM => 'full'
> > > > );
> > > >
> > > > If we decide to expose DFT as a separate TVF, it could share the same signature and output schema:
> > > >
> > > > SELECT *
> > > > FROM DFT(
> > > >     DATA => (...) PARTITION BY device_id ORDER BY time,
> > > >     VALUE => 'value',
> > > >     TIMECOL => 'time',
> > > >     SAMPLE_RATE => 1000,
> > > >     N => 1024
> > > > );
> > > >
> > > > Suggested input parameters:
> > > >
> > > > DATA: required table argument. It provides the input sequence. PARTITION BY can be used to compute one transform per device or tag group, and ORDER BY defines the time order.
> > > >
> > > > VALUE: required string. The numeric column to transform. Supported input types can be INT32, INT64, FLOAT, and DOUBLE.
> > > >
> > > > TIMECOL: optional string. The timestamp column name, defaulting to time.
> > > >
> > > > SAMPLE_RATE / SAMPLE_INTERVAL: sampling frequency or sampling interval, used to compute the physical frequency. I think exactly one of them should be provided if we want to output physical frequency. If neither is provided, we may need to define whether the function should reject the query or output only normalized frequency.
> > > >
> > > > N: optional integer. Transform length. If N is smaller than the input length, truncate the input; if larger, zero-pad it. This follows MATLAB/SciPy semantics.
> > > >
> > > > NORM: optional string. Normalization mode, such as backward, forward, or ortho.
> > > >
> > > > SPECTRUM: optional string.
> > > > Output frequency range, such as full or one_sided.
> > > >
> > > > Suggested output schema:
> > > >
> > > > frequency_index (INT64): frequency bin index.
> > > > frequency (DOUBLE): physical frequency derived from SAMPLE_RATE or SAMPLE_INTERVAL.
> > > > real (DOUBLE): real part of the transform result.
> > > > imag (DOUBLE): imaginary part of the transform result.
> > > > amplitude (DOUBLE): magnitude, computed as sqrt(real^2 + imag^2).
> > > > phase (DOUBLE): phase angle in radians, computed as atan2(imag, real).
> > > >
> > > > If the input table is partitioned by device or tags, the partition columns should be preserved in the output, so users can identify which frequency-domain rows belong to each input series.
> > > >
> > > > For the first version, I suggest keeping the scope simple:
> > > >
> > > > support one real-valued numeric input column;
> > > > require or assume uniformly sampled data;
> > > > support N with truncation and zero-padding semantics;
> > > > use the same output schema for FFT and DFT if both are exposed;
> > > > treat FFT as the practical high-performance implementation;
> > > > discuss whether a separate DFT TVF is necessary in v1, since most libraries expose FFT as the efficient implementation of DFT.
> > > >
> > > > There are still a few points worth discussing:
> > > >
> > > > Should SAMPLE_RATE or SAMPLE_INTERVAL be required, or should we allow normalized frequency output when they are omitted?
> > > > Should SPECTRUM => 'one_sided' be included in the first version, since most IoT sensor data is real-valued?
> > > > Is a separate DFT TVF necessary in the first version, or should we start with FFT only and add DFT later if there is a clear use case?
> > > >
> > > > References I checked:
> > > >
> > > > [1] MATLAB fft: https://www.mathworks.com/help/matlab/ref/fft.html
> > > >
> > > > [2] NumPy fft: https://numpy.org/doc/stable/reference/generated/numpy.fft.fft.html
> > > >
> > > > [3] SciPy fft: https://docs.scipy.org/doc/scipy/reference/generated/scipy.fft.fft.html
> > > >
> > > > [4] SciPy DFT matrix: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.dft.html
> > > >
> > > > [5] Kusto series_fft: https://learn.microsoft.com/en-us/kusto/query/series-fft-function
> > > >
> > > > Best,
> > > > Bryan Yang(杨易达)
> > > >
> > > > Yuan Tian <[email protected]> wrote on Mon, Jun 8, 2026 at 18:10:
> > > >
> > > > > Hi Bryan,
> > > > >
> > > > > Thanks for bringing this up.
> > > > >
> > > > > For questions 1 and 2, I think FFT/DFT can be provided as built-in table-valued functions in the table model. Their semantics naturally fit the TVF abstraction, since a time-ordered sequence is transformed into multiple frequency-domain result rows.
> > > > >
> > > > > For question 3, I think it would be helpful to do some further research before finalizing the parameters and output schema. As far as I know, MATLAB and Python's SciPy both provide similar FFT/DFT functions, and their APIs may be useful references. I have not looked into how other databases expose this kind of functionality yet. It may be worth checking both these library functions and other databases to see what inputs they require and what outputs they return, then decide what design best fits IoTDB's table model.
> > > > >
> > > > > Best,
> > > > > Yuan
> > > > >
> > > > > On Mon, Jun 8, 2026 at 3:37 PM Bryan Yang <[email protected]> wrote:
> > > > >
> > > > > > Hi IoTDB community,
> > > > > >
> > > > > > I would like to discuss a possible feature for the IoTDB table model: adding built-in FFT and DFT functions for time-series frequency-domain analysis.
> > > > > >
> > > > > > FFT stands for Fast Fourier Transform, and DFT stands for Discrete Fourier Transform. Both are used to transform time-domain data into frequency-domain data. FFT is essentially a fast algorithm for computing DFT, so I think these two functions can be designed together, sharing similar parameters, output schema, and test cases.
> > > > > >
> > > > > > For IoTDB, this could be useful for scenarios such as sensor vibration analysis, dominant frequency detection, and periodic signal analysis.
> > > > > >
> > > > > > Preliminary Analysis
> > > > > >
> > > > > > After some preliminary analysis, I think FFT/DFT are more suitable as table-valued functions (TVFs), rather than scalar functions or window functions.
> > > > > >
> > > > > > The reason is that FFT/DFT do not work as one-row-in, one-row-out scalar functions like abs(), sin(), or round(). They also do not aggregate multiple rows into a single value like avg() or sum().
> > > > > >
> > > > > > Instead, their semantics are:
> > > > > >
> > > > > > a time-ordered sequence -> multiple frequency points
> > > > > >
> > > > > > Possible SQL Form
> > > > > >
> > > > > > SELECT *
> > > > > > FROM FFT(
> > > > > >     DATA => (
> > > > > >         SELECT time, device_id, value
> > > > > >         FROM sensor
> > > > > >     ) PARTITION BY device_id ORDER BY time,
> > > > > >     VALUE => 'value'
> > > > > > );
> > > > > >
> > > > > > This means that the input table is partitioned by device_id, each partition is ordered by time, and the value column is transformed into frequency-domain results.
> > > > > >
> > > > > > Similarly, DFT could use the same form:
> > > > > >
> > > > > > SELECT *
> > > > > > FROM DFT(
> > > > > >     DATA => (
> > > > > >         SELECT time, value
> > > > > >         FROM sensor
> > > > > >         WHERE device_id = 'd1'
> > > > > >     ) ORDER BY time,
> > > > > >     VALUE => 'value'
> > > > > > );
> > > > > >
> > > > > > Possible Output Schema
> > > > > >
> > > > > > A possible output schema could be:
> > > > > >
> > > > > > frequency_index, frequency (optional), real, imag, amplitude, phase
> > > > > >
> > > > > > Here, frequency_index, real, and imag are the core results of FFT/DFT. amplitude and phase can be derived from real/imag and may be useful for analysis.
> > > > > >
> > > > > > The frequency column would require the user to provide a sample rate or sample interval; otherwise, only frequency_index can be returned.
> > > > > >
> > > > > > Existing Related Work
> > > > > >
> > > > > > I also noticed that IoTDB already has FFT-related UDF support in the library-udf module. This proposal focuses on whether FFT/DFT should be provided as built-in functions in the table model, and whether TVF is the right abstraction.
> > > > > >
> > > > > > Questions
> > > > > >
> > > > > > I would appreciate your feedback on this direction, especially:
> > > > > >
> > > > > > 1. Whether FFT/DFT are suitable as built-in functions in the table model.
> > > > > > 2. Whether TVF is the right function type for them.
> > > > > > 3. What the expected parameters and output schema should be.
> > > > > >
> > > > > > Best regards,
> > > > > > Bryan Yang(杨易达)
