Hi Bryan,
Sorry for the late reply.
Thanks for the further research. I agree with the general direction
that FFT fits better as a built-in TVF in the table model.
I have a few additional thoughts on the v1 design:
1. I think we can start with FFT only in the first version.
DFT does not need to be exposed as a separate TVF initially. We can
treat FFT as the practical implementation for frequency-domain
analysis, and add a separate DFT function later only if there is a
clear use case.
2. I think we may not need a VALUE parameter.
For the input table, all numeric columns except the time column and
the PARTITION BY columns can be treated as value columns to transform.
If users only want to transform a subset of columns, they can select
only those columns in the DATA subquery.
For example:
SELECT *
FROM FFT(
DATA => (
SELECT time, device_id, temperature, speed
FROM sensor
) PARTITION BY device_id ORDER BY time
);
Here, temperature and speed would both be transformed.
3. About time and sample interval.
NumPy's fft itself does not take a time column or timestamps as input.
It only takes the value array. The sample interval is only needed when
computing the physical frequency axis, for example through
numpy.fft.fftfreq(n, d=sample_interval).
For IoTDB, since the table model has a time column, we can use the
time column to define the input order and infer the sample interval
when the user does not provide one.
I suggest making SAMPLE_INTERVAL an optional parameter, represented as
a duration literal, such as:
SAMPLE_INTERVAL => 1ms
SAMPLE_INTERVAL => 1s
If SAMPLE_INTERVAL is provided, it should override the interval
inferred from the time column. If it is not provided, the function can
infer it as:
(last_time - first_time) / (row_count - 1)
I do not think we need to validate whether every adjacent timestamp
interval is exactly the same in v1. We can assume the input represents
a uniformly sampled sequence.
However, we should still validate that the time column is ascending
within each partition. If the timestamps are not ascending in a
partition, the function should throw an exception.
4. About SPECTRUM.
SPECTRUM mainly controls whether the function outputs the full FFT
spectrum or only the one-sided spectrum.
For v1, I think we can keep this simple and output the full spectrum,
which is closer to numpy.fft.fft. One-sided output, similar to
numpy.fft.rfft, can be discussed later.
5. About the output schema.
NumPy's fft returns complex values. It does not directly output
amplitude or phase. Amplitude and phase are derived values, for
example abs(result) and angle(result).
So for v1, I suggest the core output schema should include real and
imaginary parts only. For multiple value columns, the output columns
should be prefixed with the original column names, for example:
partition columns...,
frequency_index,
frequency,
temperature_real,
temperature_imag,
speed_real,
speed_imag
Here, frequency_index should mean the FFT output index / frequency bin
index, not just a generated row number. It is useful for preserving
the original FFT output order and aligning with the position in the
FFT result array.
The frequency column is not the same for every row in a partition.
Each output row corresponds to one frequency bin. For the same
partition, multiple transformed value columns share the same
frequency_index and frequency axis.
Amplitude and phase can be added later if we think they are useful
convenience columns, but I would prefer not to include them in the
minimal v1 schema.
Best regards,
-----------------
Yuan Tian
On Wed, Jun 10, 2026 at 2:14 PM Bryan Yang <[email protected]> wrote:
> Hi Yuan Tian,
>
> Thanks for the suggestion.
>
> I did some preliminary research on MATLAB, NumPy/SciPy, Azure Data Explorer
> (Kusto), and the existing IoTDB FFT UDF implementation. My current
> understanding is aligned with your suggestion: FFT/DFT fit IoTDB’s table
> model best as built-in table-valued functions.
>
> They consume a partitioned and time-ordered numeric sequence, and produce
> multiple frequency-domain result rows. Therefore, they are not scalar
> functions, because they do not operate on a single row. They are also
> different from ordinary window functions, because the output rows represent
> frequency bins rather than the original input rows.
>
> A possible first version of FFT could look like this:
>
> SELECT *
> FROM FFT(
> DATA => (
> SELECT time, device_id, value
> FROM sensor
> ) PARTITION BY device_id ORDER BY time,
> VALUE => 'value',
> TIMECOL => 'time',
> SAMPLE_RATE => 1000,
> N => 1024,
> NORM => 'backward',
> SPECTRUM => 'full'
> );
>
>
> If we decide to expose DFT as a separate TVF, it could share the same
> signature and output schema:
>
> SELECT *
> FROM DFT(
> DATA => (...) PARTITION BY device_id ORDER BY time,
> VALUE => 'value',
> TIMECOL => 'time',
> SAMPLE_RATE => 1000,
> N => 1024
> );
>
>
> Suggested input parameters:
>
> DATA: required table argument. It provides the input sequence. PARTITION BY
> can be used to compute one transform per device or tag group, and ORDER BY
> defines the time order.
>
> VALUE: required string. The numeric column to transform. Supported input
> types can be INT32, INT64, FLOAT, and DOUBLE.
>
> TIMECOL: optional string. The timestamp column name, defaulting to time.
>
> SAMPLE_RATE / SAMPLE_INTERVAL: sampling frequency or sampling interval,
> used to compute the physical frequency. I think exactly one of them should
> be provided if we want to output physical frequency. If neither is
> provided, we may need to define whether the function should reject the
> query or output only normalized frequency.
>
> N: optional integer. Transform length. If N is smaller than the input
> length, truncate the input; if larger, zero-pad it. This follows
> MATLAB/SciPy semantics.
>
> NORM: optional string. Normalization mode, such as backward, forward, or
> ortho.
>
> SPECTRUM: optional string. Output frequency range, such as full or
> one_sided.
>
> Suggested output schema:
>
> frequency_index (INT64): frequency bin index.
> frequency (DOUBLE): physical frequency derived from SAMPLE_RATE or
> SAMPLE_INTERVAL.
> real (DOUBLE): real part of the transform result.
> imag (DOUBLE): imaginary part of the transform result.
> amplitude (DOUBLE): magnitude, computed as sqrt(real^2 + imag^2).
> phase (DOUBLE): phase angle in radians, computed as atan2(imag, real).
>
> If the input table is partitioned by device or tags, the partition columns
> should be preserved in the output, so users can identify which
> frequency-domain rows belong to each input series.
>
> For the first version, I suggest keeping the scope simple:
>
> support one real-valued numeric input column;
> require or assume uniformly sampled data;
> support N with truncation and zero-padding semantics;
> use the same output schema for FFT and DFT if both are exposed;
> treat FFT as the practical high-performance implementation;
> discuss whether a separate DFT TVF is necessary in v1, since most libraries
> expose FFT as the efficient implementation of DFT.
>
> There are still a few points worth discussing:
>
> Should SAMPLE_RATE or SAMPLE_INTERVAL be required, or should we allow
> normalized frequency output when they are omitted?
> Should SPECTRUM => 'one_sided' be included in the first version, since most
> IoT sensor data is real-valued?
> Is a separate DFT TVF necessary in the first version, or should we start
> with FFT only and add DFT later if there is a clear use case?
>
> References I checked:
>
> [1] MATLAB fft:
> https://www.mathworks.com/help/matlab/ref/fft.html
>
> [2] NumPy fft:
> https://numpy.org/doc/stable/reference/generated/numpy.fft.fft.html
>
> [3] SciPy fft:
> https://docs.scipy.org/doc/scipy/reference/generated/scipy.fft.fft.html
>
> [4] SciPy DFT matrix:
> https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.dft.html
>
> [5] Kusto series_fft:
> https://learn.microsoft.com/en-us/kusto/query/series-fft-function
>
> Best,
> Bryan Yang(杨易达)
>
> Yuan Tian <[email protected]> 于2026年6月8日周一 18:10写道:
>
> > Hi Bryan,
> >
> > Thanks for bringing this up.
> >
> > For questions 1 and 2, I think FFT/DFT can be provided as built-in
> > table-valued functions in the table model. Their semantics naturally fit
> > the TVF abstraction, since a time-ordered sequence is transformed into
> > multiple frequency-domain result rows.
> >
> > For question 3, I think it would be helpful to do some further research
> > before finalizing the parameters and output schema. As far as I know,
> > MATLAB and Python's SciPy both provide similar FFT/DFT functions, and
> their
> > APIs may be useful references. I have not looked into how other databases
> > expose this kind of functionality yet. It may be worth checking both
> these
> > library functions and other databases to see what inputs they require and
> > what outputs they return, then decide what design best fits IoTDB's table
> > model.
> >
> > Best,
> > Yuan
> >
> > On Mon, Jun 8, 2026 at 3:37 PM Bryan Yang <[email protected]> wrote:
> >
> > > Hi IoTDB community,
> > >
> > > I would like to discuss a possible feature for the IoTDB table model:
> > > adding built-in FFT and DFT functions for time-series frequency-domain
> > > analysis.
> > >
> > > FFT stands for Fast Fourier Transform, and DFT stands for Discrete
> > Fourier
> > > Transform. Both are used to transform time-domain data into
> > > frequency-domain data. FFT is essentially a fast algorithm for
> computing
> > > DFT, so I think these two functions can be designed together, sharing
> > > similar parameters, output schema, and test cases.
> > >
> > > For IoTDB, this could be useful for scenarios such as sensor vibration
> > > analysis, dominant frequency detection, and periodic signal analysis.
> > > Preliminary Analysis
> > >
> > > After some preliminary analysis, I think FFT/DFT are more suitable as
> > > table-valued functions (TVFs), rather than scalar functions or window
> > > functions.
> > >
> > > The reason is that FFT/DFT do not work as one-row-in, one-row-out
> scalar
> > > functions like abs(), sin(), or round(). They also do not aggregate
> > > multiple rows into a single value like avg() or sum().
> > >
> > > Instead, their semantics are:
> > >
> > > a time-ordered sequence -> multiple frequency points
> > >
> > > Possible SQL Form
> > >
> > > SELECT *
> > > FROM FFT(
> > > DATA => (
> > > SELECT time, device_id, value
> > > FROM sensor
> > > ) PARTITION BY device_id ORDER BY time,
> > > VALUE => 'value'
> > > );
> > >
> > > This means that the input table is partitioned by device_id, each
> > partition
> > > is ordered by time, and the value column is transformed into
> > > frequency-domain results.
> > >
> > > Similarly, DFT could use the same form:
> > >
> > > SELECT *
> > > FROM DFT(
> > > DATA => (
> > > SELECT time, value
> > > FROM sensor
> > > WHERE device_id = 'd1'
> > > ) ORDER BY time,
> > > VALUE => 'value'
> > > );
> > >
> > > Possible Output Schema
> > >
> > > A possible output schema could be:
> > >
> > > frequency_index, frequency(optional), real, imag, amplitude, phase
> > >
> > > Here, frequency_index, real, and imag are the core results of FFT/DFT.
> > > amplitude and phase can be derived from real/imag and may be useful for
> > > analysis.
> > >
> > > The frequency column would require the user to provide a sample rate or
> > > sample interval; otherwise, only frequency_index can be returned.
> > > Existing Related Work
> > >
> > > I also noticed that IoTDB already has FFT-related UDF support in the
> > > library-udf module. This proposal focuses on whether FFT/DFT should be
> > > provided as built-in functions in the table model, and whether TVF is
> the
> > > right abstraction.
> > > Questions
> > >
> > > I would appreciate your feedback on this direction, especially:
> > >
> > > 1.
> > >
> > > Whether FFT/DFT are suitable as built-in functions in the table
> model.
> > > 2.
> > >
> > > Whether TVF is the right function type for them.
> > > 3.
> > >
> > > What the expected parameters and output schema should be.
> > >
> > > Best regards, Bryan Yang(杨易达)
> > >
> >
>