Hi Yuan Tian,

Thanks for the suggestion.

I did some preliminary research on MATLAB, NumPy/SciPy, Azure Data Explorer
(Kusto), and the existing IoTDB FFT UDF implementation. My current
understanding is aligned with your suggestion: FFT/DFT fit IoTDB’s table
model best as built-in table-valued functions.

They consume a partitioned and time-ordered numeric sequence, and produce
multiple frequency-domain result rows. Therefore, they are not scalar
functions, because they do not operate on a single row. They are also
different from ordinary window functions, because the output rows represent
frequency bins rather than the original input rows.

A possible first version of FFT could look like this:

SELECT *
FROM FFT(
  DATA => (
    SELECT time, device_id, value
    FROM sensor
  ) PARTITION BY device_id ORDER BY time,
  VALUE => 'value',
  TIMECOL => 'time',
  SAMPLE_RATE => 1000,
  N => 1024,
  NORM => 'backward',
  SPECTRUM => 'full'
);


If we decide to expose DFT as a separate TVF, it could share the same
signature and output schema:

SELECT *
FROM DFT(
  DATA => (...) PARTITION BY device_id ORDER BY time,
  VALUE => 'value',
  TIMECOL => 'time',
  SAMPLE_RATE => 1000,
  N => 1024
);


Suggested input parameters:

DATA: required table argument. It provides the input sequence. PARTITION BY
can be used to compute one transform per device or tag group, and ORDER BY
defines the time order.

VALUE: required string. The numeric column to transform. Supported input
types can be INT32, INT64, FLOAT, and DOUBLE.

TIMECOL: optional string. The timestamp column name, defaulting to time.

SAMPLE_RATE / SAMPLE_INTERVAL: sampling frequency or sampling interval,
used to compute the physical frequency. I think exactly one of them should
be provided if we want to output physical frequency. If neither is
provided, we may need to define whether the function should reject the
query or output only normalized frequency.

N: optional integer. Transform length. If N is smaller than the input
length, truncate the input; if larger, zero-pad it. This follows
MATLAB/SciPy semantics.

NORM: optional string. Normalization mode, such as backward, forward, or
ortho.

SPECTRUM: optional string. Output frequency range, such as full or
one_sided.

Suggested output schema:

frequency_index (INT64): frequency bin index.
frequency (DOUBLE): physical frequency derived from SAMPLE_RATE or
SAMPLE_INTERVAL.
real (DOUBLE): real part of the transform result.
imag (DOUBLE): imaginary part of the transform result.
amplitude (DOUBLE): magnitude, computed as sqrt(real^2 + imag^2).
phase (DOUBLE): phase angle in radians, computed as atan2(imag, real).

If the input table is partitioned by device or tags, the partition columns
should be preserved in the output, so users can identify which
frequency-domain rows belong to each input series.

For the first version, I suggest keeping the scope simple:

support one real-valued numeric input column;
require or assume uniformly sampled data;
support N with truncation and zero-padding semantics;
use the same output schema for FFT and DFT if both are exposed;
treat FFT as the practical high-performance implementation;
discuss whether a separate DFT TVF is necessary in v1, since most libraries
expose FFT as the efficient implementation of DFT.

There are still a few points worth discussing:

Should SAMPLE_RATE or SAMPLE_INTERVAL be required, or should we allow
normalized frequency output when they are omitted?
Should SPECTRUM => 'one_sided' be included in the first version, since most
IoT sensor data is real-valued?
Is a separate DFT TVF necessary in the first version, or should we start
with FFT only and add DFT later if there is a clear use case?

References I checked:

[1] MATLAB fft:
https://www.mathworks.com/help/matlab/ref/fft.html

[2] NumPy fft:
https://numpy.org/doc/stable/reference/generated/numpy.fft.fft.html

[3] SciPy fft:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.fft.fft.html

[4] SciPy DFT matrix:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.dft.html

[5] Kusto series_fft:
https://learn.microsoft.com/en-us/kusto/query/series-fft-function

Best,
Bryan Yang(杨易达)

Yuan Tian <[email protected]> 于2026年6月8日周一 18:10写道:

> Hi Bryan,
>
> Thanks for bringing this up.
>
> For questions 1 and 2, I think FFT/DFT can be provided as built-in
> table-valued functions in the table model. Their semantics naturally fit
> the TVF abstraction, since a time-ordered sequence is transformed into
> multiple frequency-domain result rows.
>
> For question 3, I think it would be helpful to do some further research
> before finalizing the parameters and output schema. As far as I know,
> MATLAB and Python's SciPy both provide similar FFT/DFT functions, and their
> APIs may be useful references. I have not looked into how other databases
> expose this kind of functionality yet. It may be worth checking both these
> library functions and other databases to see what inputs they require and
> what outputs they return, then decide what design best fits IoTDB's table
> model.
>
> Best,
> Yuan
>
> On Mon, Jun 8, 2026 at 3:37 PM Bryan Yang <[email protected]> wrote:
>
> > Hi IoTDB community,
> >
> > I would like to discuss a possible feature for the IoTDB table model:
> > adding built-in FFT and DFT functions for time-series frequency-domain
> > analysis.
> >
> > FFT stands for Fast Fourier Transform, and DFT stands for Discrete
> Fourier
> > Transform. Both are used to transform time-domain data into
> > frequency-domain data. FFT is essentially a fast algorithm for computing
> > DFT, so I think these two functions can be designed together, sharing
> > similar parameters, output schema, and test cases.
> >
> > For IoTDB, this could be useful for scenarios such as sensor vibration
> > analysis, dominant frequency detection, and periodic signal analysis.
> > Preliminary Analysis
> >
> > After some preliminary analysis, I think FFT/DFT are more suitable as
> > table-valued functions (TVFs), rather than scalar functions or window
> > functions.
> >
> > The reason is that FFT/DFT do not work as one-row-in, one-row-out scalar
> > functions like abs(), sin(), or round(). They also do not aggregate
> > multiple rows into a single value like avg() or sum().
> >
> > Instead, their semantics are:
> >
> > a time-ordered sequence -> multiple frequency points
> >
> > Possible SQL Form
> >
> > SELECT *
> > FROM FFT(
> >   DATA => (
> >     SELECT time, device_id, value
> >     FROM sensor
> >   ) PARTITION BY device_id ORDER BY time,
> >   VALUE => 'value'
> > );
> >
> > This means that the input table is partitioned by device_id, each
> partition
> > is ordered by time, and the value column is transformed into
> > frequency-domain results.
> >
> > Similarly, DFT could use the same form:
> >
> > SELECT *
> > FROM DFT(
> >   DATA => (
> >     SELECT time, value
> >     FROM sensor
> >     WHERE device_id = 'd1'
> >   ) ORDER BY time,
> >   VALUE => 'value'
> > );
> >
> > Possible Output Schema
> >
> > A possible output schema could be:
> >
> > frequency_index, frequency(optional), real, imag, amplitude, phase
> >
> > Here, frequency_index, real, and imag are the core results of FFT/DFT.
> > amplitude and phase can be derived from real/imag and may be useful for
> > analysis.
> >
> > The frequency column would require the user to provide a sample rate or
> > sample interval; otherwise, only frequency_index can be returned.
> > Existing Related Work
> >
> > I also noticed that IoTDB already has FFT-related UDF support in the
> > library-udf module. This proposal focuses on whether FFT/DFT should be
> > provided as built-in functions in the table model, and whether TVF is the
> > right abstraction.
> > Questions
> >
> > I would appreciate your feedback on this direction, especially:
> >
> >    1.
> >
> >    Whether FFT/DFT are suitable as built-in functions in the table model.
> >    2.
> >
> >    Whether TVF is the right function type for them.
> >    3.
> >
> >    What the expected parameters and output schema should be.
> >
> > Best regards, Bryan Yang(杨易达)
> >
>

Reply via email to