Re: Built-in FFT/DFT Functions for IoTDB Table Model

Bryan Yang Mon, 15 Jun 2026 23:10:34 -0700

*Hi Yuan Tian,*

Thanks for the detailed feedback. I agree with your suggestions, and I
think they make the v1 scope cleaner.


Based on your comments, I would revise the v1 FFT design as follows:

1. Start with FFT only.

We do not need to expose DFT as a separate TVF in the first version. FFT
can be treated as the practical implementation for frequency-domain
analysis, and DFT can be discussed later if there is a concrete use case.

2. Remove the VALUE parameter.

The FFT TVF can transform all numeric columns from the input table,
excluding the time column and PARTITION BY columns. If users only want to
transform a subset of columns, they can project only those columns in the
DATA subquery.

For example:

SELECT *
FROM FFT(
  DATA => (
    SELECT time, device_id, temperature, speed
    FROM sensor
  ) PARTITION BY device_id ORDER BY time
);

This would produce FFT results for both temperature and speed.

3. Use the time column for ordering and sample interval inference.

SAMPLE_INTERVAL can be optional and represented as a duration literal, for
example:

SAMPLE_INTERVAL => 1ms
SAMPLE_INTERVAL => 1s

If SAMPLE_INTERVAL is provided, it overrides the inferred interval.
Otherwise, the interval can be inferred within each partition as:

(last_time - first_time) / (row_count - 1)

For v1, we can assume the input is uniformly sampled and only validate that
timestamps are ascending within each partition.

One small edge case is when a partition has fewer than two rows. In that
case, if SAMPLE_INTERVAL is not provided, the function cannot infer the
interval. I think we should either reject that partition/query or require
SAMPLE_INTERVAL for such cases.

4. Output the full spectrum only in v1.

This keeps the behavior close to numpy.fft.fft. One-sided output can be
considered later, possibly as a separate option or function.

5. Keep the output schema minimal.

For v1, the output schema can include partition columns, frequency_index,
frequency, and real/imaginary output columns for each transformed value
column:

partition columns...,
frequency_index,
frequency,
temperature_real,
temperature_imag,
speed_real,
speed_imag

Amplitude and phase can be added later as convenience columns if users need
them.

6. Keep N and NORM out of the v1 signature.

For the first version, I also think we can avoid exposing N and NORM.

The transform length can default to the input length of each partition. The
normalization behavior can follow the default behavior of numpy.fft.fft.
This keeps the first version small and avoids adding options before we have
concrete user requirements.

So a possible minimal v1 syntax would be:

SELECT *
FROM FFT(
  DATA => (
    SELECT time, device_id, temperature, speed
    FROM sensor
  ) PARTITION BY device_id ORDER BY time,
  SAMPLE_INTERVAL => 1ms
);

or, with inferred sample interval:

SELECT *
FROM FFT(
  DATA => (
    SELECT time, device_id, temperature, speed
    FROM sensor
  ) PARTITION BY device_id ORDER BY time
);


*Best,Bryan Yang*


Yuan Tian <[email protected]> 于2026年6月16日周二 10:19写道：

> Hi Bryan,
>
> Sorry for the late reply.
>
>
> Thanks for the further research. I agree with the general direction
> that FFT fits better as a built-in TVF in the table model.
>
> I have a few additional thoughts on the v1 design:
>
> 1. I think we can start with FFT only in the first version.
>
> DFT does not need to be exposed as a separate TVF initially. We can
> treat FFT as the practical implementation for frequency-domain
> analysis, and add a separate DFT function later only if there is a
> clear use case.
>
> 2. I think we may not need a VALUE parameter.
>
> For the input table, all numeric columns except the time column and
> the PARTITION BY columns can be treated as value columns to transform.
> If users only want to transform a subset of columns, they can select
> only those columns in the DATA subquery.
>
> For example:
>
> SELECT *
> FROM FFT(
>   DATA => (
>     SELECT time, device_id, temperature, speed
>     FROM sensor
>   ) PARTITION BY device_id ORDER BY time
> );
>
> Here, temperature and speed would both be transformed.
>
> 3. About time and sample interval.
>
> NumPy's fft itself does not take a time column or timestamps as input.
> It only takes the value array. The sample interval is only needed when
> computing the physical frequency axis, for example through
> numpy.fft.fftfreq(n, d=sample_interval).
>
> For IoTDB, since the table model has a time column, we can use the
> time column to define the input order and infer the sample interval
> when the user does not provide one.
>
> I suggest making SAMPLE_INTERVAL an optional parameter, represented as
> a duration literal, such as:
>
> SAMPLE_INTERVAL => 1ms
> SAMPLE_INTERVAL => 1s
>
> If SAMPLE_INTERVAL is provided, it should override the interval
> inferred from the time column. If it is not provided, the function can
> infer it as:
>
> (last_time - first_time) / (row_count - 1)
>
> I do not think we need to validate whether every adjacent timestamp
> interval is exactly the same in v1. We can assume the input represents
> a uniformly sampled sequence.
>
> However, we should still validate that the time column is ascending
> within each partition. If the timestamps are not ascending in a
> partition, the function should throw an exception.
>
> 4. About SPECTRUM.
>
> SPECTRUM mainly controls whether the function outputs the full FFT
> spectrum or only the one-sided spectrum.
>
> For v1, I think we can keep this simple and output the full spectrum,
> which is closer to numpy.fft.fft. One-sided output, similar to
> numpy.fft.rfft, can be discussed later.
>
> 5. About the output schema.
>
> NumPy's fft returns complex values. It does not directly output
> amplitude or phase. Amplitude and phase are derived values, for
> example abs(result) and angle(result).
>
> So for v1, I suggest the core output schema should include real and
> imaginary parts only. For multiple value columns, the output columns
> should be prefixed with the original column names, for example:
>
> partition columns...,
> frequency_index,
> frequency,
> temperature_real,
> temperature_imag,
> speed_real,
> speed_imag
>
> Here, frequency_index should mean the FFT output index / frequency bin
> index, not just a generated row number. It is useful for preserving
> the original FFT output order and aligning with the position in the
> FFT result array.
>
> The frequency column is not the same for every row in a partition.
> Each output row corresponds to one frequency bin. For the same
> partition, multiple transformed value columns share the same
> frequency_index and frequency axis.
>
> Amplitude and phase can be added later if we think they are useful
> convenience columns, but I would prefer not to include them in the
> minimal v1 schema.
>
> Best regards,
> -----------------
> Yuan Tian
>
>
> On Wed, Jun 10, 2026 at 2:14 PM Bryan Yang <[email protected]> wrote:
>
> > Hi Yuan Tian,
> >
> > Thanks for the suggestion.
> >
> > I did some preliminary research on MATLAB, NumPy/SciPy, Azure Data
> Explorer
> > (Kusto), and the existing IoTDB FFT UDF implementation. My current
> > understanding is aligned with your suggestion: FFT/DFT fit IoTDB’s table
> > model best as built-in table-valued functions.
> >
> > They consume a partitioned and time-ordered numeric sequence, and produce
> > multiple frequency-domain result rows. Therefore, they are not scalar
> > functions, because they do not operate on a single row. They are also
> > different from ordinary window functions, because the output rows
> represent
> > frequency bins rather than the original input rows.
> >
> > A possible first version of FFT could look like this:
> >
> > SELECT *
> > FROM FFT(
> >   DATA => (
> >     SELECT time, device_id, value
> >     FROM sensor
> >   ) PARTITION BY device_id ORDER BY time,
> >   VALUE => 'value',
> >   TIMECOL => 'time',
> >   SAMPLE_RATE => 1000,
> >   N => 1024,
> >   NORM => 'backward',
> >   SPECTRUM => 'full'
> > );
> >
> >
> > If we decide to expose DFT as a separate TVF, it could share the same
> > signature and output schema:
> >
> > SELECT *
> > FROM DFT(
> >   DATA => (...) PARTITION BY device_id ORDER BY time,
> >   VALUE => 'value',
> >   TIMECOL => 'time',
> >   SAMPLE_RATE => 1000,
> >   N => 1024
> > );
> >
> >
> > Suggested input parameters:
> >
> > DATA: required table argument. It provides the input sequence. PARTITION
> BY
> > can be used to compute one transform per device or tag group, and ORDER
> BY
> > defines the time order.
> >
> > VALUE: required string. The numeric column to transform. Supported input
> > types can be INT32, INT64, FLOAT, and DOUBLE.
> >
> > TIMECOL: optional string. The timestamp column name, defaulting to time.
> >
> > SAMPLE_RATE / SAMPLE_INTERVAL: sampling frequency or sampling interval,
> > used to compute the physical frequency. I think exactly one of them
> should
> > be provided if we want to output physical frequency. If neither is
> > provided, we may need to define whether the function should reject the
> > query or output only normalized frequency.
> >
> > N: optional integer. Transform length. If N is smaller than the input
> > length, truncate the input; if larger, zero-pad it. This follows
> > MATLAB/SciPy semantics.
> >
> > NORM: optional string. Normalization mode, such as backward, forward, or
> > ortho.
> >
> > SPECTRUM: optional string. Output frequency range, such as full or
> > one_sided.
> >
> > Suggested output schema:
> >
> > frequency_index (INT64): frequency bin index.
> > frequency (DOUBLE): physical frequency derived from SAMPLE_RATE or
> > SAMPLE_INTERVAL.
> > real (DOUBLE): real part of the transform result.
> > imag (DOUBLE): imaginary part of the transform result.
> > amplitude (DOUBLE): magnitude, computed as sqrt(real^2 + imag^2).
> > phase (DOUBLE): phase angle in radians, computed as atan2(imag, real).
> >
> > If the input table is partitioned by device or tags, the partition
> columns
> > should be preserved in the output, so users can identify which
> > frequency-domain rows belong to each input series.
> >
> > For the first version, I suggest keeping the scope simple:
> >
> > support one real-valued numeric input column;
> > require or assume uniformly sampled data;
> > support N with truncation and zero-padding semantics;
> > use the same output schema for FFT and DFT if both are exposed;
> > treat FFT as the practical high-performance implementation;
> > discuss whether a separate DFT TVF is necessary in v1, since most
> libraries
> > expose FFT as the efficient implementation of DFT.
> >
> > There are still a few points worth discussing:
> >
> > Should SAMPLE_RATE or SAMPLE_INTERVAL be required, or should we allow
> > normalized frequency output when they are omitted?
> > Should SPECTRUM => 'one_sided' be included in the first version, since
> most
> > IoT sensor data is real-valued?
> > Is a separate DFT TVF necessary in the first version, or should we start
> > with FFT only and add DFT later if there is a clear use case?
> >
> > References I checked:
> >
> > [1] MATLAB fft:
> > https://www.mathworks.com/help/matlab/ref/fft.html
> >
> > [2] NumPy fft:
> > https://numpy.org/doc/stable/reference/generated/numpy.fft.fft.html
> >
> > [3] SciPy fft:
> > https://docs.scipy.org/doc/scipy/reference/generated/scipy.fft.fft.html
> >
> > [4] SciPy DFT matrix:
> >
> https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.dft.html
> >
> > [5] Kusto series_fft:
> > https://learn.microsoft.com/en-us/kusto/query/series-fft-function
> >
> > Best,
> > Bryan Yang(杨易达)
> >
> > Yuan Tian <[email protected]> 于2026年6月8日周一 18:10写道：
> >
> > > Hi Bryan,
> > >
> > > Thanks for bringing this up.
> > >
> > > For questions 1 and 2, I think FFT/DFT can be provided as built-in
> > > table-valued functions in the table model. Their semantics naturally
> fit
> > > the TVF abstraction, since a time-ordered sequence is transformed into
> > > multiple frequency-domain result rows.
> > >
> > > For question 3, I think it would be helpful to do some further research
> > > before finalizing the parameters and output schema. As far as I know,
> > > MATLAB and Python's SciPy both provide similar FFT/DFT functions, and
> > their
> > > APIs may be useful references. I have not looked into how other
> databases
> > > expose this kind of functionality yet. It may be worth checking both
> > these
> > > library functions and other databases to see what inputs they require
> and
> > > what outputs they return, then decide what design best fits IoTDB's
> table
> > > model.
> > >
> > > Best,
> > > Yuan
> > >
> > > On Mon, Jun 8, 2026 at 3:37 PM Bryan Yang <[email protected]> wrote:
> > >
> > > > Hi IoTDB community,
> > > >
> > > > I would like to discuss a possible feature for the IoTDB table model:
> > > > adding built-in FFT and DFT functions for time-series
> frequency-domain
> > > > analysis.
> > > >
> > > > FFT stands for Fast Fourier Transform, and DFT stands for Discrete
> > > Fourier
> > > > Transform. Both are used to transform time-domain data into
> > > > frequency-domain data. FFT is essentially a fast algorithm for
> > computing
> > > > DFT, so I think these two functions can be designed together, sharing
> > > > similar parameters, output schema, and test cases.
> > > >
> > > > For IoTDB, this could be useful for scenarios such as sensor
> vibration
> > > > analysis, dominant frequency detection, and periodic signal analysis.
> > > > Preliminary Analysis
> > > >
> > > > After some preliminary analysis, I think FFT/DFT are more suitable as
> > > > table-valued functions (TVFs), rather than scalar functions or window
> > > > functions.
> > > >
> > > > The reason is that FFT/DFT do not work as one-row-in, one-row-out
> > scalar
> > > > functions like abs(), sin(), or round(). They also do not aggregate
> > > > multiple rows into a single value like avg() or sum().
> > > >
> > > > Instead, their semantics are:
> > > >
> > > > a time-ordered sequence -> multiple frequency points
> > > >
> > > > Possible SQL Form
> > > >
> > > > SELECT *
> > > > FROM FFT(
> > > >   DATA => (
> > > >     SELECT time, device_id, value
> > > >     FROM sensor
> > > >   ) PARTITION BY device_id ORDER BY time,
> > > >   VALUE => 'value'
> > > > );
> > > >
> > > > This means that the input table is partitioned by device_id, each
> > > partition
> > > > is ordered by time, and the value column is transformed into
> > > > frequency-domain results.
> > > >
> > > > Similarly, DFT could use the same form:
> > > >
> > > > SELECT *
> > > > FROM DFT(
> > > >   DATA => (
> > > >     SELECT time, value
> > > >     FROM sensor
> > > >     WHERE device_id = 'd1'
> > > >   ) ORDER BY time,
> > > >   VALUE => 'value'
> > > > );
> > > >
> > > > Possible Output Schema
> > > >
> > > > A possible output schema could be:
> > > >
> > > > frequency_index, frequency(optional), real, imag, amplitude, phase
> > > >
> > > > Here, frequency_index, real, and imag are the core results of
> FFT/DFT.
> > > > amplitude and phase can be derived from real/imag and may be useful
> for
> > > > analysis.
> > > >
> > > > The frequency column would require the user to provide a sample rate
> or
> > > > sample interval; otherwise, only frequency_index can be returned.
> > > > Existing Related Work
> > > >
> > > > I also noticed that IoTDB already has FFT-related UDF support in the
> > > > library-udf module. This proposal focuses on whether FFT/DFT should
> be
> > > > provided as built-in functions in the table model, and whether TVF is
> > the
> > > > right abstraction.
> > > > Questions
> > > >
> > > > I would appreciate your feedback on this direction, especially:
> > > >
> > > >    1.
> > > >
> > > >    Whether FFT/DFT are suitable as built-in functions in the table
> > model.
> > > >    2.
> > > >
> > > >    Whether TVF is the right function type for them.
> > > >    3.
> > > >
> > > >    What the expected parameters and output schema should be.
> > > >
> > > > Best regards, Bryan Yang（杨易达）
> > > >
> > >
> >
>

Re: Built-in FFT/DFT Functions for IoTDB Table Model

Reply via email to