Re: Built-in FFT/DFT Functions for IoTDB Table Model

Bryan Yang Tue, 16 Jun 2026 00:47:26 -0700

Hi Yuan Tian,

Thanks for the clarification. I agree that keeping N and NORM in the v1
signature makes sense, since they are standard FFT parameters and their
semantics are clear.


I will update the v1 design accordingly.

N will be an optional integer transform length. If N is not provided, the
transform length defaults to the input length of each partition. If N is
smaller than the input length, the input sequence will be truncated. If N
is larger than the input length, the input sequence will be zero-padded.
The number of output frequency bins will be N.
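
As a rough illustration of these length rules, here is a minimal Python sketch. The helper names fit_to_length and naive_dft are hypothetical, and a plain O(N^2) DFT stands in for the real FFT implementation; only the truncation and zero-padding behavior is the point.

```python
import cmath


def fit_to_length(values, n=None):
    """Truncate or zero-pad `values` to the transform length n.

    Hypothetical helper: n defaults to the input length of the partition.
    """
    if n is None:
        n = len(values)
    if n <= 0:
        raise ValueError("transform length N must be a positive integer")
    head = list(values[:n])                  # truncation when n < len(values)
    return head + [0.0] * (n - len(head))    # zero-padding when n > len(values)


def naive_dft(values, n=None):
    """Unscaled forward DFT (stand-in for the FFT); returns n complex bins."""
    x = fit_to_length(values, n)
    size = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / size) for t in range(size))
        for k in range(size)
    ]
```

With a four-sample input, naive_dft(samples, 2) uses only the first two samples and returns two bins, while naive_dft(samples, 8) pads with four zeros and returns eight bins.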

NORM will be an optional string parameter. The supported values will be
backward, forward, and ortho. The default value will be backward, aligned
with numpy.fft.fft.
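
For reference, the three modes differ only in the scale factor applied to the forward transform. The sketch below follows the NumPy convention (backward: no scaling, forward: 1/N, ortho: 1/sqrt(N)); the function names forward_scale and apply_norm are hypothetical and not part of any IoTDB API.

```python
import math


def forward_scale(norm, n):
    """Scale factor applied to an unscaled forward transform of length n."""
    if norm == "backward":
        return 1.0                   # default: forward transform is unscaled
    if norm == "forward":
        return 1.0 / n               # all scaling on the forward transform
    if norm == "ortho":
        return 1.0 / math.sqrt(n)    # scaling split evenly between directions
    raise ValueError("NORM must be one of backward, forward, or ortho")


def apply_norm(unscaled_bins, norm="backward"):
    """Apply the selected normalization to a list of complex bins."""
    scale = forward_scale(norm, len(unscaled_bins))
    return [b * scale for b in unscaled_bins]
```

For a constant input of four ones, the unscaled DC bin is 4; it stays 4 under backward, becomes 1 under forward, and becomes 2 under ortho.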

The frequency axis will be calculated based on the transform length N and
SAMPLE_INTERVAL. If SAMPLE_INTERVAL is provided, it is used directly.
Otherwise, it is inferred from the time column within each partition as:

(last_time - first_time) / (row_count - 1)

The frequency unit will be Hz, i.e., cycles per second.
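
A minimal sketch of this calculation, under two assumptions that are not settled in this thread: timestamps are in milliseconds, and the full-spectrum axis follows the numpy.fft.fftfreq convention (the upper half of the bins maps to negative frequencies). The function names are hypothetical.

```python
def resolve_sample_interval(times_ms, sample_interval_s=None):
    """Return the sample interval in seconds.

    An explicit SAMPLE_INTERVAL wins; otherwise the interval is inferred as
    (last_time - first_time) / (row_count - 1). Millisecond timestamps are an
    assumption made for this sketch.
    """
    if sample_interval_s is not None:
        return sample_interval_s
    if len(times_ms) < 2:
        raise ValueError("cannot infer the interval from fewer than two rows")
    return (times_ms[-1] - times_ms[0]) / (len(times_ms) - 1) / 1000.0


def frequency_axis(n, sample_interval_s):
    """Frequency in Hz for each frequency_index 0..n-1 (fftfreq convention)."""
    half = (n + 1) // 2   # indices below `half` are the non-negative frequencies
    return [
        (k if k < half else k - n) / (n * sample_interval_s)
        for k in range(n)
    ]
```

For example, four rows at 1 ms spacing give an inferred interval of 0.001 s, and with N = 4 the axis is approximately 0, 250, -500, -250 Hz.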

The rest of the v1 design remains unchanged:

1. expose FFT only, not DFT;
2. no VALUE parameter;
3. transform all numeric columns except the time column and PARTITION BY
columns;
4. require time to be strictly ascending within each partition;
5. assume uniformly sampled input in v1 without checking every adjacent
interval;
6. output full spectrum only;
7. keep the minimal output schema with frequency_index, frequency, and
real/imag columns;
8. leave amplitude, phase, and one-sided spectrum for possible future
extensions.
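
To make item 4 concrete, the ordering check could look roughly like the sketch below. The function name and the exception type are placeholders, not a decided design; consistent with item 5, it does not check that adjacent intervals are equal.

```python
def check_strictly_ascending(times):
    """Raise if timestamps in one partition are not strictly ascending.

    Both out-of-order and duplicated timestamps are rejected. Uniform spacing
    is assumed, not verified.
    """
    for position, (prev, cur) in enumerate(zip(times, times[1:]), start=1):
        if cur <= prev:
            raise ValueError(
                f"time must be strictly ascending: row {position} has "
                f"{cur}, previous row has {prev}"
            )
```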

I will also make one detail explicit in the design doc: when N is
provided, the number of output rows per partition is N, and frequency_index
ranges from 0 to N - 1. When N is not provided, N defaults to the input
length of that partition.
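
The row-count rule can be sketched as follows for a single partition. This is illustrative only: fft_rows is a hypothetical name, the frequency column is omitted for brevity, an unscaled naive DFT stands in for the FFT, and the <column>_real / <column>_imag naming follows the schema discussed earlier in this thread.

```python
import cmath


def fft_rows(columns, n=None):
    """Build output rows for one partition.

    columns: dict mapping a value-column name to its samples (equal lengths).
    Returns exactly n row dicts with frequency_index running from 0 to n - 1.
    """
    input_length = len(next(iter(columns.values())))
    n = input_length if n is None else n
    rows = []
    for k in range(n):
        row = {"frequency_index": k}
        for name, values in columns.items():
            # Truncate or zero-pad to n, then compute bin k of the naive DFT.
            x = list(values[:n]) + [0.0] * max(0, n - len(values))
            bin_k = sum(
                x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
            )
            row[name + "_real"] = bin_k.real
            row[name + "_imag"] = bin_k.imag
        rows.append(row)
    return rows
```

Calling fft_rows({"temperature": [1.0, 2.0, 3.0]}, n=5) yields five rows with frequency_index 0 through 4, while omitting n yields three.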


Best,
Bryan Yang（杨易达）

On Tue, Jun 16, 2026 at 3:39 PM Yuan Tian <[email protected]> wrote:

> Hi Bryan,
>
> Thanks for summarizing the revised v1 design. I agree with most of the
> points. The v1 scope is much clearer now.
>
> One adjustment I would suggest is to keep N and NORM in the v1 signature.
> They are standard parameters in NumPy/SciPy FFT APIs, and their semantics
> are relatively straightforward.
>
> For N:
>
> N can be an optional integer parameter. If it is not provided, the
> transform length defaults to the input length of each partition. If N is
> smaller than the input length, the input is truncated. If N is larger than
> the input length, the input is zero-padded. The number of output frequency
> bins should be N.
>
> For NORM:
>
> NORM can also be optional, with the same supported values as NumPy:
>
> backward
> forward
> ortho
>
> The default can be backward, which is also the default behavior of
> numpy.fft.fft.
>
> So the v1 parameters could be:
>
> DATA: required table argument.
> SAMPLE_INTERVAL: optional duration literal, such as 1ms or 1s.
> N: optional integer transform length.
> NORM: optional string, one of backward, forward, or ortho.
>
> Other parts of the design still look good to me:
>
> 1. Start with FFT only. DFT can be added later only if there is a concrete
> use case.
>
> 2. Do not require a VALUE parameter. All numeric columns except the time
> column and PARTITION BY columns can be transformed. Users can control the
> selected value columns through the DATA subquery.
>
> 3. SAMPLE_INTERVAL should be optional. If provided, it overrides the
> interval inferred from the time column. If it is not provided, the interval
> can be inferred within each partition as:
>
> (last_time - first_time) / (row_count - 1)
>
> For v1, I think it is acceptable to assume uniformly sampled input and not
> validate every adjacent timestamp interval.
>
> 4. We should still require the time column to be strictly ascending within
> each partition. If timestamps are not ascending, or if there are duplicated
> timestamps, the function should throw an exception.
>
> 5. If a partition has fewer than two rows and SAMPLE_INTERVAL is not
> provided, the function should reject the query because it cannot infer the
> interval. If SAMPLE_INTERVAL is provided, this case can still be handled.
>
> 6. For v1, outputting the full spectrum only is fine. One-sided output can
> be considered later.
>
> 7. The minimal output schema can remain:
>
> partition columns...,
> frequency_index,
> frequency,
> temperature_real,
> temperature_imag,
> speed_real,
> speed_imag
>
> The frequency column should be calculated from SAMPLE_INTERVAL and N, and I
> think its unit should be Hz, i.e., cycles per second.
>
> Amplitude and phase can be added later as convenience columns if users need
> them, but I agree they do not need to be part of the minimal v1 schema.
>
> With N and NORM included, a possible v1 syntax could be:
>
> SELECT *
> FROM FFT(
>   DATA => (
>     SELECT time, device_id, temperature, speed
>     FROM sensor
>   ) PARTITION BY device_id ORDER BY time,
>   SAMPLE_INTERVAL => 1ms,
>   N => 1024,
>   NORM => 'backward'
> );
>
> Best regards,
> --------------------
> Yuan Tian
>
> On Tue, Jun 16, 2026 at 2:09 PM Bryan Yang <[email protected]> wrote:
>
> > Hi Yuan Tian,
> >
> > Thanks for the detailed feedback. I agree with your suggestions, and I
> > think they make the v1 scope cleaner.
> >
> > Based on your comments, I would revise the v1 FFT design as follows:
> >
> > 1. Start with FFT only.
> >
> > We do not need to expose DFT as a separate TVF in the first version. FFT
> > can be treated as the practical implementation for frequency-domain
> > analysis, and DFT can be discussed later if there is a concrete use case.
> >
> > 2. Remove the VALUE parameter.
> >
> > The FFT TVF can transform all numeric columns from the input table,
> > excluding the time column and PARTITION BY columns. If users only want to
> > transform a subset of columns, they can project only those columns in the
> > DATA subquery.
> >
> > For example:
> >
> > SELECT *
> > FROM FFT(
> >   DATA => (
> >     SELECT time, device_id, temperature, speed
> >     FROM sensor
> >   ) PARTITION BY device_id ORDER BY time
> > );
> >
> > This would produce FFT results for both temperature and speed.
> >
> > 3. Use the time column for ordering and sample interval inference.
> >
> > SAMPLE_INTERVAL can be optional and represented as a duration literal,
> > for example:
> >
> > SAMPLE_INTERVAL => 1ms
> > SAMPLE_INTERVAL => 1s
> >
> > If SAMPLE_INTERVAL is provided, it overrides the inferred interval.
> > Otherwise, the interval can be inferred within each partition as:
> >
> > (last_time - first_time) / (row_count - 1)
> >
> > For v1, we can assume the input is uniformly sampled and only validate
> > that timestamps are ascending within each partition.
> >
> > One small edge case is when a partition has fewer than two rows. In that
> > case, if SAMPLE_INTERVAL is not provided, the function cannot infer the
> > interval. I think we should either reject that partition/query or require
> > SAMPLE_INTERVAL for such cases.
> >
> > 4. Output the full spectrum only in v1.
> >
> > This keeps the behavior close to numpy.fft.fft. One-sided output can be
> > considered later, possibly as a separate option or function.
> >
> > 5. Keep the output schema minimal.
> >
> > For v1, the output schema can include partition columns, frequency_index,
> > frequency, and real/imaginary output columns for each transformed value
> > column:
> >
> > partition columns...,
> > frequency_index,
> > frequency,
> > temperature_real,
> > temperature_imag,
> > speed_real,
> > speed_imag
> >
> > Amplitude and phase can be added later as convenience columns if users
> > need them.
> >
> > 6. Keep N and NORM out of the v1 signature.
> >
> > For the first version, I also think we can avoid exposing N and NORM.
> >
> > The transform length can default to the input length of each partition.
> > The normalization behavior can follow the default behavior of
> > numpy.fft.fft. This keeps the first version small and avoids adding
> > options before we have concrete user requirements.
> >
> > So a possible minimal v1 syntax would be:
> >
> > SELECT *
> > FROM FFT(
> >   DATA => (
> >     SELECT time, device_id, temperature, speed
> >     FROM sensor
> >   ) PARTITION BY device_id ORDER BY time,
> >   SAMPLE_INTERVAL => 1ms
> > );
> >
> > or, with inferred sample interval:
> >
> > SELECT *
> > FROM FFT(
> >   DATA => (
> >     SELECT time, device_id, temperature, speed
> >     FROM sensor
> >   ) PARTITION BY device_id ORDER BY time
> > );
> >
> >
> > Best,
> > Bryan Yang
> >
> >
> > On Tue, Jun 16, 2026 at 10:19 AM Yuan Tian <[email protected]> wrote:
> >
> > > Hi Bryan,
> > >
> > > Sorry for the late reply.
> > >
> > >
> > > Thanks for the further research. I agree with the general direction
> > > that FFT fits better as a built-in TVF in the table model.
> > >
> > > I have a few additional thoughts on the v1 design:
> > >
> > > 1. I think we can start with FFT only in the first version.
> > >
> > > DFT does not need to be exposed as a separate TVF initially. We can
> > > treat FFT as the practical implementation for frequency-domain
> > > analysis, and add a separate DFT function later only if there is a
> > > clear use case.
> > >
> > > 2. I think we may not need a VALUE parameter.
> > >
> > > For the input table, all numeric columns except the time column and
> > > the PARTITION BY columns can be treated as value columns to transform.
> > > If users only want to transform a subset of columns, they can select
> > > only those columns in the DATA subquery.
> > >
> > > For example:
> > >
> > > SELECT *
> > > FROM FFT(
> > >   DATA => (
> > >     SELECT time, device_id, temperature, speed
> > >     FROM sensor
> > >   ) PARTITION BY device_id ORDER BY time
> > > );
> > >
> > > Here, temperature and speed would both be transformed.
> > >
> > > 3. About time and sample interval.
> > >
> > > NumPy's fft itself does not take a time column or timestamps as input.
> > > It only takes the value array. The sample interval is only needed when
> > > computing the physical frequency axis, for example through
> > > numpy.fft.fftfreq(n, d=sample_interval).
> > >
> > > For IoTDB, since the table model has a time column, we can use the
> > > time column to define the input order and infer the sample interval
> > > when the user does not provide one.
> > >
> > > I suggest making SAMPLE_INTERVAL an optional parameter, represented as
> > > a duration literal, such as:
> > >
> > > SAMPLE_INTERVAL => 1ms
> > > SAMPLE_INTERVAL => 1s
> > >
> > > If SAMPLE_INTERVAL is provided, it should override the interval
> > > inferred from the time column. If it is not provided, the function can
> > > infer it as:
> > >
> > > (last_time - first_time) / (row_count - 1)
> > >
> > > I do not think we need to validate whether every adjacent timestamp
> > > interval is exactly the same in v1. We can assume the input represents
> > > a uniformly sampled sequence.
> > >
> > > However, we should still validate that the time column is ascending
> > > within each partition. If the timestamps are not ascending in a
> > > partition, the function should throw an exception.
> > >
> > > 4. About SPECTRUM.
> > >
> > > SPECTRUM mainly controls whether the function outputs the full FFT
> > > spectrum or only the one-sided spectrum.
> > >
> > > For v1, I think we can keep this simple and output the full spectrum,
> > > which is closer to numpy.fft.fft. One-sided output, similar to
> > > numpy.fft.rfft, can be discussed later.
> > >
> > > 5. About the output schema.
> > >
> > > NumPy's fft returns complex values. It does not directly output
> > > amplitude or phase. Amplitude and phase are derived values, for
> > > example abs(result) and angle(result).
> > >
> > > So for v1, I suggest the core output schema should include real and
> > > imaginary parts only. For multiple value columns, the output columns
> > > should be prefixed with the original column names, for example:
> > >
> > > partition columns...,
> > > frequency_index,
> > > frequency,
> > > temperature_real,
> > > temperature_imag,
> > > speed_real,
> > > speed_imag
> > >
> > > Here, frequency_index should mean the FFT output index / frequency bin
> > > index, not just a generated row number. It is useful for preserving
> > > the original FFT output order and aligning with the position in the
> > > FFT result array.
> > >
> > > The frequency column is not the same for every row in a partition.
> > > Each output row corresponds to one frequency bin. For the same
> > > partition, multiple transformed value columns share the same
> > > frequency_index and frequency axis.
> > >
> > > Amplitude and phase can be added later if we think they are useful
> > > convenience columns, but I would prefer not to include them in the
> > > minimal v1 schema.
> > >
> > > Best regards,
> > > -----------------
> > > Yuan Tian
> > >
> > >
> > > On Wed, Jun 10, 2026 at 2:14 PM Bryan Yang <[email protected]> wrote:
> > >
> > > > Hi Yuan Tian,
> > > >
> > > > Thanks for the suggestion.
> > > >
> > > > I did some preliminary research on MATLAB, NumPy/SciPy, Azure Data
> > > > Explorer (Kusto), and the existing IoTDB FFT UDF implementation. My
> > > > current understanding is aligned with your suggestion: FFT/DFT fit
> > > > IoTDB’s table model best as built-in table-valued functions.
> > > >
> > > > They consume a partitioned and time-ordered numeric sequence, and
> > > > produce multiple frequency-domain result rows. Therefore, they are not
> > > > scalar functions, because they do not operate on a single row. They are
> > > > also different from ordinary window functions, because the output rows
> > > > represent frequency bins rather than the original input rows.
> > > >
> > > > A possible first version of FFT could look like this:
> > > >
> > > > SELECT *
> > > > FROM FFT(
> > > >   DATA => (
> > > >     SELECT time, device_id, value
> > > >     FROM sensor
> > > >   ) PARTITION BY device_id ORDER BY time,
> > > >   VALUE => 'value',
> > > >   TIMECOL => 'time',
> > > >   SAMPLE_RATE => 1000,
> > > >   N => 1024,
> > > >   NORM => 'backward',
> > > >   SPECTRUM => 'full'
> > > > );
> > > >
> > > >
> > > > If we decide to expose DFT as a separate TVF, it could share the same
> > > > signature and output schema:
> > > >
> > > > SELECT *
> > > > FROM DFT(
> > > >   DATA => (...) PARTITION BY device_id ORDER BY time,
> > > >   VALUE => 'value',
> > > >   TIMECOL => 'time',
> > > >   SAMPLE_RATE => 1000,
> > > >   N => 1024
> > > > );
> > > >
> > > >
> > > > Suggested input parameters:
> > > >
> > > > DATA: required table argument. It provides the input sequence.
> > > > PARTITION BY can be used to compute one transform per device or tag
> > > > group, and ORDER BY defines the time order.
> > > >
> > > > VALUE: required string. The numeric column to transform. Supported
> > > > input types can be INT32, INT64, FLOAT, and DOUBLE.
> > > >
> > > > TIMECOL: optional string. The timestamp column name, defaulting to
> > > > time.
> > > >
> > > > SAMPLE_RATE / SAMPLE_INTERVAL: sampling frequency or sampling
> > > > interval, used to compute the physical frequency. I think exactly one
> > > > of them should be provided if we want to output physical frequency. If
> > > > neither is provided, we may need to define whether the function should
> > > > reject the query or output only normalized frequency.
> > > >
> > > > N: optional integer. Transform length. If N is smaller than the input
> > > > length, truncate the input; if larger, zero-pad it. This follows
> > > > MATLAB/SciPy semantics.
> > > >
> > > > NORM: optional string. Normalization mode, such as backward, forward,
> > > > or ortho.
> > > >
> > > > SPECTRUM: optional string. Output frequency range, such as full or
> > > > one_sided.
> > > >
> > > > Suggested output schema:
> > > >
> > > > frequency_index (INT64): frequency bin index.
> > > > frequency (DOUBLE): physical frequency derived from SAMPLE_RATE or
> > > > SAMPLE_INTERVAL.
> > > > real (DOUBLE): real part of the transform result.
> > > > imag (DOUBLE): imaginary part of the transform result.
> > > > amplitude (DOUBLE): magnitude, computed as sqrt(real^2 + imag^2).
> > > > phase (DOUBLE): phase angle in radians, computed as atan2(imag, real).
> > > >
> > > > If the input table is partitioned by device or tags, the partition
> > > > columns should be preserved in the output, so users can identify which
> > > > frequency-domain rows belong to each input series.
> > > >
> > > > For the first version, I suggest keeping the scope simple:
> > > >
> > > > support one real-valued numeric input column;
> > > > require or assume uniformly sampled data;
> > > > support N with truncation and zero-padding semantics;
> > > > use the same output schema for FFT and DFT if both are exposed;
> > > > treat FFT as the practical high-performance implementation;
> > > > discuss whether a separate DFT TVF is necessary in v1, since most
> > > > libraries expose FFT as the efficient implementation of DFT.
> > > >
> > > > There are still a few points worth discussing:
> > > >
> > > > Should SAMPLE_RATE or SAMPLE_INTERVAL be required, or should we allow
> > > > normalized frequency output when they are omitted?
> > > > Should SPECTRUM => 'one_sided' be included in the first version, since
> > > > most IoT sensor data is real-valued?
> > > > Is a separate DFT TVF necessary in the first version, or should we
> > > > start with FFT only and add DFT later if there is a clear use case?
> > > >
> > > > References I checked:
> > > >
> > > > [1] MATLAB fft:
> > > > https://www.mathworks.com/help/matlab/ref/fft.html
> > > >
> > > > [2] NumPy fft:
> > > > https://numpy.org/doc/stable/reference/generated/numpy.fft.fft.html
> > > >
> > > > [3] SciPy fft:
> > > > https://docs.scipy.org/doc/scipy/reference/generated/scipy.fft.fft.html
> > > >
> > > > [4] SciPy DFT matrix:
> > > > https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.dft.html
> > > >
> > > > [5] Kusto series_fft:
> > > > https://learn.microsoft.com/en-us/kusto/query/series-fft-function
> > > >
> > > > Best,
> > > > Bryan Yang(杨易达)
> > > >
> > > > On Mon, Jun 8, 2026 at 6:10 PM Yuan Tian <[email protected]> wrote:
> > > >
> > > > > Hi Bryan,
> > > > >
> > > > > Thanks for bringing this up.
> > > > >
> > > > > For questions 1 and 2, I think FFT/DFT can be provided as built-in
> > > > > table-valued functions in the table model. Their semantics naturally
> > > > > fit the TVF abstraction, since a time-ordered sequence is transformed
> > > > > into multiple frequency-domain result rows.
> > > > >
> > > > > For question 3, I think it would be helpful to do some further
> > > > > research before finalizing the parameters and output schema. As far
> > > > > as I know, MATLAB and Python's SciPy both provide similar FFT/DFT
> > > > > functions, and their APIs may be useful references. I have not looked
> > > > > into how other databases expose this kind of functionality yet. It
> > > > > may be worth checking both these library functions and other
> > > > > databases to see what inputs they require and what outputs they
> > > > > return, then decide what design best fits IoTDB's table model.
> > > > >
> > > > > Best,
> > > > > Yuan
> > > > >
> > > > > On Mon, Jun 8, 2026 at 3:37 PM Bryan Yang <[email protected]> wrote:
> > > > >
> > > > > > Hi IoTDB community,
> > > > > >
> > > > > > I would like to discuss a possible feature for the IoTDB table
> > > > > > model: adding built-in FFT and DFT functions for time-series
> > > > > > frequency-domain analysis.
> > > > > >
> > > > > > FFT stands for Fast Fourier Transform, and DFT stands for Discrete
> > > > > > Fourier Transform. Both are used to transform time-domain data into
> > > > > > frequency-domain data. FFT is essentially a fast algorithm for
> > > > > > computing DFT, so I think these two functions can be designed
> > > > > > together, sharing similar parameters, output schema, and test cases.
> > > > > >
> > > > > > For IoTDB, this could be useful for scenarios such as sensor
> > > > > > vibration analysis, dominant frequency detection, and periodic
> > > > > > signal analysis.
> > > > > >
> > > > > > Preliminary Analysis
> > > > > >
> > > > > > After some preliminary analysis, I think FFT/DFT are more suitable
> > > > > > as table-valued functions (TVFs), rather than scalar functions or
> > > > > > window functions.
> > > > > >
> > > > > > The reason is that FFT/DFT do not work as one-row-in, one-row-out
> > > > > > scalar functions like abs(), sin(), or round(). They also do not
> > > > > > aggregate multiple rows into a single value like avg() or sum().
> > > > > >
> > > > > > Instead, their semantics are:
> > > > > >
> > > > > > a time-ordered sequence -> multiple frequency points
> > > > > >
> > > > > > Possible SQL Form
> > > > > >
> > > > > > SELECT *
> > > > > > FROM FFT(
> > > > > >   DATA => (
> > > > > >     SELECT time, device_id, value
> > > > > >     FROM sensor
> > > > > >   ) PARTITION BY device_id ORDER BY time,
> > > > > >   VALUE => 'value'
> > > > > > );
> > > > > >
> > > > > > This means that the input table is partitioned by device_id, each
> > > > > > partition is ordered by time, and the value column is transformed
> > > > > > into frequency-domain results.
> > > > > >
> > > > > > Similarly, DFT could use the same form:
> > > > > >
> > > > > > SELECT *
> > > > > > FROM DFT(
> > > > > >   DATA => (
> > > > > >     SELECT time, value
> > > > > >     FROM sensor
> > > > > >     WHERE device_id = 'd1'
> > > > > >   ) ORDER BY time,
> > > > > >   VALUE => 'value'
> > > > > > );
> > > > > >
> > > > > > Possible Output Schema
> > > > > >
> > > > > > A possible output schema could be:
> > > > > >
> > > > > > frequency_index, frequency (optional), real, imag, amplitude, phase
> > > > > >
> > > > > > Here, frequency_index, real, and imag are the core results of
> > > > > > FFT/DFT. amplitude and phase can be derived from real/imag and may
> > > > > > be useful for analysis.
> > > > > >
> > > > > > The frequency column would require the user to provide a sample
> > > > > > rate or sample interval; otherwise, only frequency_index can be
> > > > > > returned.
> > > > > >
> > > > > > Existing Related Work
> > > > > >
> > > > > > I also noticed that IoTDB already has FFT-related UDF support in
> > > > > > the library-udf module. This proposal focuses on whether FFT/DFT
> > > > > > should be provided as built-in functions in the table model, and
> > > > > > whether TVF is the right abstraction.
> > > > > >
> > > > > > Questions
> > > > > >
> > > > > > I would appreciate your feedback on this direction, especially:
> > > > > >
> > > > > >    1. Whether FFT/DFT are suitable as built-in functions in the
> > > > > >       table model.
> > > > > >    2. Whether TVF is the right function type for them.
> > > > > >    3. What the expected parameters and output schema should be.
> > > > > >
> > > > > > Best regards,
> > > > > > Bryan Yang（杨易达）
> > > > > >
> > > > >
> > > >
> > >
> >
>
