Hi all,

Let's start this thread to discuss the integration. As a first step,
we need to design the SQL interface.

Time series similarity search is a fundamental problem in data mining, and
it is quite useful for analyzing sensor data stored in IoTDB.
KV-match is one approach to this problem. It supports both raw subsequence
matching (RSM) and constrained normalized subsequence matching (cNSM) under
either Euclidean Distance (ED) or Dynamic Time Warping (DTW).

As KV-match is just one possible implementation, the SQL layer should
expose a new, implementation-independent query type as follows.


SELECT similar(<pattern_path>, <parameter_1[,parameter_2,...]>)
FROM <query_path> WHERE <single_range_time_criterion>;


IoTDB should be configurable to use KV-match or another algorithm, or even
offer a naive brute-force option for users to choose.
This could be done by adding a configuration item, e.g.

similarity-search-implementation = KV-match | KV-match_simple | Naive | ...
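
To make the "Naive" option concrete, here is a minimal sketch of brute-force raw subsequence matching (RSM) under Euclidean Distance. It is only an illustration, not the KV-match algorithm: the function name naive_rsm_ed and the (timestamp, value) list representation are assumptions made for this example, not existing IoTDB APIs.

```python
import math
from typing import List, Tuple


def naive_rsm_ed(
    series: List[Tuple[int, float]],   # (timestamp, value) pairs, ordered by time
    pattern: List[float],              # values of the query pattern
    distance_threshold: float,
    max_answers: int = 100,
) -> List[Tuple[int, float]]:
    """Hypothetical helper: scan every window of len(pattern) points and
    keep those whose Euclidean distance to the pattern is within the
    threshold. Returns (start_time, distance) sorted by distance."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")

    hits = []
    # range() is empty when the series is shorter than the pattern
    for start in range(len(series) - m + 1):
        squared = 0.0
        for i in range(m):
            diff = series[start + i][1] - pattern[i]
            squared += diff * diff
        dist = math.sqrt(squared)
        if dist <= distance_threshold:
            hits.append((series[start][0], dist))

    hits.sort(key=lambda hit: hit[1])  # best (smallest distance) first
    return hits[:max_answers]
```

The scan costs O(n * m) for a series of n points and a pattern of m points, which is why an index-based implementation is attractive for large series.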


The parameter list depends heavily on the implementation.
For KV-match, we have the following parameters (please refer to the paper
for their meanings):

query_type           : enum   [RSM|cNSM] (required)
distance_measure     : enum   [ED|DTW]   (required)
distance_threshold   : double            (required)
amplitude_scaling    : double            (optional, for cNSM only, default 1.1)
offset_shifting      : double            (optional, for cNSM only, default 5.0)
warping_path         : double            (optional, for DTW only, default 5.0)
max_answers          : int               (optional, default 100)
remove_overlapped    : bool              (optional, default TRUE)
overlapped_threshold : double            (optional, default 0.5)
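
As a rough sketch of how the parameter list above could be validated before a query runs, the following collects the parameters and their defaults in one place. The class name KvMatchParams and the choice to reject parameters that do not apply to the selected query type or distance measure are assumptions for illustration only; the proposal itself does not say how such cases should be handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KvMatchParams:
    """Hypothetical container for the KV-match query parameters."""
    query_type: str                            # "RSM" or "cNSM" (required)
    distance_measure: str                      # "ED" or "DTW" (required)
    distance_threshold: float                  # required
    amplitude_scaling: Optional[float] = None  # cNSM only, default 1.1
    offset_shifting: Optional[float] = None    # cNSM only, default 5.0
    warping_path: Optional[float] = None       # DTW only, default 5.0
    max_answers: int = 100
    remove_overlapped: bool = True
    overlapped_threshold: float = 0.5

    def __post_init__(self) -> None:
        if self.query_type not in ("RSM", "cNSM"):
            raise ValueError("query_type must be RSM or cNSM")
        if self.distance_measure not in ("ED", "DTW"):
            raise ValueError("distance_measure must be ED or DTW")

        if self.query_type == "cNSM":
            # Fill in the cNSM-only defaults when they were not given.
            if self.amplitude_scaling is None:
                self.amplitude_scaling = 1.1
            if self.offset_shifting is None:
                self.offset_shifting = 5.0
        elif self.amplitude_scaling is not None or self.offset_shifting is not None:
            # Assumption: reject cNSM-only parameters on an RSM query.
            raise ValueError("amplitude_scaling/offset_shifting apply to cNSM only")

        if self.distance_measure == "DTW":
            if self.warping_path is None:
                self.warping_path = 5.0
        elif self.warping_path is not None:
            # Assumption: reject the DTW-only parameter on an ED query.
            raise ValueError("warping_path applies to DTW only")
```

For example, KvMatchParams("cNSM", "DTW", 2.0) would pick up all three conditional defaults, while KvMatchParams("RSM", "ED", 2.0) would leave them unset.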


Ideally, IoTDB would support temporary in-memory tables (time series) that
are tied to the session life cycle, similar to MySQL, as follows:

CREATE TEMPORARY TIMESERIES <pattern_path> SELECT *
FROM <pattern_original_from_path> WHERE <criterion>;


Users can use the above SQL to select a pattern from another series and
modify it if they want, then pass the <pattern_path> to the first SQL
statement for similarity search.
If this cannot be implemented in the near future, we can just let users
store the pattern under a normal path and use the full series as the
pattern for the query.

The result of a similarity search would look like the following:

+--------------------------+
| Rank |   Time | Distance |
+--------------------------+
|    1 |    124 |      0.3 |
|    2 |    544 |      1.4 |
|    3 |    346 |      3.5 |
+--------------------------+


KV-match and other approaches may need to build an index on a time series
to speed up queries, so IoTDB should support an index-creation SQL statement:

CREATE INDEX
ON <query_path> USING <index_type> <parameter_1[,parameter_2,...]>



For KV-match, we use KV-index as the index structure, which has the
following parameters (please refer to the paper for their meanings):

window_lengths       : [int, int, ...]   (optional, default [25,50,100,200,400])
merge_enabled        : bool              (optional, default TRUE)
initial_row_range    : double            (optional, default 0.5)


Any suggestions on this? I would appreciate your ideas and help!

Thanks,
Jiaye Wu
