Hi all,

Let's start a thread to discuss the integration. As the first step, we need to design the SQL interface.

Time series similarity search is a fundamental problem in data mining, and it is quite useful for analyzing sensor data stored in IoTDB. KV-match is one approach to this problem. It supports both raw subsequence matching (RSM) and constrained normalized subsequence matching (cNSM) under either Euclidean Distance (ED) or Dynamic Time Warping (DTW).

As KV-match is just one implementation, in the SQL layer we should add a new query type as follows:

    *SELECT similar(<pattern_path>, <parameter_1[,parameter_2,...]>) FROM <query_path> WHERE <single_range_time_criterion>;*

IoTDB should be configurable to use KV-match or other algorithms, or even provide a naive brute-force option for users to choose. We can add a configuration item, e.g.:

    similarity-search-implementation = KV-match | KV-match_simple | Naive | ...

The parameter list depends heavily on the implementation. For KV-match, we have the following parameters (please refer to the paper for their meanings):

    query_type           : enum [RSM|cNSM] (required)
    distance_measure     : enum [ED|DTW] (required)
    distance_threshold   : double (required)
    amplitude_scaling    : double (optional, for cNSM only, default 1.1)
    offset_shifting      : double (optional, for cNSM only, default 5.0)
    warping_path         : double (optional, for DTW only, default 5.0)
    max_answers          : int (optional, default 100)
    remove_overlapped    : bool (optional, default TRUE)
    overlapped_threshold : double (optional, default 0.5)

It would be ideal if IoTDB could support temporary in-memory tables (time series) that are tied to the session life cycle, as MySQL does:

    *CREATE TEMPORARY TIMESERIES <pattern_path> SELECT * FROM <pattern_original_from_path> WHERE <criterion>;*

Users can use the above SQL to select a pattern from another series and modify it if they want, then pass the <pattern_path> to the first SQL statement for similarity search.

If this cannot be implemented in the near future, we can just let users store the pattern in a normal path and use the full series as the pattern for the query.

The result of a similarity search will look like this:

    +------+------+----------+
    | Rank | Time | Distance |
    +------+------+----------+
    | 1    | 124  | 0.3      |
    | 2    | 544  | 1.4      |
    | 3    | 346  | 3.5      |
    +------+------+----------+

KV-match and other approaches may need to build an index on a time series to facilitate queries, so IoTDB should support an index creation SQL statement:

    *CREATE INDEX ON <query_path> USING <index_type> <parameter_1[,parameter_2,...]>*

For KV-match, we use KV-index as the index structure, which has the following parameters (please refer to the paper for their meanings):

    window_lengths    : [int, int, ...] (optional, default [25,50,100,200,400])
    merge_enabled     : bool (optional, default TRUE)
    initial_row_range : double (optional, default 0.5)

Any suggestions on this? I appreciate your ideas and help!

Thanks,
Jiaye Wu
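
P.S. To make the "Naive" option more concrete, here is a rough sketch in Python of what a brute-force RSM query under ED could compute. It is only an illustration under my own assumptions: the function name naive_rsm_ed and its arguments are hypothetical and not part of IoTDB or KV-match, the series is assumed to fit in memory as plain lists, and cNSM, DTW and overlap removal are left out. The output rows follow the (Rank, Time, Distance) layout shown above.

```python
import math
from typing import List, Sequence, Tuple


def naive_rsm_ed(
    times: Sequence[int],
    values: Sequence[float],
    pattern: Sequence[float],
    distance_threshold: float,
    max_answers: int = 100,
) -> List[Tuple[int, int, float]]:
    """Hypothetical brute-force RSM under Euclidean distance.

    Slides the pattern over every offset of the series and keeps the
    windows whose distance is within distance_threshold.
    Returns (rank, start_time, distance) rows, closest match first.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")

    hits = []
    for start in range(len(values) - m + 1):
        # Euclidean distance between the pattern and the current window.
        dist = math.sqrt(
            sum((values[start + i] - pattern[i]) ** 2 for i in range(m))
        )
        if dist <= distance_threshold:
            hits.append((dist, times[start]))

    hits.sort()  # smallest distance first, ties broken by start time
    return [
        (rank, t, d)
        for rank, (d, t) in enumerate(hits[:max_answers], start=1)
    ]
```

This costs O(n * m) per query for a series of length n and a pattern of length m, which is why an index-based implementation is still worth having. The naive version could mainly serve as a fallback and as a correctness baseline for the other implementations.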
