[ 
https://issues.apache.org/jira/browse/IOTDB-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173878#comment-17173878
 ] 

Rong Kang commented on IOTDB-804:
---------------------------------

One of the targets of IoTDB Index Framework is to provide a list of common and 
classical operators (or called "tools") in the time series domain. These 
operators can be roughly put into three categories. Some of them has been 
supported, and others will be added in the future:

- Time-Series Representations, like PAA(Piecewise Aggregate Approximation, 
supported), DFT(Discrete Fourier Transform, not supported yet), etc..

- Measurements, like ED(Euclidean Distance, supported), DTW(Dynamic Time 
Warping, supported).

- Index Techniques, like RTree(supported), NoIndex(sequential-scan-speedup), 
iSAX, and Hash lookup Table.

We will give a brief introduction to iSAX and Hash lookup Table, and try to 
implement them in the next version:

- ISAXIndex [1] first splits the input data into segments and extracts their 
discrete SAX features, and then inserts the data features into a root leaf 
node.  When the number of series of a leaf node reaches the given threshold, 
ISAXIndex will upscale the resolution of SAX features, thus dividing them into 
two leaf nodes. The SAX lower bound constraint is used to guarantee 
no-false-dismissals pruning.
In some indexing techniques, ISAXIndex [2] is composed of a list of subtrees. 
Suppose the input series is divided into b segments, all data will be first 
divided into 2^b sub-index tree according to the first-bits of SAX features.

- The Hash lookup table is a data structure a structure that can map keys to 
values. A hash table uses a hash function to compute an index, also called hash 
code, into an array of buckets, from which the desired value can be found. 
Generally, the cost of retrieving a bucket can be regarded as O(1) Hamming 
space retrieval [3] returns data points within a Hamming ball of radius H for 
each query. Therefore, it enables the constant-time Approximate Nearest 
Neighbor (ANN) search through hash lookups. Based on the mature hash lookup 
algorithm and the Hamming space retrieval, a rich line of hash methods focuses 
on better feature representations for preserving the similarity relationship in 
the Hamming space.

[1] Shieh, Jin, and Eamonn Keogh, "ISAX: Disk-Aware Mining and Indexing of 
Massive Time Series Datasets". Data Mining and Knowledge Discovery 19(1): 
24–57. 2009
[2] Camerra, Alessandro, Themis Palpanas, Jin Shieh, and Eamonn Keogh. "ISAX 
2.0: Indexing and Mining One Billion Time Series." In 2010 IEEE International 
Conference on Data Mining Pp. 58–67. 2010
[3] Cao, Yue, Mingsheng Long, Bin Liu, and Jianmin Wang. "Deep Cauchy Hashing 
for Hamming Space Retrieval." In Pp. 1229–1237.  2018.



> Index Framework
> ---------------
>
>                 Key: IOTDB-804
>                 URL: https://issues.apache.org/jira/browse/IOTDB-804
>             Project: Apache IoTDB
>          Issue Type: New Feature
>          Components: Core/Engine
>            Reporter: Rong Kang
>            Priority: Major
>
> Implement the index framework in IoTDB, supporting the time-series index: 
> creation, building and query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to