[
https://issues.apache.org/jira/browse/IOTDB-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173878#comment-17173878
]
Rong Kang commented on IOTDB-804:
---------------------------------
One of the targets of IoTDB Index Framework is to provide a list of common and
classical operators (or called "tools") in the time series domain. These
operators can be roughly put into three categories. Some of them has been
supported, and others will be added in the future:
- Time-Series Representations, like PAA(Piecewise Aggregate Approximation,
supported), DFT(Discrete Fourier Transform, not supported yet), etc..
- Measurements, like ED(Euclidean Distance, supported), DTW(Dynamic Time
Warping, supported).
- Index Techniques, like RTree(supported), NoIndex(sequential-scan-speedup),
iSAX, and Hash lookup Table.
We will give a brief introduction to iSAX and Hash lookup Table, and try to
implement them in the next version:
- ISAXIndex [1] first splits the input data into segments and extracts their
discrete SAX features, and then inserts the data features into a root leaf
node. When the number of series of a leaf node reaches the given threshold,
ISAXIndex will upscale the resolution of SAX features, thus dividing them into
two leaf nodes. The SAX lower bound constraint is used to guarantee
no-false-dismissals pruning.
In some indexing techniques, ISAXIndex [2] is composed of a list of subtrees.
Suppose the input series is divided into b segments, all data will be first
divided into 2^b sub-index tree according to the first-bits of SAX features.
- The Hash lookup table is a data structure a structure that can map keys to
values. A hash table uses a hash function to compute an index, also called hash
code, into an array of buckets, from which the desired value can be found.
Generally, the cost of retrieving a bucket can be regarded as O(1) Hamming
space retrieval [3] returns data points within a Hamming ball of radius H for
each query. Therefore, it enables the constant-time Approximate Nearest
Neighbor (ANN) search through hash lookups. Based on the mature hash lookup
algorithm and the Hamming space retrieval, a rich line of hash methods focuses
on better feature representations for preserving the similarity relationship in
the Hamming space.
[1] Shieh, Jin, and Eamonn Keogh, "ISAX: Disk-Aware Mining and Indexing of
Massive Time Series Datasets". Data Mining and Knowledge Discovery 19(1):
24–57. 2009
[2] Camerra, Alessandro, Themis Palpanas, Jin Shieh, and Eamonn Keogh. "ISAX
2.0: Indexing and Mining One Billion Time Series." In 2010 IEEE International
Conference on Data Mining Pp. 58–67. 2010
[3] Cao, Yue, Mingsheng Long, Bin Liu, and Jianmin Wang. "Deep Cauchy Hashing
for Hamming Space Retrieval." In Pp. 1229–1237. 2018.
> Index Framework
> ---------------
>
> Key: IOTDB-804
> URL: https://issues.apache.org/jira/browse/IOTDB-804
> Project: Apache IoTDB
> Issue Type: New Feature
> Components: Core/Engine
> Reporter: Rong Kang
> Priority: Major
>
> Implement the index framework in IoTDB, supporting the time-series index:
> creation, building and query.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)