Xiangdong Huang created IOTDB-544:
-------------------------------------
Summary: Apache IoTDB integration with more powerful aggregation index
Key: IOTDB-544
URL: https://issues.apache.org/jira/browse/IOTDB-544
Project: Apache IoTDB
Issue Type: Wish
Components: Core/Engine
Reporter: Xiangdong Huang
IoTDB is a highly efficient time series database that supports high-speed
query processing, including aggregation queries.
Currently, IoTDB pre-calculates aggregation info, also called summary info
(sum, count, max_time, min_time, max_value, min_value), for each page and
each Chunk. This info is helpful for aggregation operations and for some query
filters. For example, if the query filter is value > 10 and the max value of a
page is 9, we can skip the page. As another example, if the query is select
max(value) and the max values of 3 chunks are 5, 10 and 20, then max(value) is
20.
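The two examples above can be sketched in Java as follows. This is only a minimal illustration: the class and method names (SummaryPruning, PageSummary, canSkipForGreaterThan, maxFromSummaries) are hypothetical and are not IoTDB's actual classes, and only some of the summary fields are modelled.

```java
// Sketch only: SummaryPruning and PageSummary are hypothetical names, not IoTDB classes.
final class SummaryPruning {

    /** Pre-calculated summary of one page or chunk (a subset of the fields listed above). */
    static final class PageSummary {
        final long count;
        final double sum;
        final double minValue;
        final double maxValue;

        PageSummary(long count, double sum, double minValue, double maxValue) {
            this.count = count;
            this.sum = sum;
            this.minValue = minValue;
            this.maxValue = maxValue;
        }
    }

    /** Returns true when no point in the page can satisfy "value > threshold". */
    static boolean canSkipForGreaterThan(PageSummary page, double threshold) {
        return page.maxValue <= threshold;
    }

    /** Answers max(value) from the summaries alone, without reading any data point. */
    static double maxFromSummaries(PageSummary... summaries) {
        double max = Double.NEGATIVE_INFINITY;
        for (PageSummary s : summaries) {
            max = Math.max(max, s.maxValue);
        }
        return max;
    }

    public static void main(String[] args) {
        // Filter "value > 10" against a page whose max value is 9.
        PageSummary page = new PageSummary(4, 30.0, 6.0, 9.0);
        System.out.println(canSkipForGreaterThan(page, 10.0));

        // Three chunks whose max values are 5, 10 and 20.
        System.out.println(maxFromSummaries(
                new PageSummary(2, 8.0, 3.0, 5.0),
                new PageSummary(2, 17.0, 7.0, 10.0),
                new PageSummary(2, 35.0, 15.0, 20.0)));
    }
}
```

Note that maxFromSummaries still visits every summary once, which is the linear cost discussed in the drawbacks.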
However, there are two drawbacks:
1. The summary info only reduces the data that needs to be scanned to 1/k of
the original (suppose each page has k data points), so the time complexity is
still O(N). If we store long historical data, e.g., 2 years of data sampled at
500 kHz, an aggregation operation may still be time-consuming. So, a
tree-based index that reduces the time complexity from O(N) to O(log N) is a
good choice. Some basic ideas have been published in [1], but that work can
only handle data with a fixed frequency. So, improving it and implementing it
in IoTDB is worthwhile.
2. The summary info is of no help in evaluating a query like where value > 8 if
the max value = 10: the page cannot be skipped, and the summary does not say
how many points match. If we enrich the summary info, e.g., by storing a data
histogram, we can use the histogram to estimate how many points will be
returned.
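As a rough illustration of the histogram idea in drawback 2, the sketch below keeps an equi-width histogram per page and derives lower and upper bounds on the number of points matching value > threshold. Everything here is an assumption made for illustration: the class name HistogramSummary, the equi-width bucketing and the bucket count are not part of IoTDB or of this proposal.

```java
// Sketch only: HistogramSummary is a hypothetical enriched page summary.
final class HistogramSummary {
    private final double min;
    private final double max;
    private final long total;
    private final long[] counts; // counts[b] = number of points in bucket b

    /** Builds an equi-width histogram; assumes values is non-empty and buckets >= 1. */
    HistogramSummary(double[] values, int buckets) {
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        this.min = lo;
        this.max = hi;
        this.total = values.length;
        this.counts = new long[buckets];
        for (double v : values) {
            counts[bucketOf(v)]++;
        }
    }

    /** Bucket index for v in [min, max]; the max value is clamped into the last bucket. */
    private int bucketOf(double v) {
        if (max == min) {
            return 0;
        }
        int b = (int) ((v - min) / (max - min) * counts.length);
        return Math.min(b, counts.length - 1);
    }

    /** Upper bound on the number of points with value > threshold. */
    long upperBoundGreaterThan(double threshold) {
        return countFrom(threshold, 0);
    }

    /** Lower bound: counts only buckets that lie entirely above the threshold's bucket. */
    long lowerBoundGreaterThan(double threshold) {
        return countFrom(threshold, 1);
    }

    private long countFrom(double threshold, int offset) {
        if (threshold >= max) {
            return 0; // same decision the plain max_value summary already allows
        }
        if (threshold < min) {
            return total; // every point matches
        }
        long n = 0;
        for (int b = bucketOf(threshold) + offset; b < counts.length; b++) {
            n += counts[b];
        }
        return n;
    }

    public static void main(String[] args) {
        // A page whose max value is 10, queried with "value > 8".
        double[] page = {2, 3, 5, 6, 7, 9, 9.5, 10};
        HistogramSummary h = new HistogramSummary(page, 4);
        System.out.println(h.lowerBoundGreaterThan(8) + " .. " + h.upperBoundGreaterThan(8));
    }
}
```

More buckets tighten the bounds but make each summary larger and slower to build, which matters for insertion speed.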
This proposal is mainly about adding an index to speed up aggregation
queries. In addition, making the summary info more useful would be a further
improvement.
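To make the O(N) versus O(log N) point from drawback 1 concrete, here is a minimal sketch of a tree over per-chunk summaries. It is a plain array-based segment tree, not the structure described in [1], and it is not an IoTDB API: the class name AggregationTree and its methods are hypothetical. It also assumes that the chunk summaries are already known when the tree is built and that a time range has been mapped to a range of chunk positions elsewhere (for example by a binary search on chunk start times).

```java
// Sketch only: AggregationTree is a hypothetical index over per-chunk summaries.
final class AggregationTree {
    private final int n;            // number of chunks (leaves)
    private final double[] sumTree; // node i covers its children 2i and 2i+1; leaves at [n, 2n)
    private final double[] maxTree;

    /** Builds the tree in O(n) from per-chunk sums and max values (same length). */
    AggregationTree(double[] chunkSums, double[] chunkMaxes) {
        n = chunkSums.length;
        sumTree = new double[2 * n];
        maxTree = new double[2 * n];
        for (int i = 0; i < n; i++) {
            sumTree[n + i] = chunkSums[i];
            maxTree[n + i] = chunkMaxes[i];
        }
        for (int i = n - 1; i >= 1; i--) {
            sumTree[i] = sumTree[2 * i] + sumTree[2 * i + 1];
            maxTree[i] = Math.max(maxTree[2 * i], maxTree[2 * i + 1]);
        }
    }

    /** Sum over chunks in [from, to), visiting O(log n) tree nodes. */
    double rangeSum(int from, int to) {
        double s = 0;
        for (int l = from + n, r = to + n; l < r; l >>= 1, r >>= 1) {
            if ((l & 1) == 1) s += sumTree[l++];
            if ((r & 1) == 1) s += sumTree[--r];
        }
        return s;
    }

    /** Max over chunks in [from, to), visiting O(log n) tree nodes. */
    double rangeMax(int from, int to) {
        double m = Double.NEGATIVE_INFINITY;
        for (int l = from + n, r = to + n; l < r; l >>= 1, r >>= 1) {
            if ((l & 1) == 1) m = Math.max(m, maxTree[l++]);
            if ((r & 1) == 1) m = Math.max(m, maxTree[--r]);
        }
        return m;
    }

    public static void main(String[] args) {
        // Three chunks with max values 5, 10 and 20; the sums are made-up sample values.
        AggregationTree tree = new AggregationTree(
                new double[] {12, 30, 45}, new double[] {5, 10, 20});
        System.out.println(tree.rangeMax(0, 3));
        System.out.println(tree.rangeSum(1, 3));
    }
}
```

This sketch is built once and never updated. A real index would also have to absorb newly written chunks cheaply so that insertion speed does not suffer, and it would have to cope with data that does not arrive at a fixed frequency.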
Note that the premise is that insertion speed must not be slowed down too
much!
You should know:
• IoTDB query process
• TsFile structure and organization
• Basic index knowledge
• Java
difficulty: Major
mentors:
[email protected]
Reference:
[1] https://www.sciencedirect.com/science/article/pii/S0306437918305489
--
This message was sent by Atlassian Jira
(v8.3.4#803005)