[
https://issues.apache.org/jira/browse/IOTDB-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051779#comment-17051779
]
Zesong Sun commented on IOTDB-544:
----------------------------------
Hi,
I'm quite interested in the index and summary information optimization, and
I'd like to do my best to contribute to it.
Thanks,
Sun Zesong
> Apache IoTDB integration with more powerful aggregation index
> -------------------------------------------------------------
>
> Key: IOTDB-544
> URL: https://issues.apache.org/jira/browse/IOTDB-544
> Project: Apache IoTDB
> Issue Type: Wish
> Components: Core/Engine
> Reporter: Xiangdong Huang
> Priority: Major
> Labels: IoTDB, gsoc2020, mentor
>
> IoTDB is a highly efficient time series database that supports high-speed
> query processing, including aggregation queries.
> Currently, IoTDB pre-calculates aggregation info, also called summary info
> (sum, count, max_time, min_time, max_value, min_value), for each page and
> each chunk. This info helps with aggregation operations and some query
> filters. For example, if the query filter is value > 10 and the max value of
> a page is 9, we can skip the page. As another example, if the query is select
> max(value) and the max values of 3 chunks are 5, 10, and 20, then max(value)
> is 20.
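
The two examples above can be sketched in Java as follows. This is only an illustration under stated assumptions: `PageSummary` and its methods are hypothetical names, not IoTDB's actual statistics classes, and the time fields (max_time, min_time) are left out for brevity.

```java
import java.util.List;

// Hypothetical per-page (or per-chunk) summary; not IoTDB's real API.
final class PageSummary {
    final long count;
    final double sum;
    final double minValue;
    final double maxValue;

    PageSummary(long count, double sum, double minValue, double maxValue) {
        this.count = count;
        this.sum = sum;
        this.minValue = minValue;
        this.maxValue = maxValue;
    }

    // Builds the summary with a single pass over the page's values.
    static PageSummary of(double[] values) {
        double sum = 0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new PageSummary(values.length, sum, min, max);
    }

    // A filter "value > threshold" cannot match any point in this page
    // when the page's max value is not above the threshold.
    boolean canSkipForGreaterThan(double threshold) {
        return maxValue <= threshold;
    }

    // max(value) over several pages or chunks is the max of their maxima,
    // so no raw data point has to be read.
    static double maxOf(List<PageSummary> summaries) {
        double max = Double.NEGATIVE_INFINITY;
        for (PageSummary s : summaries) {
            max = Math.max(max, s.maxValue);
        }
        return max;
    }
}
```

With a page whose max value is 9, `canSkipForGreaterThan(10)` is true, so the page is never decoded. Note that `maxOf` still visits every summary, which is the O(N) cost discussed in the quoted description.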
> However, there are two drawbacks:
> 1. The summary info reduces the data that needs to be scanned to 1/k of the
> original (suppose each page has k data points). However, the time complexity
> is still O(N). If we store long historical data, e.g., 2 years of data
> sampled at 500 kHz, the aggregation operation may still be time-consuming.
> So, a tree-based index that reduces the time complexity from O(N) to O(log N)
> is a good choice. Some basic ideas have been published in [1], but that work
> can only handle data with a fixed frequency. So, improving it and
> implementing it in IoTDB is a good choice.
> 2. The summary info is of no help in evaluating a query such as where
> value > 8 when the max value is 10. If we enrich the summary info, e.g., by
> storing a data histogram, we can use the histogram to estimate how many
> points the query will return.
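
As a rough illustration of the histogram idea in drawback 2, the sketch below keeps an equi-width histogram per page and uses it to bound how many points can satisfy `value > threshold`. Everything here is an assumption made for the example: the class name `PageHistogram`, the equi-width bucket layout, and the premise that all values lie within the page's known `[min, max]` range. The issue itself does not prescribe a histogram design.

```java
// Hypothetical equi-width histogram stored next to a page's summary info.
final class PageHistogram {
    private final double min;
    private final double max;
    private final long[] counts;

    // Assumes every value lies in [min, max], e.g. the page's min_value/max_value.
    PageHistogram(double[] values, double min, double max, int buckets) {
        this.min = min;
        this.max = max;
        this.counts = new long[buckets];
        for (double v : values) {
            counts[bucketOf(v)]++;
        }
    }

    // Maps a value to its bucket index, clamped to the valid range.
    private int bucketOf(double v) {
        if (max == min) {
            return 0;
        }
        int b = (int) ((v - min) / (max - min) * counts.length);
        return Math.min(Math.max(b, 0), counts.length - 1);
    }

    // Upper bound on the number of points with value > threshold:
    // counts every bucket from the one containing the threshold upward.
    long upperBoundGreaterThan(double threshold) {
        if (threshold >= max) {
            return 0; // no point can exceed the page maximum
        }
        int first = threshold < min ? 0 : bucketOf(threshold);
        long total = 0;
        for (int b = first; b < counts.length; b++) {
            total += counts[b];
        }
        return total;
    }
}
```

The result is an upper bound rather than an exact count, because the bucket that contains the threshold may also hold points at or below it. More buckets tighten the bound at the cost of more storage per page and more work at insertion time.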
> This proposal is mainly about adding an index to speed up aggregation
> queries. In addition, making the summary info more useful would be even
> better.
> Note that the premise is that the insertion speed must not be slowed down
> too much!
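
To make the O(N) versus O(log N) difference concrete, here is a minimal sketch of one possible tree-based index: a plain segment tree built over per-chunk sums and maxima. It is a generic textbook structure, not the method of [1] and not an existing IoTDB component. The class name `ChunkSummaryTree` is hypothetical, the chunk count is fixed at construction, and query ranges are assumed to align with chunk boundaries (partially covered chunks at the edges would still need a page-level scan).

```java
// Hypothetical segment tree over per-chunk summaries (sum and max only).
final class ChunkSummaryTree {
    private final int n;        // number of chunks (leaves)
    private final double[] sum; // sum[i] aggregates the subtree rooted at node i
    private final double[] max; // max[i] aggregates the subtree rooted at node i

    ChunkSummaryTree(double[] chunkSums, double[] chunkMaxes) {
        n = chunkSums.length;
        sum = new double[2 * n];
        max = new double[2 * n];
        // Leaves live at positions n .. 2n-1.
        for (int i = 0; i < n; i++) {
            sum[n + i] = chunkSums[i];
            max[n + i] = chunkMaxes[i];
        }
        // Each internal node combines its two children.
        for (int i = n - 1; i >= 1; i--) {
            sum[i] = sum[2 * i] + sum[2 * i + 1];
            max[i] = Math.max(max[2 * i], max[2 * i + 1]);
        }
    }

    // Sum over chunks [from, to), touching O(log n) nodes.
    double rangeSum(int from, int to) {
        double s = 0;
        for (int l = from + n, r = to + n; l < r; l >>= 1, r >>= 1) {
            if ((l & 1) == 1) s += sum[l++];
            if ((r & 1) == 1) s += sum[--r];
        }
        return s;
    }

    // Max over chunks [from, to), touching O(log n) nodes.
    double rangeMax(int from, int to) {
        double m = Double.NEGATIVE_INFINITY;
        for (int l = from + n, r = to + n; l < r; l >>= 1, r >>= 1) {
            if ((l & 1) == 1) m = Math.max(m, max[l++]);
            if ((r & 1) == 1) m = Math.max(m, max[--r]);
        }
        return m;
    }
}
```

For chunk sums {1, 2, 3, 4, 5}, `rangeSum(1, 4)` combines a handful of tree nodes instead of visiting each chunk. This sketch does not support appending new chunks, so it says nothing about the insertion-speed premise; an index suited to IoTDB would need a cheap way to grow as data arrives.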
> You should know:
> • IoTDB query process
> • TsFile structure and organization
> • Basic index knowledge
> • Java
> difficulty: Major
> mentors:
> [email protected]
> Reference:
> [1] [https://www.sciencedirect.com/science/article/pii/S0306437918305489]
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)