[ https://issues.apache.org/jira/browse/TAJO-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796731#comment-13796731 ]
Jihoon Son commented on TAJO-256:
---------------------------------
Group-by extension queries incur significant overhead.
Thus, query optimization, especially of the distributed plan, is very important.
Statistics such as histograms are very useful for query optimization.
Unfortunately, Tajo currently doesn't store any statistics for raw tables.
In this case, sample-based cost estimation is a good solution.
In sample-based cost estimation, the aggregation query is executed on a
sampled table before it is executed on the original table.
Statistics of the sampled data are collected during that execution.
After that, a better-optimized query plan for the original table can be built
using the collected statistics.
So, I added sample-based cost estimation to this issue.
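
As a rough illustration of the idea (not Tajo code), the sketch below samples
a table, runs the grouping step on the sample, and scales the observed group
sizes up to the full table. All function names, the Bernoulli sampling, and
the simple 1/fraction scaling are assumptions made for this example only.

```python
import random
from collections import Counter


def sample_rows(rows, fraction, seed=0):
    """Bernoulli sample: keep each row independently with probability `fraction`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


def collect_group_stats(sample, key_columns):
    """Run the grouping step on the sample and record a histogram of group keys."""
    histogram = Counter(tuple(row[c] for c in key_columns) for row in sample)
    return {
        "sample_rows": len(sample),
        "observed_groups": len(histogram),
        "histogram": histogram,
    }


def estimate_rows_per_group(stats, fraction):
    """Scale sampled group sizes to the full table (assumed 1/fraction scaling)."""
    return {key: count / fraction for key, count in stats["histogram"].items()}


if __name__ == "__main__":
    # Hypothetical table: one row in four belongs to region "b".
    table = [{"region": "b" if i % 4 == 0 else "a", "value": i} for i in range(1000)]
    fraction = 0.1
    stats = collect_group_stats(sample_rows(table, fraction), ["region"])
    print(stats["observed_groups"], estimate_rows_per_group(stats, fraction))
```

A planner could use such estimates, for example, to decide how many
partitions to assign to a skewed grouping key before running the query on the
original table. Note that the number of groups seen in a sample is only a
lower bound on the number of groups in the full table.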
> Support data cube (Umbrella)
> ----------------------------
>
> Key: TAJO-256
> URL: https://issues.apache.org/jira/browse/TAJO-256
> Project: Tajo
> Issue Type: New Feature
> Components: catalog, distributed query plan, parser
> Reporter: Jihoon Son
> Assignee: Jihoon Son
> Fix For: 0.3-incubating
>
>
> This issue includes the following sub-issues:
> * SQL support of group by extensions (GROUPING SETS, CUBE, ROLLUP)
> * Query execution of group by extensions
> * GROUPING() function
> * Data cube materialization process
> * Cube schema maintenance
> * Sample-based cost estimation
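
For reference, under standard SQL semantics ROLLUP and CUBE are shorthand for
lists of grouping sets. The sketch below expands them in plain Python; the
function names are hypothetical and this is not Tajo's parser or planner. It
also shows where the overhead comes from: CUBE over n columns produces 2^n
grouping sets, ROLLUP produces n + 1.

```python
from itertools import combinations


def rollup(columns):
    """ROLLUP(a, b, c) -> (a, b, c), (a, b), (a), (): every prefix of the list."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube(columns):
    """CUBE(a, b) -> (a, b), (a), (b), (): every subset of the columns."""
    return [
        subset
        for n in range(len(columns), -1, -1)
        for subset in combinations(columns, n)
    ]


if __name__ == "__main__":
    print(rollup(["a", "b", "c"]))
    print(cube(["a", "b"]))
    print(len(cube(["a", "b", "c", "d"])))  # 2**4 grouping sets
```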
--
This message was sent by Atlassian JIRA
(v6.1#6144)