[ 
https://issues.apache.org/jira/browse/TAJO-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796836#comment-13796836
 ] 

Hyunsik Choi commented on TAJO-256:
-----------------------------------

+1
Sounds interesting. Sample-based query planning would be useful for many 
query types.

Sampling the input data should be executed as a distributed query, and it 
should work transparently to users. However, Tajo currently has no internal 
mechanism for submitting a distributed sampling query without exposing its 
execution to users. It would also be a nice idea to design and implement a 
sampling and analysis system that collects statistics on various aspects of 
the data. In addition, sampling results remain valid until the input table is 
changed, so it would be very useful to keep them in the catalog system. For 
this, we may need a more elaborate catalog system that maintains and keeps 
statistics information.
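
As a rough illustration of keeping sampling results in the catalog until the 
input table changes, here is a minimal sketch. All class and method names 
(SampleStatsCatalog, putStats, tableChanged, and so on) are hypothetical and 
are not existing Tajo APIs. The version counter is just one assumed way to 
detect that a table has changed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch only; none of these names exist in Tajo. */
public class SampleStatsCatalog {

    /** Statistics derived from one sampling run over a table. */
    public static final class SampleStats {
        public final long tableVersion;  // table version at sampling time
        public final double sampleRatio; // fraction of rows that were sampled
        public final long sampledRows;   // number of rows seen in the sample

        public SampleStats(long tableVersion, double sampleRatio, long sampledRows) {
            this.tableVersion = tableVersion;
            this.sampleRatio = sampleRatio;
            this.sampledRows = sampledRows;
        }

        /** Scales the sampled row count up to an estimate for the whole table. */
        public long estimatedRows() {
            return Math.round(sampledRows / sampleRatio);
        }
    }

    // Assumed change tracking: a counter bumped on every table modification.
    private final Map<String, Long> tableVersions = new HashMap<>();
    private final Map<String, SampleStats> stats = new HashMap<>();

    /** Records that a table was modified, which invalidates its stored sample. */
    public void tableChanged(String table) {
        tableVersions.merge(table, 1L, Long::sum);
    }

    private long currentVersion(String table) {
        return tableVersions.getOrDefault(table, 0L);
    }

    /** Stores the result of a sampling query against the current table version. */
    public void putStats(String table, double sampleRatio, long sampledRows) {
        stats.put(table, new SampleStats(currentVersion(table), sampleRatio, sampledRows));
    }

    /** Returns stored statistics only if the table is unchanged since sampling. */
    public Optional<SampleStats> getStats(String table) {
        SampleStats s = stats.get(table);
        if (s == null || s.tableVersion != currentVersion(table)) {
            return Optional.empty();
        }
        return Optional.of(s);
    }
}
```

A planner could call getStats() first and submit a new sampling query only 
when it returns empty, so that sampling stays invisible to the user.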

For these, a good starting point would be to look at how other database 
systems handle statistics collection.

> Support data cube (Umbrella)
> ----------------------------
>
>                 Key: TAJO-256
>                 URL: https://issues.apache.org/jira/browse/TAJO-256
>             Project: Tajo
>          Issue Type: New Feature
>          Components: catalog, distributed query plan, parser
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>             Fix For: 0.3-incubating
>
>
> This issue includes the following sub-issues:
> * SQL support of group by extensions (GROUPING SETS, CUBE, ROLLUP)
> * Query execution of group by extensions
> * GROUPING() function
> * Data cube materialization process
> * Cube schema maintenance
> * Sample-based cost estimation



--
This message was sent by Atlassian JIRA
(v6.1#6144)
