[ https://issues.apache.org/jira/browse/TAJO-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381752#comment-14381752 ]
Jihoon Son commented on TAJO-256:
---------------------------------
Currently, only the grammar part is implemented. You can see it in SQLParser.g4.
The remaining parts are logical planning, global planning, physical planning,
and query execution.
For the planning part, there is some code that I wrote a long time ago.
IMO, it would be better to start from scratch. Here are some reasons.
As you may know, the naive algorithm for the cube operation runs a separate
group-by for every combination of aggregation keys. Since this naive method
incurs huge overhead, we should find a better solution.
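To make the overhead concrete, here is a minimal sketch of the naive method in
plain Python. It is not Tajo code: the function name naive_cube, the row
layout, and the SUM aggregate are assumptions chosen for illustration. With n
aggregation keys it scans the base rows 2^n times, once per key subset.

```python
from collections import defaultdict
from itertools import combinations


def naive_cube(rows, keys, measure):
    """Run one independent SUM group-by over the base rows per subset of keys.

    Hypothetical helper for illustration only; not part of Tajo.
    """
    results = {}
    for r in range(len(keys), -1, -1):
        for subset in combinations(keys, r):
            groups = defaultdict(int)
            for row in rows:  # full scan of the input for every subset
                groups[tuple(row[k] for k in subset)] += row[measure]
            results[subset] = dict(groups)
    return results


# Made-up sample data.
rows = [
    {"region": "east", "product": "a", "sales": 10},
    {"region": "east", "product": "b", "sales": 5},
    {"region": "west", "product": "a", "sales": 7},
]
cube = naive_cube(rows, ("region", "product"), "sales")
```

Here cube maps each key subset, such as ("region",) or the empty tuple for the
grand total, to its aggregated groups.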
As commented above, I tried to resolve this problem by sharing common group-by
results. In addition, to represent sharing data between group-by plans, I tried
to extend Tajo's query plan from a Tree form to a DAG form (TAJO-266). This
work is contained in a separate branch, called DAG-execplan. However, that
branch has not been maintained for a long time. In addition, I'm not sure about
this approach anymore. I think that there will be a better and much easier
solution.
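For reference, the idea of sharing group-by results can be sketched as follows.
This is a simplified illustration in plain Python, not the code in the
DAG-execplan branch: the names group_by and shared_cube are hypothetical, and it
assumes a distributive aggregate such as SUM, where a coarser group-by can be
computed from a finer one. The base rows are scanned only once; every other
grouping is derived from an already computed grouping that has one more key.

```python
from collections import defaultdict
from itertools import combinations


def group_by(records, positions):
    """Sum (key_tuple, value) records, keeping only the given key positions."""
    groups = defaultdict(int)
    for key, value in records:
        groups[tuple(key[i] for i in positions)] += value
    return dict(groups)


def shared_cube(rows, keys, measure):
    """Compute all SUM group-bys, deriving coarser ones from finer results.

    Hypothetical helper for illustration only; valid for distributive
    aggregates (SUM, COUNT, MIN, MAX), not for e.g. a median.
    """
    keys = tuple(keys)
    # Single scan of the base rows, for the finest grouping.
    base = [(tuple(row[k] for k in keys), row[measure]) for row in rows]
    results = {keys: group_by(base, range(len(keys)))}
    for r in range(len(keys) - 1, -1, -1):
        for subset in combinations(keys, r):
            # Candidate parents: computed groupings with exactly one more key.
            parents = [p for p in results
                       if len(p) == r + 1 and set(subset) <= set(p)]
            # Pick the parent with the fewest groups to minimize the work.
            parent = min(parents, key=lambda p: len(results[p]))
            positions = [parent.index(k) for k in subset]
            results[subset] = group_by(results[parent].items(), positions)
    return results


# Made-up sample data.
sample_rows = [
    {"region": "east", "product": "a", "sales": 10},
    {"region": "east", "product": "b", "sales": 5},
    {"region": "west", "product": "a", "sales": 7},
]
shared = shared_cube(sample_rows, ("region", "product"), "sales")
```

Because several groupings read the same parent result, the dependencies between
group-by operators no longer form a tree, which is what motivated the DAG form
of the plan.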
Interestingly, some papers have recently been published on efficient execution
of the cube operation in distributed systems. I think we should survey those
materials.
> Support data cube (Umbrella)
> ----------------------------
>
> Key: TAJO-256
> URL: https://issues.apache.org/jira/browse/TAJO-256
> Project: Tajo
> Issue Type: New Feature
> Components: catalog, distributed query plan, parser
> Reporter: Jihoon Son
> Assignee: Jihoon Son
>
> This issue includes the following sub-issues:
> * SQL support of group by extensions (GROUPING SETS, CUBE, ROLLUP)
> * Query execution of group by extensions
> * GROUPING() function
> * Data cube materialization process
> * Cube schema maintenance
> * Sample-based cost estimation