[ 
https://issues.apache.org/jira/browse/TAJO-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806313#comment-13806313
 ] 

Jihoon Son commented on TAJO-266:
---------------------------------

For this issue, I designed a new class called ExecutionPlan.
An ExecutionPlan is a DAG which consists of LogicalNodes and their connections. 
Each connection represents a data flow between LogicalNodes.
Each ExecutionBlock contains an ExecutionPlan instead of a LogicalPlan.
When the master executes an ExecutionBlock, it sends that ExecutionBlock's 
ExecutionPlan to the tasks.
After that, each task generates a PhysicalPlan from the given ExecutionPlan.
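
As a rough illustration only (this is not the attached design or Tajo's actual 
API), a DAG of nodes connected by data-flow edges could be sketched as below. 
The class name ExecutionPlanSketch, its methods, and the use of plain strings 
in place of LogicalNodes are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not Tajo's real classes.
// Nodes are plain strings here; a real plan would hold LogicalNodes.
class ExecutionPlanSketch {
    // For each node, the nodes that consume its output (outgoing data-flow edges).
    private final Map<String, List<String>> consumers = new LinkedHashMap<>();

    void addNode(String node) {
        consumers.putIfAbsent(node, new ArrayList<>());
    }

    // Adds a data-flow edge: tuples produced by `from` are consumed by `to`.
    void connect(String from, String to) {
        addNode(from);
        addNode(to);
        consumers.get(from).add(to);
    }

    List<String> consumersOf(String node) {
        return consumers.getOrDefault(node, new ArrayList<>());
    }

    // Nodes with no consumers are the outputs of the plan.
    // A DAG, unlike a tree, may have more than one of them.
    List<String> outputs() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : consumers.entrySet()) {
            if (e.getValue().isEmpty()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ExecutionPlanSketch plan = new ExecutionPlanSketch();
        // One scan feeding two different aggregations: two outputs from one block.
        plan.connect("scan", "groupby_a");
        plan.connect("scan", "groupby_b");
        System.out.println("consumers of scan: " + plan.consumersOf("scan"));
        System.out.println("plan outputs: " + plan.outputs());
    }
}
```

In this sketch a node with several consumers is what a tree-shaped plan cannot 
express, and it is the case that calls for multiple outputs per task.
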
Here, I added two PhysicalNodes, called PhysicalRootExec and MultiOutExec, to 
support multiple outputs while preserving the pipelined query execution 
structure.
PhysicalRootExec is just used to represent the root of the physical plan. 
MultiOutExec receives an integer n as a constructor argument.
When next() is called, MultiOutExec returns the same tuple n times.
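
The MultiOutExec behavior can be sketched roughly as follows. This is a 
minimal sketch under one reading of the description: each tuple pulled from 
the child is handed out by n consecutive next() calls before the child is 
advanced again. The TupleSource interface, the ListSource helper, the 
MultiOutSketch class, and the use of strings as tuples are hypothetical and 
are not taken from the Tajo code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical pull-based operator interface; next() returns null when exhausted.
interface TupleSource {
    String next();
}

// Hypothetical child operator backed by an in-memory list.
class ListSource implements TupleSource {
    private final Iterator<String> it;

    ListSource(List<String> tuples) {
        this.it = tuples.iterator();
    }

    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}

// Sketch of the described behavior: every child tuple is returned n times.
class MultiOutSketch implements TupleSource {
    private final TupleSource child;
    private final int n;
    private String current;    // tuple currently being repeated
    private int remaining = 0; // how many more times `current` must be returned

    MultiOutSketch(TupleSource child, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        this.child = child;
        this.n = n;
    }

    public String next() {
        if (remaining == 0) {
            // All copies of the previous tuple were handed out; pull a new one.
            current = child.next();
            if (current == null) {
                return null; // child is exhausted
            }
            remaining = n;
        }
        remaining--;
        return current;
    }

    public static void main(String[] args) {
        TupleSource exec = new MultiOutSketch(new ListSource(Arrays.asList("a", "b")), 2);
        List<String> out = new ArrayList<>();
        for (String t = exec.next(); t != null; t = exec.next()) {
            out.add(t);
        }
        System.out.println(out); // prints [a, a, b, b]
    }
}
```

Because the operator only buffers the current tuple, the pull-based pipeline 
is kept: nothing is materialized to serve the n consumers.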

I attached figures to help you better understand. These figures show a 
comparison between the current master plan and a master plan optimized by the 
YSmart algorithm (see TAJO-161).

While this structure looks a little complicated, it can support various master 
plan optimizations such as the one proposed in TAJO-161.
So, based on this structure, I think that we can develop a new master plan 
optimizer and optimization rules which can significantly improve the query 
processing performance.

Please give any advice.
Thanks.

> Extend ExecutionBlock and Task to support multiple outputs
> ----------------------------------------------------------
>
>                 Key: TAJO-266
>                 URL: https://issues.apache.org/jira/browse/TAJO-266
>             Project: Tajo
>          Issue Type: Task
>          Components: distributed query plan, worker
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>
> In the current Tajo, every task has only one output.
> However, supporting multiple outputs per task is very useful for distributed 
> plan optimization.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
