[jira] [Comment Edited] (TAJO-603) Move container allocation from SubQuery down to QueryUnitAttempt

Min Zhou (JIRA) Sun, 16 Feb 2014 22:00:07 -0800

    [ 
https://issues.apache.org/jira/browse/TAJO-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902982#comment-13902982
 ]


Min Zhou edited comment on TAJO-603 at 2/17/14 5:58 AM:
--------------------------------------------------------

Hi Jihoon, 

Thank you for giving me the background on this.  I get your point. However, I 
don't think it's necessary for a bundle of tasks to share one container. Here 
are my reasons.

1. For standalone mode, each container is actually a thread. Starting one is 
fast enough, so there is no need to share. 
2. For YARN mode, I think we might support two types of deployment in the 
future. 
   One is what Spark and Impala do: YARN starts a Tajo cluster, and resources 
are allocated to Tajo daemons such as the Tajo master or workers.  You can 
consider this type of cluster a dedicated cluster, used only for low-latency 
SQL queries.
   The other type is like the current implementation, which shares resources 
with other applications such as Storm, Samza, Spark, or MapReduce jobs. You can 
consider Tajo queries of this type a replacement for Hive ETL jobs. They are 
usually not very speed sensitive, and the data volume tends to be very large, 
so a job usually takes minutes. I think for this kind of job, being as fast as 
Tez is enough.  The overhead of YARN scheduling is quite light. 

Given the reasons above, why not make the code simpler than before?

What do you think, Jihoon?

Regards,
Min


> Move container allocation from SubQuery down to QueryUnitAttempt
> ----------------------------------------------------------------
>
>                 Key: TAJO-603
>                 URL: https://issues.apache.org/jira/browse/TAJO-603
>             Project: Tajo
>          Issue Type: Sub-task
>            Reporter: Min Zhou
>             Fix For: 1.0-incubating
>
>         Attachments: schedule.png
>
>
> Tajo currently allocates all of the containers in SubQuery. That makes things 
> complicated.  Both SubQuery and DefaultTaskScheduler have to hold a copy of the 
> allocated containers and running tasks, and the event flow is difficult to 
> understand. 
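
To make the proposal in the issue title concrete, here is a minimal sketch of what per-attempt allocation could look like. It is an illustration only, not Tajo's actual code: `ContainerPool` and `TaskAttempt` are hypothetical names standing in for the resource allocator and for QueryUnitAttempt. The point is that each attempt asks for its own container and gives it back when done, so only one component tracks which containers are in use.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a resource allocator; not a Tajo or YARN class.
final class ContainerPool {
    private final Deque<Integer> free = new ArrayDeque<>();

    ContainerPool(int size) {
        for (int id = 0; id < size; id++) {
            free.addLast(id);
        }
    }

    // Returns a free container id, or -1 if none is available.
    int allocate() {
        return free.isEmpty() ? -1 : free.removeFirst();
    }

    void release(int containerId) {
        free.addLast(containerId);
    }

    int available() {
        return free.size();
    }
}

// Hypothetical stand-in for QueryUnitAttempt: the attempt itself owns its
// container, so no second component keeps a copy of the allocation state.
final class TaskAttempt {
    private final ContainerPool pool;
    private int containerId = -1;

    TaskAttempt(ContainerPool pool) {
        this.pool = pool;
    }

    // Tries to obtain a container; returns false if the pool is exhausted.
    boolean start() {
        containerId = pool.allocate();
        return containerId >= 0;
    }

    // Returns the container to the pool once the attempt is done.
    void finish() {
        if (containerId >= 0) {
            pool.release(containerId);
            containerId = -1;
        }
    }

    boolean isRunning() {
        return containerId >= 0;
    }
}
```

With a pool of one container, a second attempt cannot start until the first calls `finish()`. That is the whole event flow in this sketch. A real scheduler would queue the waiting attempt rather than fail it.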



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
