[
https://issues.apache.org/jira/browse/TAJO-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902982#comment-13902982
]
Min Zhou edited comment on TAJO-603 at 2/17/14 5:58 AM:
--------------------------------------------------------
Hi Jihoon,
Thank you for giving me the background on this. I got your point. However, I
don't think it's necessary for a bundle of tasks to share one container. Here
are my reasons.
1. For standalone mode, each container is actually a thread. It's fast enough,
no need to share.
2. For yarn mode, I think we might support 2 types of yarn mode in the future.
One is what spark/impala does: yarn starts a tajo cluster, and resources are
allocated for tajo daemons like tajo master or worker. You can consider this
type of cluster a dedicated cluster, only used for low latency SQL queries.
The other type is like the current implementation, which shares resources with
other applications like storm, samza, spark or mapreduce jobs. You can consider
tajo queries in this type as a replacement for hive ETL jobs. They are usually
not very speed sensitive, and the data volume is expected to be very large, so
such a job usually takes minutes. I think for this kind of job, being as fast
as Tez is enough. The overhead of yarn scheduling is quite light.
Given the reasons above, why not make the code simpler than before?
What do you think, Jihoon?
Regards,
Min
> Move container allocation from SubQuery down to QueryUnitAttempt
> ----------------------------------------------------------------
>
> Key: TAJO-603
> URL: https://issues.apache.org/jira/browse/TAJO-603
> Project: Tajo
> Issue Type: Sub-task
> Reporter: Min Zhou
> Fix For: 1.0-incubating
>
> Attachments: schedule.png
>
>
> Tajo currently allocates all of the containers in SubQuery. That makes things
> complicated. Both SubQuery and DefaultTaskScheduler have to hold a copy of
> allocated containers and running tasks, and the event flow is difficult to
> understand.