[
https://issues.apache.org/jira/browse/HIVE-23744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mustafa Iman reassigned HIVE-23744:
-----------------------------------
> Reduce query startup latency
> ----------------------------
>
> Key: HIVE-23744
> URL: https://issues.apache.org/jira/browse/HIVE-23744
> Project: Hive
> Issue Type: Task
> Components: llap
> Affects Versions: 4.0.0
> Reporter: Mustafa Iman
> Assignee: Mustafa Iman
> Priority: Major
> Attachments: am_schedule_and_transmit.png, task_start.png
>
>
> When I run queries with a large number of tasks for a single vertex, I see a
> significant delay before all tasks start executing in the llap daemons.
> Although the llap daemons have free capacity to run the tasks, it takes a
> significant amount of time to schedule all the tasks in the AM and actually
> transmit them to the executors.
> "am_schedule_and_transmit.png" shows the scheduling of tasks for TPC-DS query
> 55. It shows only the tasks scheduled for one of the 10 llap daemons. The
> scheduler works in a single thread, scheduling tasks one by one. A delay in
> scheduling one task delays all the tasks queued behind it.
> !am_schedule_and_transmit.png|width=831,height=573!
>
> Another issue is that it takes a long time to fill all the execution slots in
> the llap daemons even though they are all empty initially. This is caused by
> LlapTaskCommunicator using a fixed number of threads (10 by default) to send
> the tasks to the daemons. Also, this communication is synchronized, so these
> threads block on communication and sit idle. "task_start.png" shows running
> tasks on an llap daemon that has 12 execution slots. By the time the 12th task
> starts running, more than 100 ms has already passed. That slot stays idle all
> this time.
> !task_start.png|width=1166,height=635!
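
A rough back-of-the-envelope model of the second issue, not taken from the Hive code base. It assumes that "synchronized" means only one task submission can be in flight at a time, and that each submission costs a fixed 10 ms; both are illustrative assumptions. The class and method names (SlotFillSketch, lastStartMs) are hypothetical. Under those assumptions, filling 12 slots through 10 sender threads takes as long as it would with a single thread:

```java
import java.util.PriorityQueue;

public class SlotFillSketch {
    /**
     * Returns the time (ms) at which the last of `tasks` submissions completes,
     * given `senders` sender threads and a fixed `sendMs` cost per submission.
     * If `serialized` is true, only one submission can be in flight at a time
     * (an assumed model of a synchronized send path), so extra senders do not help.
     */
    public static long lastStartMs(int tasks, int senders, long sendMs, boolean serialized) {
        int effective = serialized ? 1 : senders;
        // Min-heap of the times at which each effective sender becomes free.
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int i = 0; i < effective; i++) {
            freeAt.add(0L);
        }
        long last = 0;
        for (int t = 0; t < tasks; t++) {
            long start = freeAt.poll();   // earliest free sender picks up the task
            long done = start + sendMs;   // task reaches the daemon at this time
            freeAt.add(done);
            last = Math.max(last, done);
        }
        return last;
    }

    public static void main(String[] args) {
        // 12 execution slots, 10 sender threads, assumed 10 ms per submission.
        System.out.println(lastStartMs(12, 10, 10, true));   // prints 120
        System.out.println(lastStartMs(12, 10, 10, false));  // prints 20
    }
}
```

With these assumed numbers, the serialized case leaves the 12th slot empty for 120 ms, which is in the same range as the delay visible in task_start.png, while truly parallel sends would fill all slots in 20 ms.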
--
This message was sent by Atlassian Jira
(v8.3.4#803005)