[
https://issues.apache.org/jira/browse/HIVE-17481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446632#comment-16446632
]
Thai Bui commented on HIVE-17481:
---------------------------------
[~prasanth_j] Queries are now moved from one queue to another based on the
predefined triggers. I do see guaranteed tasks in the log as well. The only
thing I am not seeing is tasks (fragments) getting preempted in the Tez AM
debug log. I suspect something is going on there and I will have to dig a bit
deeper into it.
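For context, the move triggers involved look roughly like this (pool names and
the threshold below are illustrative placeholders, not our actual values):
{noformat}
-- move queries that run longer than the threshold from the "bi" pool to the "etl" pool
CREATE TRIGGER llap_plan.slow_query WHEN ELAPSED_TIME > 60000 DO MOVE TO etl;
ALTER TRIGGER llap_plan.slow_query ADD TO POOL bi;
{noformat}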
Another strange behavior is that the number of running tasks (as reported by
HiveServer2) increases as more queries run concurrently. For example, let's say
that big query 1 has 48 running tasks out of 1000. The HS2 log will look like
this:
{noformat}
2018-04-21T04:40:14,671 INFO [Thread-303] monitoring.RenderStrategy$LogToFileFunction: Map 1: 5(+48)/1000 Reducer 2: 0/2017 Reducer 3: 0/1
...
2018-04-21T04:40:14,671 INFO [Thread-303] monitoring.RenderStrategy$LogToFileFunction: Map 1: 100(+48)/1000 Reducer 2: 0/2017 Reducer 3: 0/1
{noformat}
Then, when another query is submitted, the number of running tasks (+48) slowly
creeps up:
{noformat}
2018-04-21T04:40:14,671 INFO [Thread-303] monitoring.RenderStrategy$LogToFileFunction: Map 1: 10(+16)/100 Reducer 2: 0/200 Reducer 3: 0/1 <~~~~~~~ a parallel query submitted
2018-04-21T04:40:14,671 INFO [Thread-303] monitoring.RenderStrategy$LogToFileFunction: Map 1: 200(+52)/1000 Reducer 2: 0/2017 Reducer 3: 0/1 <~~~~~~~ becomes +52
..
2018-04-21T04:40:14,671 INFO [Thread-303] monitoring.RenderStrategy$LogToFileFunction: Map 1: 200(+62)/1000 Reducer 2: 0/2017 Reducer 3: 0/1 <~~~~~~~ becomes +62 then goes up higher and higher
{noformat}
When this happens, the number of running tasks gets very high very fast, which
makes smaller queries very slow even though we have WM configured. Since I have
`hive.llap.task.scheduler.num.schedulable.tasks.per.node` fixed to a
preconfigured value, I would expect the number of running tasks not to get that
high. Do you know how those running tasks get assigned and why the count grows
so high?
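Just to be explicit about that setting, it is pinned like this (the value here
is only a placeholder for our preconfigured number):
{noformat}
-- placeholder value; in our setup this is fixed cluster-wide rather than per session
SET hive.llap.task.scheduler.num.schedulable.tasks.per.node=8;
{noformat}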
> LLAP workload management
> ------------------------
>
> Key: HIVE-17481
> URL: https://issues.apache.org/jira/browse/HIVE-17481
> Project: Hive
> Issue Type: New Feature
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Fix For: 3.0.0
>
> Attachments: Workload management design doc.pdf
>
>
> This effort is intended to improve various aspects of cluster sharing for
> LLAP. Some of these are applicable to non-LLAP queries and may later be
> extended to all queries. Administrators will be able to specify and apply
> policies for workload management ("resource plans") that apply to the entire
> cluster, with only one resource plan being active at a time. The policies
> will be created and modified using new Hive DDL statements.
> The policies will cover (see the DDL sketch after this list):
> * Dividing the cluster into a set of (optionally nested) query pools that
> are each allocated a fraction of the cluster, a fixed query parallelism, a
> resource-sharing policy between queries, and potentially other attributes
> such as priority.
> * Mapping the incoming queries into pools based on the query user, groups,
> explicit configuration, etc.
> * Specifying rules that perform actions on queries based on counter values
> (e.g. killing or moving queries).
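> A minimal sketch of what such DDL could look like (plan, pool, and mapping
> names, fractions, and thresholds below are purely illustrative):
> {noformat}
> -- pools with cluster fractions and query parallelism
> CREATE RESOURCE PLAN daily_plan;
> CREATE POOL daily_plan.bi WITH ALLOC_FRACTION=0.75, QUERY_PARALLELISM=4;
> CREATE POOL daily_plan.etl WITH ALLOC_FRACTION=0.25, QUERY_PARALLELISM=10;
> -- map incoming queries into pools by user or group
> CREATE USER MAPPING 'etl_user' IN daily_plan TO etl;
> CREATE GROUP MAPPING 'analysts' IN daily_plan TO bi WITH ORDER 1;
> -- act on queries based on counter values
> CREATE TRIGGER daily_plan.slow_query WHEN ELAPSED_TIME > 300000 DO KILL;
> ALTER TRIGGER daily_plan.slow_query ADD TO POOL bi;
> {noformat}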
> One would also be able to switch policies on a live cluster without (usually)
> affecting running queries, e.g. to change policies between daytime and
> nighttime usage patterns, and in other similar scenarios. The switches would
> be safe and atomic; versioning may eventually be supported.
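> For example, switching the active plan for nighttime usage would be a single
> statement (plan name illustrative):
> {noformat}
> ALTER RESOURCE PLAN nightly_plan ACTIVATE;
> {noformat}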
> Some implementation details:
> * WM will only be supported in HS2 (for obvious reasons).
> * All LLAP query AMs will run in the "interactive" YARN queue and will be
> fungible between Hive pools.
> * We will use the concept of "guaranteed tasks" (also known as ducks) to
> enforce cluster allocation without a central scheduler and without
> compromising throughput. Guaranteed tasks preempt other (speculative) tasks
> and are distributed from HS2 to AMs, and from AMs to tasks, in accordance
> with percentage allocations in the policy. Each "duck" corresponds to a CPU
> resource on the cluster. The implementation will be isolated so as to allow
> different ones later.
> * In the future, we may consider improved task placement and late binding,
> similar to the approach described in the Sparrow paper, to work around
> potential hotspots etc. that are not avoided with the decentralized scheme.
> * Only one HS2 will initially be supported to avoid split-brain workload
> management. We will also implement (in a tangential set of work items)
> active-passive HS2 recovery. Eventually, we intend to switch to full
> active-active HS2 configuration with shared WM and Tez session pool (unlike
> the current case with 2 separate session pools).