[ https://issues.apache.org/jira/browse/HIVE-17481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441379#comment-16441379 ]
Thai Bui commented on HIVE-17481:
---------------------------------

[~prasanth_j] From reading the code and some general digging around, these 3 settings seem to fix the guaranteed tasks issue on my end. The smaller queries now run at their regular speed, or just a bit slower, while a big query is running.

{noformat}
"hive.llap.task.scheduler.num.schedulable.tasks.per.node": "6",
"hive.llap.task.scheduler.preempt.independent": "true",
"llap.plugin.endpoint.enabled": "true",
{noformat}

I force the number of schedulable tasks to 6 to match the number of executors on each LLAP daemon.

> LLAP workload management
> ------------------------
>
>                 Key: HIVE-17481
>                 URL: https://issues.apache.org/jira/browse/HIVE-17481
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: Workload management design doc.pdf
>
>
> This effort is intended to improve various aspects of cluster sharing for
> LLAP. Some of these are applicable to non-LLAP queries and may later be
> extended to all queries. Administrators will be able to specify and apply
> policies for workload management ("resource plans") that apply to the entire
> cluster, with only one resource plan being active at a time. The policies
> will be created and modified using new Hive DDL statements.
> The policies will cover:
> * Dividing the cluster into a set of (optionally, nested) query pools that
> are each allocated a fraction of the cluster, a set query parallelism,
> resource sharing policy between queries, and potentially others like
> priority, etc.
> * Mapping the incoming queries into pools based on the query user, groups,
> explicit configuration, etc.
> * Specifying rules that perform actions on queries based on counter values
> (e.g. killing or moving queries).
> One would also be able to switch policies on a live cluster without (usually)
> affecting running queries, including e.g. to change policies for daytime and
> nighttime usage patterns, and other similar scenarios. The switches would be
> safe and atomic; versioning may eventually be supported.
> Some implementation details:
> * WM will only be supported in HS2 (for obvious reasons).
> * All LLAP query AMs will run in "interactive" YARN queue and will be
> fungible between Hive pools.
> * We will use the concept of "guaranteed tasks" (also known as ducks) to
> enforce cluster allocation without a central scheduler and without
> compromising throughput. Guaranteed tasks preempt other (speculative) tasks
> and are distributed from HS2 to AMs, and from AMs to tasks, in accordance
> with percentage allocations in the policy. Each "duck" corresponds to a CPU
> resource on the cluster. The implementation will be isolated so as to allow
> different ones later.
> * In future, we may consider improved task placement and late binding,
> similar to the ones described in Sparrow paper, to work around potential
> hotspots/etc. that are not avoided with the decentralized scheme.
> * Only one HS2 will initially be supported to avoid split-brain workload
> management. We will also implement (in a tangential set of work items)
> active-passive HS2 recovery. Eventually, we intend to switch to full
> active-active HS2 configuration with shared WM and Tez session pool (unlike
> the current case with 2 separate session pools).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
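
For illustration only: the issue description says guaranteed tasks ("ducks") are handed out "in accordance with percentage allocations in the policy". The sketch below shows one way such a proportional split could work, using a largest-remainder rounding so the shares always add up to the total. It is not taken from the Hive code; the function name, the pool names, and the figure of 12 ducks (e.g. 2 daemons with 6 executors each) are all made up for the example.

```python
from math import floor


def distribute_guaranteed_tasks(total_ducks, pool_fractions):
    """Split total_ducks among pools in proportion to pool_fractions.

    Hypothetical helper, NOT Hive's implementation. Uses largest-remainder
    rounding so that the integer shares sum exactly to total_ducks.
    """
    if total_ducks < 0:
        raise ValueError("total_ducks must be non-negative")
    total_fraction = sum(pool_fractions.values())
    if total_fraction <= 0:
        raise ValueError("pool fractions must sum to a positive value")

    # Exact (fractional) share for each pool, normalised by the fraction sum.
    exact = {
        name: total_ducks * fraction / total_fraction
        for name, fraction in pool_fractions.items()
    }
    # Start from the rounded-down shares.
    shares = {name: floor(value) for name, value in exact.items()}

    # Hand the ducks lost to rounding to the pools with the largest remainders.
    leftover = total_ducks - sum(shares.values())
    by_remainder = sorted(
        exact, key=lambda name: exact[name] - shares[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    # Assumed example: 12 ducks, three made-up pools at 50% / 30% / 20%.
    print(distribute_guaranteed_tasks(12, {"etl": 0.5, "bi": 0.3, "adhoc": 0.2}))
```

Under this scheme a pool's share can differ from its exact percentage by at most one duck. The same split could in principle be applied again at the next level down (from a pool's share to the AMs in that pool), but how Hive actually does this is not stated in the description above.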