[ 
https://issues.apache.org/jira/browse/IMPALA-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6847.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0
                   Impala 2.13.0

> Consider adding workaround for high memory estimates in admission control
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-6847
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6847
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Product Backlog
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: admission-control, resource-management
>             Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> A scenario we've seen play out a couple of times is this.
> 1. An Impala admin sets up memory-based admission control but sets the 
> default query mem_limit to 0. This means that admission control will use 
> memory estimates instead of mem_limit. Typically admins want some protection 
> from large queries consuming excessive memory but can't or don't want to set 
> a single mem_limit because the workload is unknown or unpredictable. This 
> configuration has caveats and we recommend against it, but this happens and 
> often works well enough as long as the workload is comprised of relatively 
> simple queries. The caveats include:
> * There is no enforcement that a query stays within the memory estimate. This 
> means that a query can fail or force other queries to fail.
> * Memory estimates are often inaccurate (this is unavoidable since they 
> depend on cardinality estimates, which are commonly off by 10x even with 
> state-of-the-art query planners). This means that runnable queries may be 
> impossible to admit without setting a mem_limit.
> 2. Something changes about the workload, e.g. a new query is added, stats are 
> computed, data size changes, we make an otherwise-innocuous planner change. 
> Then we run into the second problem above and one or more queries cannot be 
> executed, e.g. "Rejected query from pool root.foo: request memory needed 
> 1234.56 GB is greater than pool max mem resources 1000.00 GB. Use the 
> MEM_LIMIT query option to indicate how much memory is required per node. The 
> total memory needed is the per-node MEM_LIMIT times the number of nodes 
> executing the query. See the Admission Control documentation for more 
> information. "
> The last problem is the problem this JIRA is intending to work around. The 
> preferred solutions, which work most of the time, are:
> # Configure a default query memory limit for the pool
> # Set a mem_limit for the query
> # Disable memory-based admission control and do admission control based on 
> num_queries, which is simpler and easier to understand.
> It would however, be useful to have a third fallback in cases where the first 
> two options are difficult or impossible to apply. The basic requirement is to 
> allow an administrator to configure a resource pool such that queries with 
> very high estimates can run. We probably need enough flexibility that they 
> can run concurrently with other smaller queries (i.e. the big query shouldn't 
> take over the whole pool) and ideally we would also have a mem_limit applied 
> to the query so that we're still protected from runaway memory consumption.
> The long-term solution to this is IMPALA-6460. This is a short-term 
> workaround that can tide users over until we have a comprehensive solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to