[ 
https://issues.apache.org/jira/browse/IMPALA-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504016#comment-16504016
 ] 

Tim Armstrong commented on IMPALA-7115:
---------------------------------------

[~philip] yeah I agree, I think choosing a sane default here is the first step 
anyway.

I did a bit of data analysis on some samples of queries that I have access to. 
I can see a small number of queries executing successfully with up to 1338 
fragments on a host. If we assume there is roughly one scan per fragment, we 
could set the default THREAD_RESERVATION_LIMIT value to 3000 
without disrupting any of the successful queries in the data set. We don't want 
to make the default too high, since we want it to have some teeth.
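To make the arithmetic behind that 3000 figure explicit, here is a back-of-envelope sketch (not Impala code; the function name and the one-scan-per-fragment assumption are mine):

```python
def estimated_thread_reservation(fragments_per_host, scans_per_fragment=1):
    """Rough per-host estimate: one execution thread per fragment,
    plus roughly one scanner thread per fragment (assumed)."""
    return fragments_per_host * (1 + scans_per_fragment)

# Largest successful query observed in the sample: ~1338 fragments on a host.
worst_case = estimated_thread_reservation(1338)
print(worst_case)           # 2676
print(worst_case <= 3000)   # True -- a 3000 default leaves some headroom
```

So the largest query in the sample would reserve roughly 2676 threads, which fits under a 3000 default while still keeping the limit tight enough to matter.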

> Set a default THREAD_RESERVATION_LIMIT value
> --------------------------------------------
>
>                 Key: IMPALA-7115
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7115
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: resource-management
>
> As a follow-on to IMPALA-6035, we should set a default value that will 
> actually help protect against insanely complex queries.
> Motivating discussion is here: 
> https://gerrit.cloudera.org/#/c/10365/9/common/thrift/ImpalaInternalService.thrift
> {quote}
> Tim Armstrong
> 1:11 PM
> Dan suggested setting a default here. I started doing some experiments to see 
> what our current practical limits are.
> On stock Ubuntu 16.04 I start getting thread_resource_error at around 8000 
> reserved threads. I'm not sure that the config reflects what people would use 
> on production systems so continuing to investigate.
> Dan Hecht
> 1:31 PM
> We could also consider choosing a default dynamically based on the OS's 
> setting, if that's necessary.
> Tim Armstrong
> 3:45 PM
> I increased some of the configs (I think I was limited by 
> /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max == 12288) and now it 
> got oom-killed at ~26000 threads.
> I think unfortunately there are a lot of different OS knobs that impact this 
> and they seem to evolve over time, so it's probably not feasible with a 
> reasonable amount of effort to get it working on all common Linux distros.
> I was thinking ~5000, since 1000-2000 plan nodes is the most I've seen for a 
> query running successfully in production.
> Maybe I should do this in a follow-on change, since we probably also want to 
> add a test query at or near this limit.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
