Khurram Faraaz created DRILL-5576:
-------------------------------------

Summary: OutOfMemoryException when some CPU cores are taken offline while concurrent queries are under execution
Key: DRILL-5576
URL: https://issues.apache.org/jira/browse/DRILL-5576
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Affects Versions: 1.11.0
Environment: 3-node CentOS cluster
Reporter: Khurram Faraaz

When we reduce the number of available CPU cores while concurrent queries are executing, we see an OutOfMemoryException.

Drill 1.11.0 (commit ID d11aba2) on a three-node CentOS 6.8 cluster.

On each node, Drill's maximum direct memory was set to 16 GB:
{noformat}
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"16G"}
{noformat}

The node where the foreman Drillbit runs has 24 CPUs. In the lscpu output below, 12 of them are listed as off-line; these are the same cores that are taken offline during the test (see the script output further down).
{noformat}
[root@centos-01 logs]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0,2,4,5,8,9,12,14,15,18,20,22
Off-line CPU(s) list: 1,3,6,7,10,11,13,16,17,19,21,23
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 44
Model name: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
Stepping: 2
CPU MHz: 1600.000
BogoMIPS: 4799.86
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0,2,4,5,12,14,15
NUMA node1 CPU(s): 8,9,18,20,22
{noformat}

The following Java snippet uses a fixed pool of 48 threads to execute TPC-DS query 11 48 times concurrently:
{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// ConcurrentQuery and conn are defined elsewhere in the test program (not shown);
// each ConcurrentQuery task executes TPC-DS query 11.
ExecutorService executor = Executors.newFixedThreadPool(48);
try {
  for (int i = 1; i <= 48; i++) {
    executor.submit(new ConcurrentQuery(conn));
  }
} catch (Exception e) {
  // Only catches exceptions thrown by submit() itself; a failure inside a
  // query is captured in the Future returned by submit().
  System.out.println(e.getMessage());
  e.printStackTrace();
}
{noformat}

While TPC-DS query 11 is executing under the program above, we take half of the CPU cores (12 of 24) offline:
{noformat}
[root@centos-01 ~]# sh turnCPUCoresOffline.sh
OFFLINE cores are : 1,3,6-7,10-11,13,16-17,19,21,23
ONLINE cores are : 0,2,4-5,8-9,12,14-15,18,20,22
{noformat}

The result is an OutOfMemoryException. The drillbit.log files are attached.
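Because the snippet in the report wraps only the submit() calls in try/catch, a query that fails (for example with the OutOfMemoryException) is recorded in its Future and never reaches that catch block. Below is a minimal sketch, not the reporter's actual program, of a harness that waits on every Future so each failed query is counted and reported. The class ConcurrentRepro, the method runConcurrently and the fakeQuery task are hypothetical; fakeQuery only sleeps and stands in for ConcurrentQuery, which is not shown in the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentRepro {

    /** Runs `count` copies of `task` on a fixed pool of `count` threads and returns how many failed. */
    static int runConcurrently(int count, Callable<Void> task) {
        ExecutorService executor = Executors.newFixedThreadPool(count);
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                futures.add(executor.submit(task));
            }
            int failures = 0;
            for (Future<Void> f : futures) {
                try {
                    f.get(); // blocks until this task finishes and rethrows its failure
                } catch (ExecutionException e) {
                    failures++;
                    System.err.println("query failed: " + e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the caller's interrupt status
                    throw new IllegalStateException("interrupted while waiting for queries", e);
                }
            }
            return failures;
        } finally {
            executor.shutdown(); // release the pool threads once all results are collected
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for ConcurrentQuery: a real task would execute
        // TPC-DS query 11 against Drill and read the full result set.
        Callable<Void> fakeQuery = () -> {
            Thread.sleep(10);
            return null;
        };
        int failures = runConcurrently(48, fakeQuery);
        System.out.println("failed queries: " + failures);
    }
}
```

Collecting the Futures also makes the harness wait for all 48 queries before it exits, so the number of failed queries can be compared with the drillbit.log entries.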
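The contents of turnCPUCoresOffline.sh are not included in the report. The following is a hypothetical Java sketch of the same step. It assumes the standard Linux CPU hotplug interface, where writing 0 to /sys/devices/system/cpu/cpuN/online takes CPU N offline (root is required). It expands the range list printed by the script ("6-7" means CPUs 6 and 7) into the per-CPU list that lscpu shows. It also prints what the JVM reports afterwards, because the Runtime.availableProcessors() Javadoc notes that the value may change during a single JVM run. The class CpuOffline, its methods and the --apply flag are illustrative names only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CpuOffline {

    /** Expands a Linux CPU list such as "1,3,6-7,10-11" into [1, 3, 6, 7, 10, 11]. */
    static List<Integer> parseCpuList(String list) {
        List<Integer> cpus = new ArrayList<>();
        for (String raw : list.split(",")) {
            String part = raw.trim();
            if (part.isEmpty()) {
                continue; // tolerate an empty list or stray commas
            }
            int dash = part.indexOf('-');
            if (dash < 0) {
                cpus.add(Integer.parseInt(part));
            } else {
                int lo = Integer.parseInt(part.substring(0, dash).trim());
                int hi = Integer.parseInt(part.substring(dash + 1).trim());
                for (int c = lo; c <= hi; c++) {
                    cpus.add(c);
                }
            }
        }
        return cpus;
    }

    /** Sets one CPU online or offline through sysfs (assumes Linux and root privileges). */
    static void setOnline(int cpu, boolean online) throws IOException {
        Path p = Paths.get("/sys/devices/system/cpu/cpu" + cpu + "/online");
        Files.write(p, (online ? "1" : "0").getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        List<Integer> cpus = parseCpuList("1,3,6-7,10-11,13,16-17,19,21,23");
        System.out.println("CPUs to take offline: " + cpus);
        // Only touch sysfs when explicitly requested, since this changes the host.
        if (args.length > 0 && args[0].equals("--apply")) {
            for (int cpu : cpus) {
                setOnline(cpu, false);
            }
        }
        System.out.println("JVM reports " + Runtime.getRuntime().availableProcessors() + " processors");
    }
}
```

Running it with --apply while the 48 queries are in flight mirrors the test described above. Running it without the flag only prints the expanded list, which should match the Off-line CPU(s) list in the lscpu output.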