[ 
https://issues.apache.org/jira/browse/DRILL-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135300#comment-15135300
 ] 

Hanifi Gunes commented on DRILL-4325:
-------------------------------------

Culprit: over-parallelization and excessive context switching.

[~vicky] did a great job reproducing a very similar scenario locally, in 
house. The setup we had constantly hammers a single foreman with concurrent 
queries. She collected sar reports for CPU, context switching, and IO within 
a window of a few seconds. Not surprisingly, some nodes eventually 
disconnected from ZooKeeper. Inspecting the Drill and ZooKeeper logs and 
stats around the time the disconnection happens, we identified a huge peak in 
the number of context switches, ~90000/second, while no process switches take 
place. To me that means the OS is too busy context switching among Drill's 
far too many threads. In the same interval, which runs about 40-50 seconds, 
the number of bytes read from the network is zero while the number of bytes 
written is on the order of a few kilobytes. Keep in mind that these numbers 
are for the whole OS, not just for Drill. So within the window where the 
crazy context switching happens, no process reads from the network; the IO 
threads are likely starving. Looking at the ZooKeeper logs, the session 
terminates after 40 seconds while the excessive context switching lasts about 
a minute, which seems long enough for ZooKeeper to believe its counterpart is 
dead.
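
The timing argument above can be written down as a rough check. This is a 
deliberately simplified model, not ZooKeeper's actual expiry logic: it 
assumes the session expires once the server has heard nothing from the client 
for longer than the session timeout. The function name and the 
last_heartbeat_age_s parameter are made up for illustration; the 40 second 
timeout and the stall durations are the figures quoted in this comment.

```python
def session_expired(stall_s, session_timeout_s, last_heartbeat_age_s=0.0):
    """Simplified model: the session is lost if the silent period exceeds the timeout.

    stall_s               -- how long the client's IO threads are starved
    session_timeout_s     -- negotiated session timeout (40 s in the logs here)
    last_heartbeat_age_s  -- hypothetical: time since the last heartbeat when
                             the stall began
    """
    return last_heartbeat_age_s + stall_s > session_timeout_s


# A stall of about a minute outlasts a 40 s session timeout.
print(session_expired(stall_s=60, session_timeout_s=40))
# A shorter stall survives only if a heartbeat went out recently enough.
print(session_expired(stall_s=35, session_timeout_s=40, last_heartbeat_age_s=10))
```

Under this model any stall longer than the timeout is fatal regardless of 
when the last heartbeat was sent, which matches the observation that the node 
was declared dead.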

Not sure if there is an immediate fix for this, but the proposed resource 
manager should remedy this situation.

One immediate improvement I can see is to eliminate possible causes of 
imbalance. For instance, DrillClient should not stick to a single drillbit as 
its foreman. The current approach of picking a random node is not enough. 
Instead, we should retire the connection after a timeout or after a certain 
number of queries have been served, and then reconnect to another node.
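
The retirement idea can be sketched as a small client-side policy. This is 
an illustration in Python, not DrillClient code: the class name, the 
max_queries and max_age_s limits, and their default values are all 
hypothetical, and a real client would also have to drain in-flight queries 
before closing the old connection.

```python
import random
import time


class RotatingForemanPicker:
    """Hypothetical policy: retire the current foreman after a query budget
    or an age limit, then pick a different node."""

    def __init__(self, nodes, max_queries=100, max_age_s=300.0,
                 clock=time.monotonic, rng=None):
        if not nodes:
            raise ValueError("need at least one node")
        self._nodes = list(nodes)
        self._max_queries = max_queries
        self._max_age_s = max_age_s
        self._clock = clock
        self._rng = rng or random.Random()
        self._current = None
        self._served = 0
        self._connected_at = 0.0
        self._reconnect()

    def _reconnect(self):
        # Prefer any node other than the one being retired, so a retirement
        # always moves load when the cluster has more than one node.
        candidates = [n for n in self._nodes if n != self._current] or self._nodes
        self._current = self._rng.choice(candidates)
        self._served = 0
        self._connected_at = self._clock()

    def foreman_for_next_query(self):
        # Retire the connection if it is too old or has served its budget.
        expired = self._clock() - self._connected_at >= self._max_age_s
        exhausted = self._served >= self._max_queries
        if expired or exhausted:
            self._reconnect()
        self._served += 1
        return self._current


picker = RotatingForemanPicker(["node1", "node2", "node3", "node4"], max_queries=2)
for _ in range(6):
    print(picker.foreman_for_next_query())
```

With max_queries=2 the loop switches to a different node after every second 
query, so a long-lived client spreads foreman duty across the cluster instead 
of pinning it to whichever node it happened to pick first.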

> ForemanException: One or more nodes lost connectivity during query
> ------------------------------------------------------------------
>
>                 Key: DRILL-4325
>                 URL: https://issues.apache.org/jira/browse/DRILL-4325
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.5.0
>            Reporter: Victoria Markman
>         Attachments: drillbit.log.133, drillbit.log.134, drillbit.log.135, 
> drillbit.log.136, stats.133.tar, stats.134.tar, stats.135.tar, stats.136.tar, 
> zookeeper.log
>
>
> The picture pretty much looks like this: a bunch of queries are running 
> (usually something more involved than simple functional tests), usually 
> tpch or tpcds with lots of major fragments, like query74 from tpcds. 
> ZooKeeper decides that a particular node is dead, and the queries that were 
> running at the time of the connection loss are failed by Drill (which is 
> correct behavior, I think).
> It seems that I can reliably reproduce this issue when I bump up the number 
> of concurrently running queries and make all of them go to the same foreman 
> node (I don't mean to imply that planning is to blame; it just seems to 
> reproduce more easily this way).
> On my 4-node cluster I can pretty much reliably reproduce this problem by 
> running: 
> run.sh -s Advanced/tpcds/tpcds_sf100/original -g smoke -t 600 -n 10 
> {code}
> 2016-01-28 16:30:20,146 [29554d63-b478-6bae-f0f6-435d9f33ffdf:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d63-b478-6bae-f0f6-435d9f33ffdf: select * from sys.version
> 2016-01-28 16:30:22,844 [29554d61-2789-babb-54e5-22b701bf2f64:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d61-2789-babb-54e5-22b701bf2f64: select * from sys.drillbits
> 2016-01-28 16:30:23,281 [29554d60-5bbd-dae1-c38d-21708ad37fbe:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d60-5bbd-dae1-c38d-21708ad37fbe: alter system set 
> `planner.enable_decimal_data_type` = true
> 2016-01-28 16:30:24,889 [29554d5e-d243-6299-3103-58b180135854:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5e-d243-6299-3103-58b180135854: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:30:24,931 [29554d5e-b395-14aa-42a4-f6f248059363:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5e-b395-14aa-42a4-f6f248059363: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:30:24,964 [29554d5f-24ac-cf00-714c-7419d3894af0:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5f-24ac-cf00-714c-7419d3894af0: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:30:24,998 [29554d5e-ae92-6306-3495-be5cb7f98139:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5e-ae92-6306-3495-be5cb7f98139: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:30:25,040 [29554d5e-1a20-3d6d-143b-0ee3bcd4aa11:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5e-1a20-3d6d-143b-0ee3bcd4aa11: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:30:25,073 [29554d5d-e7b4-c61c-9735-ce37938aa47d:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5d-e7b4-c61c-9735-ce37938aa47d: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:30:25,106 [29554d5d-823b-0536-e4df-4c6cef64b3e4:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5d-823b-0536-e4df-4c6cef64b3e4: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:30:25,131 [29554d5e-099c-3acd-477e-ee4bece4dc4e:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5e-099c-3acd-477e-ee4bece4dc4e: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:30:25,184 [29554d5d-b87b-fadd-d5bb-a5d0bba03671:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5d-b87b-fadd-d5bb-a5d0bba03671: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:30:25,205 [29554d5d-97a8-c577-76b7-01bb6c5f5e48:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5d-97a8-c577-76b7-01bb6c5f5e48: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:30:25,432 [29554d5e-353a-27cf-9c93-969f9e8866da:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5e-353a-27cf-9c93-969f9e8866da: -- start query 55 in stream 0 using 
> template query55.tpl 
> 2016-01-28 16:30:25,509 [29554d5d-c872-a1be-8386-bf5500ba6726:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5d-c872-a1be-8386-bf5500ba6726: -- start query 76 in stream 0 using 
> template query76.tpl 
> 2016-01-28 16:30:25,564 [29554d5d-a2ad-bb2a-cfd3-0ec498ef4da9:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5d-a2ad-bb2a-cfd3-0ec498ef4da9: -- start query 46 in stream 0 using 
> template query46.tpl 
> 2016-01-28 16:30:25,665 [29554d5d-9ffe-96e8-7b4b-0e4214554185:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5d-9ffe-96e8-7b4b-0e4214554185: -- start query 21 in stream 0 using 
> template query21.tpl 
> 2016-01-28 16:30:25,714 [29554d5d-a051-b35c-078f-2872372aa7bb:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5d-a051-b35c-078f-2872372aa7bb: -- start query 34 in stream 0 using 
> template query34.tpl 
> 2016-01-28 16:30:25,738 [29554d5d-9dd8-3477-9ce0-ac5f3323195d:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5d-9dd8-3477-9ce0-ac5f3323195d: -- start query 68 in stream 0 using 
> template query68.tpl 
> 2016-01-28 16:30:25,861 [29554d5e-2d2e-81bb-4427-696d27f8deb5:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5e-2d2e-81bb-4427-696d27f8deb5: -- start query 74 in stream 0 using 
> template query74.tpl 
> 2016-01-28 16:30:25,910 [29554d5e-71d6-5cf5-3ec6-58d07ce81ed5:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5e-71d6-5cf5-3ec6-58d07ce81ed5: -- start query 33 in stream 0 using 
> template query33.tpl 
> 2016-01-28 16:30:26,012 [29554d5e-4fab-a069-ea09-5d1c561664fe:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5e-4fab-a069-ea09-5d1c561664fe: -- start query 50 in stream 0 using 
> template query50.tpl 
> 2016-01-28 16:30:26,047 [29554d5c-d2ce-153d-4f5b-a8c26f3d256d:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554d5c-d2ce-153d-4f5b-a8c26f3d256d: -- start query 52 in stream 0 using 
> template query52.tpl 
> 2016-01-28 16:32:43,453 [29554cd4-40a8-101a-b418-a91898d720b6:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554cd4-40a8-101a-b418-a91898d720b6: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:32:44,303 [29554cd2-8fe4-9ca4-2b3f-3d591f2144c4:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554cd2-8fe4-9ca4-2b3f-3d591f2144c4: -- start query 91 in stream 0 using 
> template query91.tpl 
> 2016-01-28 16:32:47,965 [29554ccf-db1d-c69b-7dc5-703c1a03f623:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554ccf-db1d-c69b-7dc5-703c1a03f623: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:32:49,119 [29554cce-68d0-6ec4-cbb9-192430099659:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554cce-68d0-6ec4-cbb9-192430099659: -- start query 59 in stream 0 using 
> template query59.tpl 
> 2016-01-28 16:33:01,153 [29554cc1-9ad9-902d-ea5e-f8520b77bd8a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554cc1-9ad9-902d-ea5e-f8520b77bd8a: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:33:02,119 [29554cc1-74cc-89e8-cbb9-a0c5961d1018:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554cc1-74cc-89e8-cbb9-a0c5961d1018: -- start query 3 in stream 0 using 
> template query3.tpl 
> 2016-01-28 16:35:31,837 [29554c2b-9ebc-9509-96fd-a504657a516f:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554c2b-9ebc-9509-96fd-a504657a516f: use `dfs.tpcds_sf100_parquet_views`
> 2016-01-28 16:35:34,231 [29554c29-0e51-3d3c-dfa5-358d67233d9b:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 29554c29-0e51-3d3c-dfa5-358d67233d9b: -- start query 66 in stream 0 using 
> template query66.tpl 
> 2016-01-28 16:36:13,623 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554cce-68d0-6ec4-cbb9-192430099659:3:74.
> 2016-01-28 16:36:13,624 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-2d2e-81bb-4427-696d27f8deb5:6:18.
> 2016-01-28 16:36:13,636 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-2d2e-81bb-4427-696d27f8deb5:4:38.
> 2016-01-28 16:36:13,663 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554cce-68d0-6ec4-cbb9-192430099659:6:50.
> 2016-01-28 16:36:13,664 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-9dd8-3477-9ce0-ac5f3323195d:1:2.
> 2016-01-28 16:36:13,664 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-a2ad-bb2a-cfd3-0ec498ef4da9:6:30.
> 2016-01-28 16:36:13,665 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-4fab-a069-ea09-5d1c561664fe:1:10.
> 2016-01-28 16:36:13,665 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554cce-68d0-6ec4-cbb9-192430099659:6:6.
> 2016-01-28 16:36:13,665 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-9ffe-96e8-7b4b-0e4214554185:3:22.
> 2016-01-28 16:36:13,666 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554cce-68d0-6ec4-cbb9-192430099659:3:86.
> 2016-01-28 16:36:13,666 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-9ffe-96e8-7b4b-0e4214554185:1:46.
> 2016-01-28 16:36:13,666 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-4fab-a069-ea09-5d1c561664fe:3:30.
> 2016-01-28 16:36:13,666 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-2d2e-81bb-4427-696d27f8deb5:7:34.
> 2016-01-28 16:36:13,667 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-c872-a1be-8386-bf5500ba6726:1:50.
> 2016-01-28 16:36:13,667 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-a051-b35c-078f-2872372aa7bb:1:10.
> 2016-01-28 16:36:13,668 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-2d2e-81bb-4427-696d27f8deb5:7:10.
> 2016-01-28 16:36:13,669 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554cce-68d0-6ec4-cbb9-192430099659:3:90.
> 2016-01-28 16:36:13,670 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-2d2e-81bb-4427-696d27f8deb5:22:41.
> 2016-01-28 16:36:13,670 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-2d2e-81bb-4427-696d27f8deb5:6:26.
> 2016-01-28 16:36:13,670 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-4fab-a069-ea09-5d1c561664fe:3:18.
> 2016-01-28 16:36:13,671 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-2d2e-81bb-4427-696d27f8deb5:20:37.
> 2016-01-28 16:36:13,671 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-c872-a1be-8386-bf5500ba6726:1:10.
> 2016-01-28 16:36:13,767 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554cce-68d0-6ec4-cbb9-192430099659:5:2.
> 2016-01-28 16:36:13,768 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-9ffe-96e8-7b4b-0e4214554185:1:14.
> 2016-01-28 16:36:13,768 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-2d2e-81bb-4427-696d27f8deb5:5:54.
> 2016-01-28 16:36:13,768 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-2d2e-81bb-4427-696d27f8deb5:5:86.
> 2016-01-28 16:36:13,769 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-4fab-a069-ea09-5d1c561664fe:3:62.
> 2016-01-28 16:36:13,812 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-4fab-a069-ea09-5d1c561664fe:3:82.
> 2016-01-28 16:36:13,823 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-4fab-a069-ea09-5d1c561664fe:8:29.
> 2016-01-28 16:36:13,862 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-c872-a1be-8386-bf5500ba6726:1:86.
> 2016-01-28 16:36:13,950 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-9ffe-96e8-7b4b-0e4214554185:3:6.
> 2016-01-28 16:36:13,991 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-9ffe-96e8-7b4b-0e4214554185:1:66.
> 2016-01-28 16:36:13,992 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-9dd8-3477-9ce0-ac5f3323195d:3:62.
> 2016-01-28 16:36:13,992 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554cce-68d0-6ec4-cbb9-192430099659:15:13.
> 2016-01-28 16:36:13,992 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-2d2e-81bb-4427-696d27f8deb5:8:2.
> 2016-01-28 16:36:13,993 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-4fab-a069-ea09-5d1c561664fe:1:54.
> 2016-01-28 16:36:13,993 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-2d2e-81bb-4427-696d27f8deb5:5:30.
> 2016-01-28 16:36:13,993 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-a2ad-bb2a-cfd3-0ec498ef4da9:6:34.
> 2016-01-28 16:36:13,993 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-9dd8-3477-9ce0-ac5f3323195d:6:38.
> 2016-01-28 16:36:13,994 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-2d2e-81bb-4427-696d27f8deb5:6:34.
> 2016-01-28 16:36:14,095 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5d-a2ad-bb2a-cfd3-0ec498ef4da9:3:54.
> 2016-01-28 16:36:14,155 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554cce-68d0-6ec4-cbb9-192430099659:6:74.
> 2016-01-28 16:36:14,155 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-2d2e-81bb-4427-696d27f8deb5:6:46.
> 2016-01-28 16:36:14,155 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554cce-68d0-6ec4-cbb9-192430099659:2:2.
> 2016-01-28 16:36:14,156 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554cce-68d0-6ec4-cbb9-192430099659:3:34.
> 2016-01-28 16:36:14,156 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-4fab-a069-ea09-5d1c561664fe:0:0.
> 2016-01-28 16:36:14,157 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-133.qa.lab no longer 
> active.  Cancelling fragment 29554d5e-4fab-a069-ea09-5d1c561664fe:3:54.
> 2016-01-28 16:36:14,367 [Curator-ServiceCache-0] ERROR 
> o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: ForemanException: One 
> more more nodes lost connectivity during query.  Identified nodes were 
> [atsqa4-133.qa.lab:31010].
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> ForemanException: One more more nodes lost connectivity during query.  
> Identified nodes were [atsqa4-133.qa.lab:31010].
>       at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:746)
>  [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>       at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:858)
>  [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>       at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:790)
>  [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>       at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:792)
>  [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>       at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:909) 
> [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>       at 
> org.apache.drill.exec.work.foreman.Foreman.access$2700(Foreman.java:110) 
> [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>       at 
> org.apache.drill.exec.work.foreman.Foreman$StateListener.moveToState(Foreman.java:1183)
>  [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: One more more 
> nodes lost connectivity during query.  Identified nodes were 
> [atsqa4-133.qa.lab:31010].
> {code}
> Will post logs shortly.



