[ https://issues.apache.org/jira/browse/DRILL-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Girish updated DRILL-2911:
-----------------------------------
    Summary: Queries fail randomly with connection error "Exceeded timeout while waiting send intermediate work fragments to remote nodes ..."  (was: Queries fail with connection error when some Drillbit processes are down)

> Queries fail randomly with connection error "Exceeded timeout while waiting
> send intermediate work fragments to remote nodes ..."
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2911
>                 URL: https://issues.apache.org/jira/browse/DRILL-2911
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 0.9.0
>            Reporter: Abhishek Girish
>             Fix For: Future
>
>         Attachments: drillbit_node1.log, drillbit_node2.log,
> drillbit_node3.log, drillbit_node4.log
>
>
> Drill fails with connection error even when the Drill web UI also shows all
> drill-bits to be up. However, some nodes do not list the Drillbit process.
> Looks like an inconsistent state.
> Queries with simple scans execute successfully:
> {code:sql}
> select i_item_sk from item limit 5;
> +------------+
> | i_item_sk  |
> +------------+
> | 1          |
> | 2          |
> | 3          |
> | 4          |
> | 5          |
> +------------+
> 5 rows selected (0.112 seconds)
> {code}
> Any query which might span across multiple drill-bits fails with connection
> error:
> {code:sql}
> SELECT
>     *
> FROM item i,
>     inventory inv
> WHERE inv.inv_item_sk = i.i_item_sk
> LIMIT 10;
> Query failed: CONNECTION ERROR: Exceeded timeout while waiting send
> intermediate work fragments to remote nodes. Sent 4 and only heard response
> back from 3 nodes.
> [5ada1a3e-d198-478b-941d-3c9bb917e494 on abhi7.qa.lab:31010]
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> {code}
> The issue could possibly be due to a previous failed query.
> Couldn't find the error code in logs. Have attached logs from all nodes for
> reference.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)