Abhishek Girish created DRILL-2911:
--------------------------------------

             Summary: Queries fail with connection error when some Drillbit processes are down
                 Key: DRILL-2911
                 URL: https://issues.apache.org/jira/browse/DRILL-2911
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 0.9.0
            Reporter: Abhishek Girish
            Assignee: Chris Westin


Queries fail with a connection error even though the Drill web UI shows all 
Drillbits as up. However, on some nodes the Drillbit process does not appear 
in the process list. The cluster appears to be in an inconsistent state.
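
One way to cross-check what the cluster itself believes is registered, 
assuming the sys.drillbits system table is available in this build, is to 
query it and compare the rows against the nodes where a Drillbit process is 
actually running. This is only a sketch of such a check; its output was not 
captured for this report:
{code:sql}
-- Lists the Drillbits currently registered with the cluster coordinator.
-- A node that appears here but has no running Drillbit process would
-- confirm the inconsistent state described above.
select * from sys.drillbits;
{code}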

Queries with simple scans execute successfully:
{code:sql}
select i_item_sk from item limit 5;
+------------+
| i_item_sk  |
+------------+
| 1          |
| 2          |
| 3          |
| 4          |
| 5          |
+------------+
5 rows selected (0.112 seconds)
{code}

Any query that may span multiple Drillbits fails with a connection error:
{code:sql}
SELECT *
FROM   item i,
       inventory inv
WHERE  inv.inv_item_sk = i.i_item_sk
LIMIT  10;

Query failed: CONNECTION ERROR: Exceeded timeout while waiting send 
intermediate work fragments to remote nodes.  Sent 4 and only heard response 
back from 3 nodes.
[5ada1a3e-d198-478b-941d-3c9bb917e494 on abhi7.qa.lab:31010]
Error: exception while executing query: Failure while executing query. 
(state=,code=0)
{code}

The issue may have been triggered by a previous failed query.

I couldn't find the error code in the logs. Logs from all nodes are attached 
for reference.


