[
https://issues.apache.org/jira/browse/DRILL-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Abhishek Girish updated DRILL-2911:
-----------------------------------
Description:
Random query failures with connection errors were observed when functional
tests were run concurrently:
Error message:
{code:sql}
Query failed: CONNECTION ERROR: Exceeded timeout while waiting send
intermediate work fragments to remote nodes. Sent 4 and only heard response
back from 3 nodes.
{code}
Logs attached.
was:
Drill fails with a connection error even though the Drill web UI shows all
Drillbits as up. However, on some nodes the Drillbit process is not listed.
This looks like an inconsistent state.
Queries with simple scans execute successfully:
{code:sql}
select i_item_sk from item limit 5;
+------------+
| i_item_sk  |
+------------+
| 1          |
| 2          |
| 3          |
| 4          |
| 5          |
+------------+
5 rows selected (0.112 seconds)
{code}
Any query that may span multiple Drillbits fails with a connection
error:
{code:sql}
SELECT *
FROM item i,
     inventory inv
WHERE inv.inv_item_sk = i.i_item_sk
LIMIT 10;
Query failed: CONNECTION ERROR: Exceeded timeout while waiting send
intermediate work fragments to remote nodes. Sent 4 and only heard response
back from 3 nodes.
[5ada1a3e-d198-478b-941d-3c9bb917e494 on abhi7.qa.lab:31010]
Error: exception while executing query: Failure while executing query.
(state=,code=0)
{code}
The issue may be caused by a previously failed query.
I could not find the error code in the logs. Logs from all nodes are attached
for reference.
> Queries fail randomly with connection error "Exceeded timeout while waiting
> send intermediate work fragments to remote nodes ..."
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-2911
> URL: https://issues.apache.org/jira/browse/DRILL-2911
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 0.9.0
> Reporter: Abhishek Girish
> Fix For: Future
>
> Attachments: drillbit_node4.log