[ 
https://issues.apache.org/jira/browse/DRILL-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495008#comment-14495008
 ] 

Alexander Zarei commented on DRILL-2767:
----------------------------------------

Update:

I keep running into this issue. Each time I execute a query on a fresh Drill 
cluster, it runs to completion, and I can run multiple queries in a row on 
smaller tables such as "nation" from TPC-H. However, once a query has been 
executed on a larger table such as lineitem, subsequent queries fail.

One temporary workaround I found is to stop the drillbit on all nodes, kill 
the metastore, and start the cluster again. My guess is that resources are 
being held rather than released once the query results are returned, and 
hence the subsequent queries fail.

> Fragment error on TPCH Scale Factor 30 on a query that completed successfully 
> previously
> ----------------------------------------------------------------------------------------
>
>                 Key: DRILL-2767
>                 URL: https://issues.apache.org/jira/browse/DRILL-2767
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Hive
>    Affects Versions: 0.8.0
>         Environment: AWS EMR cluster of three m1.xlarge nodes
>            Reporter: Alexander Zarei
>            Assignee: Venki Korukanti
>         Attachments: drillbitcore1.log, drillbitcore1.out, drillbitcore2.log, 
> drillbitcore2.out, drillbitmaster.out, lineitem table schema .png, 
> second-set-core-1-drillbit.log, second-set-core-2-drillbit.log
>
>
> The following sequence led to the error:
> Executed the query 
> bq. SELECT * FROM `realhive`.`tpch_text_30`.`lineitem`
> and it took about 43 minutes to execute successfully. 
> Afterward I ran the query 
> bq. SELECT * FROM `realhive`.`tpch_text_2`.`lineitem`
> 6 times to find an optimization value for the ODBC driver. 
> Afterward, I submitted the first query again
> bq. SELECT * FROM `realhive`.`tpch_text_30`.`lineitem`
>  
> and the Drill Cluster returned a fragment error.
> bq. ***[HY000]: [MapR][Drill] (1040) Drill failed to execute the query: 
> SELECT * FROM `realhive`.`tpch_text_30`.`lineitem`[30024]Query execution 
> error. Details:[RemoteRpcException: Failure while running fragment.[ 
> fb97e7be-d09e-46fe-8728-9577fd0d8795 on ip-10-12-62-65
> Log files with debug level for the Drillbits on the master node as well as 
> the core nodes of the cluster are attached.
> Also, the connection through the ODBC driver on 32-bit Linux was "Direct" to 
> the drillbit on the master node of the Hadoop cluster.
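
For anyone trying to reproduce this, the sequence quoted above could be 
scripted roughly as follows. This is only a sketch under assumptions: 
build_sequence, run_sequence and the execute callable are hypothetical names 
made up for illustration, not part of Drill or the ODBC driver. execute stands 
for whatever function submits a query through your own client connection and 
raises an exception on failure.

```python
from typing import Callable, List, Optional, Tuple

# Queries taken verbatim from the report.
LARGE = "SELECT * FROM `realhive`.`tpch_text_30`.`lineitem`"
SMALL = "SELECT * FROM `realhive`.`tpch_text_2`.`lineitem`"


def build_sequence(small_runs: int = 6) -> List[str]:
    """Return the queries in the reported order: large, small x N, large."""
    return [LARGE] + [SMALL] * small_runs + [LARGE]


def run_sequence(
    execute: Callable[[str], None], queries: List[str]
) -> Optional[Tuple[int, str, Exception]]:
    """Run the queries in order and stop at the first failure.

    `execute` is a hypothetical, caller-supplied function that submits one
    query and raises on error. Returns (index, query, error) for the first
    failing query, or None if every query completed.
    """
    for index, query in enumerate(queries):
        try:
            execute(query)
        except Exception as error:  # keep whatever the client raised
            return index, query, error
    return None
```

If the behaviour matches the report, run_sequence(execute, build_sequence()) 
would return the last entry (index 7, the second scale factor 30 query) with 
the fragment error, while the same call on a freshly restarted cluster with 
only that query would return None.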



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
