[ 
https://issues.apache.org/jira/browse/DRILL-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285187#comment-15285187
 ] 

Julian Hyde commented on DRILL-4678:
------------------------------------

>From the stack it looks as if it is not hung, but is hitting a performance bug 
>evaluating metadata (the estimate of the number of distinct rows). I suspect 
>something is O(n ^ 2) or worse in the size of the VALUES clause.

You can confirm that the large VALUES clause is the main contributing factor by 
reducing its size and seeing what that does to the running time.

CALCITE-604 and CALCITE-1147 might both improve this situation, so it's worth 
finding out whether they're in that Drill version. They might also make the 
situation worse, caching being somewhat of a blunt instrument.

See if you can reproduce this bug in just Calcite. (E.g. in the PlannerTest.)

> Query HANG - SELECT DISTINCT over date data
> -------------------------------------------
>
>                 Key: DRILL-4678
>                 URL: https://issues.apache.org/jira/browse/DRILL-4678
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.7.0
>         Environment: 4 node cluster CentOS
>            Reporter: Khurram Faraaz
>            Priority: Critical
>         Attachments: hung_Date_Query.log
>
>
> Below query hangs
> {noformat}
> 2016-05-16 10:33:57,506 [28c65de9-9f67-dadb-5e4e-e1a12f8dda49:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 28c65de9-9f67-dadb-5e4e-e1a12f8dda49: SELECT DISTINCT dt FROM (
> VALUES(CAST('1964-03-07' AS DATE)),
>       (CAST('2002-03-04' AS DATE)),
>       (CAST('1966-09-04' AS DATE)),
>       (CAST('1993-08-18' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1959-10-23' AS DATE)),
>       (CAST('1992-01-14' AS DATE)),
>       (CAST('1994-07-24' AS DATE)),
>       (CAST('1979-11-25' AS DATE)),
>       (CAST('1945-01-14' AS DATE)),
>       (CAST('1982-07-25' AS DATE)),
>       (CAST('1966-09-06' AS DATE)),
>       (CAST('1989-05-01' AS DATE)),
>       (CAST('1996-03-08' AS DATE)),
>       (CAST('1998-08-19' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
> (CAST('1999-07-20' AS DATE)),
>     (CAST('1962-07-03' AS DATE)),
>       (CAST('2011-08-17' AS DATE)),
>       (CAST('2011-05-16' AS DATE)),
>       (CAST('1946-05-08' AS DATE)),
>       (CAST('1994-02-13' AS DATE)),
>       (CAST('1978-08-09' AS DATE)),
>       (CAST('1978-08-09' AS DATE)),
>       (CAST('1978-08-09' AS DATE)),
>       (CAST('1978-08-09' AS DATE)),
>       (CAST('1958-02-06' AS DATE)),
>       (CAST('2012-06-11' AS DATE)),
>       (CAST('2012-06-11' AS DATE)),
>       (CAST('2012-06-11' AS DATE)),
>       (CAST('2012-06-11' AS DATE)),
>       (CAST('1998-03-26' AS DATE)),
>       (CAST('1996-11-04' AS DATE)),
>       (CAST('1953-09-25' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('1980-07-05' AS DATE)),
>       (CAST('1982-06-15' AS DATE)),
>       (CAST('1951-05-16' AS DATE)))
> tbl(dt)
> {noformat}
> Details from Web UI Profile tab, please note that the query is still in 
> STARTING state
> {noformat}
> Running Queries
> Time  User    Query   State   Foreman
> 05/16/2016 10:33:57   
> mapr
>  SELECT DISTINCT dt FROM ( VALUES(CAST('1964-03-07' AS DATE)), 
> (CAST('2002-03-04' AS DATE)), (CAST('1966-09-04' AS DATE)), (CAST('199
> STARTING
> centos-01.qa.lab
> {noformat}
> There is no other useful information in drillbit.log. jstack output is 
> attached here for your reference.
> The same query works fine on Postgres 9.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to