Ryan Clough created DRILL-4447:
----------------------------------

             Summary: Drill seems to ignore TO_DATE(timestamp) when used inside 
DISTINCT() and GROUP BY
                 Key: DRILL-4447
                 URL: https://issues.apache.org/jira/browse/DRILL-4447
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.5.0
         Environment: Centos 6.2/Distributed/CDH5.4.9
            Reporter: Ryan Clough


The issue comes from a larger query, but I've managed to narrow it down to what 
is a minimally reproducible issue.

Given a list of timestamps (will attach files) associated with 3 days, We want 
to select the distinct dates (total: 3 days) from this list. To do this, I 
decided to use the TO_DATE function, which does exactly what I want it.

Note, there are 47 distinct timestamps in the data set.

{code:sql}
jdbc:drill:> SELECT DISTINCT(TO_DATE(data_date)) AS data_date
. . . . . . . > FROM timestamps;
+-------------+
|  data_date  |
+-------------+
| 2016-02-23  |
| 2016-02-25  |
| 2016-02-24  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-24  |
| 2016-02-25  |
| 2016-02-24  |
| 2016-02-24  |
| 2016-02-23  |
| 2016-02-25  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-25  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-24  |
| 2016-02-24  |
| 2016-02-23  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-23  |
| 2016-02-23  |
| 2016-02-23  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-24  |
| 2016-02-24  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-23  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-24  |
| 2016-02-23  |
| 2016-02-23  |
+-------------+
47 rows selected (11.057 seconds)
{code}

As you can see, DRILL has ignored the TO_DATE function when checking for 
distinct records (note that the 47 rows matches the 47 rows of distinct 
timestamps).

My testing has also shown that this affect GROUP BY. I wouldn't be surprised if 
it manifested its self elsewhere.

I tried to get around the problem by converting the dates to a string using 
TO_CHAR: surely drill will use the resulting strings to do the DISTINCT 
comparison?

{code:sql}
drill:> SELECT DISTINCT(TO_CHAR(TO_DATE(data_date))) FROM timestamps;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:

Error in expression at index -1.  Error: Missing function implementation: 
[to_char(DATE-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..

Fragment 0:0

[Error Id: bcad87f0-3353-4a3b-842e-c68a02b394c3 on 
lvimhdpa14.lv.vimeows.com:31010] (state=,code=0)
{code}

As far as I can tell from the docs, you SHOULD be able to convert a date to a 
string with TO_CHAR(). I'm not sure what the underlying issue is here, but I 
thought it good to report the issue.

Please let me know if you need any further info, query plans, etc, but it 
should be reproducable with the timestamps data I'll attach in a minute




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to