[jira] [Commented] (DRILL-3638) Incorrect results LEAD()

Deneche A. Hakim (JIRA) Thu, 13 Aug 2015 09:14:33 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695453#comment-14695453
 ]


Deneche A. Hakim commented on DRILL-3638:
-----------------------------------------

The problem here is that you are trying to apply a filter on a FLOAT column. 
Because Drill uses floating point arithmetic the results are approximate and 
indeterministic.
For example the following *simple* query will return different results than 
Postgres:
{noformat}
select col2 from `3638.parquet` where col2 in (127.0 , 1.0 , 99.9 , 255.99 , 
63.99 , 127.9 , 128.978);
+--------+
|  col2  |
+--------+
| 127.0  |
| 1.0    |
+--------+
{noformat} 

Casting col2 to CHAR will help here (but not in the general case):
{noformat}
select col2 , lead(col2) over(partition by col7 order by col0) lead_col2 from 
`3638.parquet` where cast(col2 as char(10)) in ('127.0' , '1.0' , '99.9' , 
'255.99' , '63.99' , '127.9' , '128.978');
+----------+------------+
|   col2   | lead_col2  |
+----------+------------+
| 127.0    | 63.99      |
| 63.99    | 1.0        |
| 1.0      | 128.978    |
| 128.978  | null       |
| 99.9     | 255.99     |
| 255.99   | 127.9      |
| 127.9    | null       |
+----------+------------+
{noformat}

as you can see, the results now match the ones returned by Postgres

> Incorrect results LEAD(<float-type-column>) 
> --------------------------------------------
>
>                 Key: DRILL-3638
>                 URL: https://issues.apache.org/jira/browse/DRILL-3638
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: private-branch 
> https://github.com/adeneche/incubator-drill/tree/new-window-funcs
>            Reporter: Khurram Faraaz
>            Assignee: Deneche A. Hakim
>            Priority: Critical
>              Labels: window_function
>             Fix For: 1.2.0
>
>         Attachments: 0_0_0.parquet
>
>
> Query returns different results when LEAD function is given a column of type 
> float, compared to the results returned by Postgres for same input data.
> results (2 rows) returned by Drill (input is from Parquet file)
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select col2 , lead(col2) over(partition by col7 
> order by col0) lead_col2 from FEWRWSPQQ_101 where col2 in (127.0 , 1.0 , 99.9 
> , 255.99 , 63.99 , 127.9 , 128.978);
> +--------+------------+
> |  col2  | lead_col2  |
> +--------+------------+
> | 127.0  | 1.0        |
> | 1.0    | null       |
> +--------+------------+
> 2 rows selected (0.347 seconds)
> {code}
> results (7 rows) returned by Postgres for same input data (csv)
> {code}
> postgres=# select col2 , lead(col2) over(partition by col7 order by col0) 
> lead_col2 from FEWRWSPQQ_101 where col2 in (127.0 , 1.0 , 99.9 , 255.99 , 
> 63.99 , 127.9 , 128.978);
>   col2   | lead_col2 
> ---------+-----------
>      127 |     63.99
>    63.99 |         1
>        1 |   128.978
>  128.978 |          
>     99.9 |    255.99
>   255.99 |     127.9
>    127.9 |          
> (7 rows)
> {code}
> CTAS used to create the parquet file from CSV input 
> {code}
> CREATE TABLE FEWRWSPQQ_101 AS (SELECT cast(columns[0] as INT) col0, 
> cast(columns[1] as BIGINT) col1, cast(columns[2] as FLOAT) col2, 
> cast(columns[3] as DOUBLE) col3, cast(columns[4] as TIME) col4, 
> cast(columns[5] as TIMESTAMP) col5, cast(columns[6] as DATE) col6, 
> cast(columns[7] as BOOLEAN) col7, cast(columns[8] as char(2)) col8, 
> cast(columns[9] as varchar(52)) col9 from `all_types_few.csv`);
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 22                         |
> +-----------+----------------------------+
> 1 row selected (0.643 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-3638) Incorrect results LEAD()

Reply via email to