[
https://issues.apache.org/jira/browse/DRILL-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Victoria Markman closed DRILL-2431.
-----------------------------------
> Document behavior of floating point types
> -----------------------------------------
>
> Key: DRILL-2431
> URL: https://issues.apache.org/jira/browse/DRILL-2431
> Project: Apache Drill
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 0.8.0
> Reporter: Victoria Markman
> Assignee: Kristine Hahn
>
> Joining on columns of float and double data type produces confusing
> result. Drill returns the same result as postgres. Part of me feels that we
> should not follow postgres blindly in this case, removing implicit cast
> between float and double would be better choice.
> At a minimum we should have a section in our documentation that discusses
> floating point types and this is a good example on how things can go wrong if
> user does not understand the behavior.
> Example of such a discussion in Postgres docs:
> http://www.postgresql.org/docs/9.1/static/datatype-numeric.html
> t1.csv
> {code}
> 997322.0399,997322.0399
> 982209.1438,982209.1438
> 997322,997322
> 982209,982209
> 963548,963548
> 959310,959310
> {code}
> t2.csv
> {code}
> 997322.0399,997322.0399
> 982209.1438,982209.1438
> 997322,997322
> 982209,982209
> 963548,963548
> 959310,959310
> {code}
> {code}
> create table t1(c_float, c_double) as
> select
> case when columns[0] = '' then cast(null as float) else
> cast(columns[0] as float) end,
> case when columns[1] = '' then cast(null as double) else
> cast(columns[1] as double) end
> from `t1.csv`;
> create table t2(c_float, c_double) as
> select
> case when columns[0] = '' then cast(null as float) else
> cast(columns[0] as float) end,
> case when columns[1] = '' then cast(null as double) else
> cast(columns[1] as double) end
> from `t2.csv`;
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+
> | c_float | c_double |
> +------------+------------+
> | 997322.06 | 997322.0399 |
> | 982209.1 | 982209.1438 |
> | 997322.0 | 997322.0 |
> | 982209.0 | 982209.0 |
> | 963548.0 | 963548.0 |
> | 959310.0 | 959310.0 |
> +------------+------------+
> 6 rows selected (0.05 seconds)
> 0: jdbc:drill:schema=dfs> select * from t2;
> +------------+------------+
> | c_float | c_double |
> +------------+------------+
> | 997322.06 | 997322.0399 |
> | 982209.1 | 982209.1438 |
> | 997322.0 | 997322.0 |
> | 982209.0 | 982209.0 |
> | 963548.0 | 963548.0 |
> | 959310.0 | 959310.0 |
> +------------+------------+
> 6 rows selected (0.044 seconds)
> {code}
> Implicit cast: looks incorrect, but in fact we can't expect this to work.
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1, t2 where t1.c_float = t2.c_double;
> +------------+------------+------------+------------+
> | c_float | c_double | c_float0 | c_double0 |
> +------------+------------+------------+------------+
> | 959310.0 | 959310.0 | 959310.0 | 959310.0 |
> | 963548.0 | 963548.0 | 963548.0 | 963548.0 |
> | 982209.0 | 982209.0 | 982209.0 | 982209.0 |
> | 997322.0 | 997322.0 | 997322.0 | 997322.0 |
> +------------+------------+------------+------------+
> 4 rows selected (0.127 seconds)
> {code}
> Explicit cast: same
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1, t2 where cast(t1.c_float as
> double) = t2.c_double;
> +------------+------------+------------+------------+
> | c_float | c_double | c_float0 | c_double0 |
> +------------+------------+------------+------------+
> | 959310.0 | 959310.0 | 959310.0 | 959310.0 |
> | 963548.0 | 963548.0 | 963548.0 | 963548.0 |
> | 982209.0 | 982209.0 | 982209.0 | 982209.0 |
> | 997322.0 | 997322.0 | 997322.0 | 997322.0 |
> +------------+------------+------------+------------+
> 4 rows selected (0.136 seconds)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)