[ 
https://issues.apache.org/jira/browse/DRILL-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kristine Hahn resolved DRILL-2431.
----------------------------------
    Resolution: Fixed

Done, added the following section to the docs:
{quote}
**Guidelines for Using Float and Double**
The data types float and double yield approximate results. These are 
variable-precision numeric types. Drill does not cast/convert all values 
precisely to the internal format, but instead stores approximations. Slight 
differences can occur in the value stored and retrieved. The following 
guidelines are recommended:

* For conversions involving monetary calculations, for example, that require 
precise results use the decimal type instead of float or double.
* For complex calculations or mission-critical applications, especially those 
involving infinity and underflow situations, carefully consider the limitations 
of type casting that involves float or double.
* Equality comparisons between floating-point values can produce unexpected 
results.

Values of float and double that exceed or fall short of the specified range 
cause an error. Rounding can occur if the precision of an input number is too 
high. Numbers approaching zero that Drill cannot distinguish from zero cause an 
underflow error.
{quote}

> Document behavior of floating point types
> -----------------------------------------
>
>                 Key: DRILL-2431
>                 URL: https://issues.apache.org/jira/browse/DRILL-2431
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Kristine Hahn
>
>    Joining on columns of float and double data type produces confusing 
> result. Drill returns the same result as postgres. Part of me feels that we 
> should not follow postgres blindly in this case, removing implicit cast 
> between float and double would be better choice.
>    At a minimum we should have a section in our documentation that discusses 
> floating point types and this is a good example on how things can go wrong if 
> user does not understand the behavior.
> Example of such a discussion in Postgres docs: 
> http://www.postgresql.org/docs/9.1/static/datatype-numeric.html
> t1.csv
> {code}
> 997322.0399,997322.0399
> 982209.1438,982209.1438
> 997322,997322
> 982209,982209
> 963548,963548
> 959310,959310
> {code}
> t2.csv
> {code}
> 997322.0399,997322.0399
> 982209.1438,982209.1438
> 997322,997322
> 982209,982209
> 963548,963548
> 959310,959310
> {code}
> {code}
> create table t1(c_float, c_double) as
> select
>         case when columns[0] = '' then cast(null as float) else 
> cast(columns[0] as float) end,
>         case when columns[1] = '' then cast(null as double) else 
> cast(columns[1] as double) end
> from `t1.csv`;
> create table t2(c_float, c_double) as
> select
>         case when columns[0] = '' then cast(null as float) else 
> cast(columns[0] as float) end,
>         case when columns[1] = '' then cast(null as double) else 
> cast(columns[1] as double) end
> from `t2.csv`;
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+
> |  c_float   |  c_double  |
> +------------+------------+
> | 997322.06  | 997322.0399 |
> | 982209.1   | 982209.1438 |
> | 997322.0   | 997322.0   |
> | 982209.0   | 982209.0   |
> | 963548.0   | 963548.0   |
> | 959310.0   | 959310.0   |
> +------------+------------+
> 6 rows selected (0.05 seconds)
> 0: jdbc:drill:schema=dfs> select * from t2;
> +------------+------------+
> |  c_float   |  c_double  |
> +------------+------------+
> | 997322.06  | 997322.0399 |
> | 982209.1   | 982209.1438 |
> | 997322.0   | 997322.0   |
> | 982209.0   | 982209.0   |
> | 963548.0   | 963548.0   |
> | 959310.0   | 959310.0   |
> +------------+------------+
> 6 rows selected (0.044 seconds)
> {code}
> Implicit cast: looks incorrect, but in fact we can't expect this to work.
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1, t2 where t1.c_float = t2.c_double;
> +------------+------------+------------+------------+
> |  c_float   |  c_double  |  c_float0  | c_double0  |
> +------------+------------+------------+------------+
> | 959310.0   | 959310.0   | 959310.0   | 959310.0   |
> | 963548.0   | 963548.0   | 963548.0   | 963548.0   |
> | 982209.0   | 982209.0   | 982209.0   | 982209.0   |
> | 997322.0   | 997322.0   | 997322.0   | 997322.0   |
> +------------+------------+------------+------------+
> 4 rows selected (0.127 seconds)
> {code}
> Explicit cast: same
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1, t2 where cast(t1.c_float as 
> double) = t2.c_double;
> +------------+------------+------------+------------+
> |  c_float   |  c_double  |  c_float0  | c_double0  |
> +------------+------------+------------+------------+
> | 959310.0   | 959310.0   | 959310.0   | 959310.0   |
> | 963548.0   | 963548.0   | 963548.0   | 963548.0   |
> | 982209.0   | 982209.0   | 982209.0   | 982209.0   |
> | 997322.0   | 997322.0   | 997322.0   | 997322.0   |
> +------------+------------+------------+------------+
> 4 rows selected (0.136 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to