[ 
https://issues.apache.org/jira/browse/HIVE-24528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248640#comment-17248640
 ] 

Stamatis Zampetakis commented on HIVE-24528:
--------------------------------------------

For those interested here is what happens in other DBMS:
+Postgres+
{noformat}
postgres=# select * from t_str where str_col=1208925742523269479013976;
ERROR:  operator does not exist: character varying = numeric
LINE 1: select * from t_str where str_col=1208925742523269479013976;
                                ^
HINT:  No operator matches the given name and argument types. You might need to 
add explicit type casts.
postgres=# select * from t_str where CAST(str_col as double 
precision)=1208925742523269479013976;
             str_col             
---------------------------
 1208925742523269458163819
{noformat}

+SQLServer (mssql-2017)+
{noformat}
StmtText                                             
-----------------------------------------------------
SELECT * FROM t3 where a=1208925742523269479013976;  

(1 rows affected)
StmtText                                                                        
                                      
----------------------------------------------------------------------------------------------------------------------
  |--Table Scan(OBJECT:([master].[dbo].[t3]), 
WHERE:(CONVERT_IMPLICIT(numeric(25,0),[master].[dbo].[t3].[a],0)=[@1]))
{noformat}

+Oracle+
{noformat}
SQL> explain plan for select * from dectbl inner join strtbl on deccol=strcol;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 4134090056
-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     1 |    40 |     4   (0)| 00:00:01 |
|*  1 |  HASH JOIN         |        |     1 |    40 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DECTBL |     1 |    13 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| STRTBL |     1 |    27 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("DECCOL"=TO_NUMBER("STRCOL"))
{noformat}

+MySQL+
{noformat}
mysql> explain analyze select * from dectbl inner join strtbl on deccol=strcol\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (cast(dectbl.deccol as double) = cast(strtbl.strcol 
as double))  (cost=0.70 rows=1) (actual time=0.209..0.233 rows=1 loops=1)
    -> Table scan on strtbl  (cost=0.35 rows=1) (actual time=0.023..0.034 
rows=1 loops=1)
    -> Hash
        -> Table scan on dectbl  (cost=0.35 rows=1) (actual time=0.047..0.063 
rows=1 loops=1)
1 row in set (0.00 sec)
{noformat}


> Wrong implicit type conversion when comparing decimals and strings
> ------------------------------------------------------------------
>
>                 Key: HIVE-24528
>                 URL: https://issues.apache.org/jira/browse/HIVE-24528
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 2.3.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>
> In many cases when comparing decimals and strings (literals/columns) the 
> comparison is done using doubles which can create some quite unexpected 
> results in the answers of queries.
> {code:sql}
> create table t_str (str_col string);
> insert into t_str values ('1208925742523269458163819');
> select * from t_str where str_col=1208925742523269479013976;
> {code}
> The SELECT query brings up one row while the filtering value is not the same 
> with the one present in the string column of the table. The problem is that 
> both types are converted to doubles and due to loss of precision the values 
> are deemed equal.
> The same happens during the join of a decimal with a string type.
> {code:sql}
> create table t_dec (dec_col decimal(25,0));
> insert into t_dec values (1208925742523269479013976);
> select * from t_dec inner join t_str on dec_col=str_col;
> {code}
> The join result contains one row although the values are not equal.
> Implicit type conversions are working differently in every DBMS and for some 
> of them (e.g., mysql) the above behavior is normal or not allowed at all 
> (e.g. Postgres). 
> In the past, Hive used to compare decimal with string columns by converting 
> to decimals but this behavior changed in 2.3.0 (with HIVE-13380). It seems 
> that this behavior change was not intentional since following jiras (e.g., 
> HIVE-18434) imply that comparison of decimals with strings should be done 
> using decimals. Since decimal is an exact numeric it appears a more 
> appropriate type for comparing decimals and strings.
> The goal of this issue is to change the implicit conversion of decimals with 
> strings to doubles and use decimals instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to