[
https://issues.apache.org/jira/browse/HIVE-24528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248640#comment-17248640
]
Stamatis Zampetakis commented on HIVE-24528:
--------------------------------------------
For those interested here is what happens in other DBMS:
+Postgres+
{noformat}
postgres=# select * from t_str where str_col=1208925742523269479013976;
ERROR: operator does not exist: character varying = numeric
LINE 1: select * from t_str where str_col=1208925742523269479013976;
^
HINT: No operator matches the given name and argument types. You might need to
add explicit type casts.
postgres=# select * from t_str where CAST(str_col as double
precision)=1208925742523269479013976;
str_col
---------------------------
1208925742523269458163819
{noformat}
+SQLServer (mssql-2017)+
{noformat}
StmtText
-----------------------------------------------------
SELECT * FROM t3 where a=1208925742523269479013976;
(1 rows affected)
StmtText
----------------------------------------------------------------------------------------------------------------------
|--Table Scan(OBJECT:([master].[dbo].[t3]),
WHERE:(CONVERT_IMPLICIT(numeric(25,0),[master].[dbo].[t3].[a],0)=[@1]))
{noformat}
+Oracle+
{noformat}
SQL> explain plan for select * from dectbl inner join strtbl on deccol=strcol;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 4134090056
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 40 | 4 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 40 | 4 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| DECTBL | 1 | 13 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| STRTBL | 1 | 27 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DECCOL"=TO_NUMBER("STRCOL"))
{noformat}
+MySQL+
{noformat}
mysql> explain analyze select * from dectbl inner join strtbl on deccol=strcol\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (cast(dectbl.deccol as double) = cast(strtbl.strcol
as double)) (cost=0.70 rows=1) (actual time=0.209..0.233 rows=1 loops=1)
-> Table scan on strtbl (cost=0.35 rows=1) (actual time=0.023..0.034
rows=1 loops=1)
-> Hash
-> Table scan on dectbl (cost=0.35 rows=1) (actual time=0.047..0.063
rows=1 loops=1)
1 row in set (0.00 sec)
{noformat}
> Wrong implicit type conversion when comparing decimals and strings
> ------------------------------------------------------------------
>
> Key: HIVE-24528
> URL: https://issues.apache.org/jira/browse/HIVE-24528
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 2.3.0
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> In many cases when comparing decimals and strings (literals/columns) the
> comparison is done using doubles which can create some quite unexpected
> results in the answers of queries.
> {code:sql}
> create table t_str (str_col string);
> insert into t_str values ('1208925742523269458163819');
> select * from t_str where str_col=1208925742523269479013976;
> {code}
> The SELECT query brings up one row while the filtering value is not the same
> with the one present in the string column of the table. The problem is that
> both types are converted to doubles and due to loss of precision the values
> are deemed equal.
> The same happens during the join of a decimal with a string type.
> {code:sql}
> create table t_dec (dec_col decimal(25,0));
> insert into t_dec values (1208925742523269479013976);
> select * from t_dec inner join t_str on dec_col=str_col;
> {code}
> The join result contains one row although the values are not equal.
> Implicit type conversions are working differently in every DBMS and for some
> of them (e.g., mysql) the above behavior is normal or not allowed at all
> (e.g. Postgres).
> In the past, Hive used to compare decimal with string columns by converting
> to decimals but this behavior changed in 2.3.0 (with HIVE-13380). It seems
> that this behavior change was not intentional since following jiras (e.g.,
> HIVE-18434) imply that comparison of decimals with strings should be done
> using decimals. Since decimal is an exact numeric it appears a more
> appropriate type for comparing decimals and strings.
> The goal of this issue is to change the implicit conversion of decimals with
> strings to doubles and use decimals instead.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)