Hyoungjun Kim created TAJO-1444:
-----------------------------------
Summary: Add warning document for integer division.
Key: TAJO-1444
URL: https://issues.apache.org/jira/browse/TAJO-1444
Project: Tajo
Issue Type: Task
Reporter: Hyoungjun Kim
Priority: Trivial
Tajo truncates the result of integer division, but other commonly used
databases(e.g MySQL, Hive, Oracle) returns implicitly type casted result. So if
user doesn't know about this issue, Tajo returns wrong result data which user
expects. We should add warning message in the manual or wiki page with wrong
use-case.
For example, see the following query:
{code:sql}
select case when 1/3 > 0.1 then 'matched' else 'not matched' end
{code}
Tajo and PostgreSQL return 'not matched' because the result of '1/3' is zero.
That query should use explicit type casting like this:
{code:sql}
select case when 1::float/3::float > 0.1 then 'matched' else 'not matched' end
{code}
The above query can find easily. But if that expression is located in a complex
query, can't find easily. I couldn't find the error of the following
query(TPC-DS query21) before comparing with Hive result.
{code:sql}
select *
from (select w_warehouse_name
,i_item_id
,sum(case when (cast(d_date as date) < cast ('1998-04-08' as date))
then inv_quantity_on_hand
else 0 end) as inv_before
,sum(case when (cast(d_date as date) >= cast ('1998-04-08' as date))
then inv_quantity_on_hand
else 0 end) as inv_after
from inventory
,warehouse
,item
,date_dim
where i_current_price between 0.99 and 1.49
and i_item_sk = inv_item_sk
and inv_warehouse_sk = w_warehouse_sk
and inv_date_sk = d_date_sk
and d_date between (cast ('1998-04-08' as date) - 30)
and (cast ('1998-04-08' as date) + 30)
group by w_warehouse_name, i_item_id) x
where (case when inv_before > 0
then inv_after / inv_before
else null
end) between 2.0/3.0 and 3.0/2.0
order by w_warehouse_name
,i_item_id
limit 100;
{code}
The intention of the above query is to filter that ratio is between 0.5 and
1.5. But (inv_after / inv_before) expression returns integer type. For accurate
result, The query should be modified as the following.
{code:sql}
where (case when inv_before > 0
then inv_after::float / inv_before::float
else null
end) between 2.0/3.0 and 3.0/2.0
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)