Hyoungjun Kim created TAJO-1444:
-----------------------------------

             Summary: Add warning document for integer division.
                 Key: TAJO-1444
                 URL: https://issues.apache.org/jira/browse/TAJO-1444
             Project: Tajo
          Issue Type: Task
            Reporter: Hyoungjun Kim
            Priority: Trivial


Tajo truncates the result of integer division, but other commonly used 
databases(e.g MySQL, Hive, Oracle) returns implicitly type casted result. So if 
user doesn't know about this issue, Tajo returns wrong result data which user 
expects. We should add warning message in the manual or wiki page with wrong 
use-case.

For example, see the following query:
{code:sql}
select case when 1/3 > 0.1 then 'matched' else 'not matched' end
{code}

Tajo and PostgreSQL return 'not matched' because the result of '1/3' is zero. 
That query should use explicit type casting like this:

{code:sql}
select case when 1::float/3::float > 0.1 then 'matched' else 'not matched' end
{code}

The above query can find easily. But if that expression is located in a complex 
query, can't find easily. I couldn't find the error of the following 
query(TPC-DS query21) before comparing with Hive result.

{code:sql}
select *
 from (select w_warehouse_name
            ,i_item_id
            ,sum(case when (cast(d_date as date) < cast ('1998-04-08' as date))
                      then inv_quantity_on_hand
                      else 0 end) as inv_before
            ,sum(case when (cast(d_date as date) >= cast ('1998-04-08' as date))
                      then inv_quantity_on_hand 
                      else 0 end) as inv_after
   from inventory
       ,warehouse
       ,item
       ,date_dim
   where i_current_price between 0.99 and 1.49
     and i_item_sk          = inv_item_sk
     and inv_warehouse_sk   = w_warehouse_sk
     and inv_date_sk    = d_date_sk
     and d_date between (cast ('1998-04-08' as date) - 30)
                    and (cast ('1998-04-08' as date) + 30)
   group by w_warehouse_name, i_item_id) x
 where (case when inv_before > 0
             then inv_after / inv_before
             else null
             end) between 2.0/3.0 and 3.0/2.0
 order by w_warehouse_name
         ,i_item_id
 limit 100;
{code}   

The intention of the above query is to filter that ratio is between 0.5 and 
1.5. But (inv_after / inv_before) expression returns integer type. For accurate 
result, The query should be modified as the following.

{code:sql}
 where (case when inv_before > 0
             then inv_after::float / inv_before::float
             else null
             end) between 2.0/3.0 and 3.0/2.0
{code}  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to