[
https://issues.apache.org/jira/browse/IMPALA-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670635#comment-16670635
]
Tim Armstrong commented on IMPALA-5946:
---------------------------------------
There's some precision lost when dividing. Repro:
{noformat}
with ss_items as
(select i_item_id item_id
,sum(ss_ext_sales_price) ss_item_rev
from store_sales
,item
,date_dim
where ss_item_sk = i_item_sk
and d_date in (select d_date
from date_dim
where d_week_seq in (select d_week_seq
from date_dim
where d_date = '2000-01-03'))
and ss_sold_date_sk = d_date_sk
group by i_item_id),
cs_items as
(select i_item_id item_id
,sum(cs_ext_sales_price) cs_item_rev
from catalog_sales
,item
,date_dim
where cs_item_sk = i_item_sk
and d_date in (select d_date
from date_dim
where d_week_seq in (select d_week_seq
from date_dim
where d_date = '2000-01-03'))
and cs_sold_date_sk = d_date_sk
group by i_item_id),
ws_items as
(select i_item_id item_id
,sum(ws_ext_sales_price) ws_item_rev
from web_sales
,item
,date_dim
where ws_item_sk = i_item_sk
and d_date in (select d_date
from date_dim
where d_week_seq in (select d_week_seq
from date_dim
where d_date = '2000-01-03'))
and ws_sold_date_sk = d_date_sk
group by i_item_id)
select ss_items.item_id
,ss_item_rev
,ws_item_rev
,ws_item_rev / ss_item_rev calc
from ss_items,cs_items,ws_items
where ss_items.item_id=cs_items.item_id
and ss_items.item_id=ws_items.item_id
and ss_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev
and ss_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev
and cs_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev
and cs_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev
and ws_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev
and ws_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev
order by item_id
,ss_item_rev
limit 100
Query submitted at: 2018-10-31 12:36:23 (Coordinator:
http://tarmstrong-box:25000)
Query progress can be monitored at:
http://tarmstrong-box:25000/query_plan?query_id=d342334a56ac6c57:e53813cb00000000
+------------------+-------------+-------------+----------+
| item_id | ss_item_rev | ws_item_rev | calc |
+------------------+-------------+-------------+----------+
| AAAAAAAACNGBAAAA | 1900.15 | 1829.52 | 0.962829 |
| AAAAAAAAIDOAAAAA | 6605.22 | 6338.25 | 0.959582 |
| AAAAAAAAJMFCAAAA | 3608.52 | 3305.82 | 0.916115 |
+------------------+-------------+-------------+----------+
{noformat}
0.962829 is a rounded value that drops some precision. If I divide the two
inputs in python I get 0.9628292503223429. In the context of the more complex
expression it gets a bit weird because we add some additional precision with
the subsequent add and multiply operations
> Calculations on DECIMAL columns in WITH tables lose precision
> -------------------------------------------------------------
>
> Key: IMPALA-5946
> URL: https://issues.apache.org/jira/browse/IMPALA-5946
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.10.0
> Reporter: Tim Wood
> Assignee: Tim Armstrong
> Priority: Major
> Labels: correctness, decimal
> Attachments: ttq-212.out, ttq-214.out
>
>
> See attached query (tpcds-q58.test) + outputs. ttq-212.out uses float
> constants in expressions, ttq-214.out uses integer constants. Both lose
> precision.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]