[https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944685#comment-15944685]
Colin Ma edited comment on HIVE-16311 at 3/28/17 7:33 AM:
----------------------------------------------------------
I've uploaded the initial patch.
As a simple test, I evaluated 12345.67/123.45 with FastHiveDecimalImpl.fastDivide 500,000
times: it took 1s without the patch vs. 0.1s with the patch (about a 10x speedup).
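A timing loop of this kind can be sketched in plain Java. This is a hypothetical harness, not the one used for the numbers above. It does not call FastHiveDecimalImpl. As a stand-in, it times java.math.BigDecimal.divide at scale 38 (assumed to match HiveDecimal.MAX_SCALE) with HALF_UP rounding (also an assumption). The class and method names are illustrative only, and absolute timings will differ from those reported.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class DivideBenchmark {
    // Stand-in for the divide under test: always divides at scale 38,
    // which is assumed to mirror HiveDecimal.MAX_SCALE.
    static BigDecimal divideAtMaxScale(BigDecimal a, BigDecimal b) {
        return a.divide(b, 38, RoundingMode.HALF_UP);
    }

    // Times `iterations` calls and returns the elapsed wall-clock milliseconds.
    static long timeDivides(BigDecimal a, BigDecimal b, int iterations) {
        BigDecimal sink = BigDecimal.ZERO;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = divideAtMaxScale(a, b);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Consume the result so the JIT cannot treat the loop as dead code.
        if (sink.signum() == 0) {
            throw new IllegalStateException("unexpected zero quotient");
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("12345.67");
        BigDecimal b = new BigDecimal("123.45");
        timeDivides(a, b, 500_000); // warm-up pass so the hot path gets JIT-compiled
        System.out.println("500000 divides: " + timeDivides(a, b, 500_000) + " ms");
    }
}
```

For more than a rough before/after comparison, a harness such as JMH handles warm-up and dead-code elimination more reliably than a hand-written loop.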
I also tested the patch with TPCx-BB query q06, which contains the following division
expression:
{code}
sum( case when (d_year = 2001) THEN
(((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2)
ELSE 0 END) first_year_total
{code}
The cluster has 6 nodes, each with 128 GB of memory and an Intel(R) Xeon(R) E5-2680 CPU,
connected by a 1 Gb network.
With a 1 TB data scale and Spark as the execution engine, the results are:
||configuration||without patch||with patch||improvement (reduction in query time)||
|vectorization disabled|214s|164s|23.36%|
|vectorization enabled (Parquet file format)|252s|125s|50.4%|
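Reading the improvement column as the relative reduction in query time, (before - after) / before, reproduces the percentages in the table. The minimal Java check below applies that arithmetic to the table's timings and also reports the equivalent speedup factor (before / after). The class name is hypothetical.

```java
class ImprovementCheck {
    // Relative reduction in query time, as a percentage: (before - after) / before * 100.
    static double reductionPercent(double beforeSec, double afterSec) {
        return (beforeSec - afterSec) / beforeSec * 100.0;
    }

    // Equivalent speedup factor: before / after.
    static double speedup(double beforeSec, double afterSec) {
        return beforeSec / afterSec;
    }

    public static void main(String[] args) {
        // Query times in seconds from the table: {without patch, with patch}.
        double[][] runs = {{214, 164}, {252, 125}};
        String[] labels = {"vectorization disabled", "vectorization enabled"};
        for (int i = 0; i < runs.length; i++) {
            System.out.printf("%s: %.2f%% less time, %.2fx speedup%n",
                labels[i], reductionPercent(runs[i][0], runs[i][1]), speedup(runs[i][0], runs[i][1]));
        }
    }
}
```

With these timings the check reports 23.36% less time (1.30x) with vectorization disabled and 50.40% less time (2.02x) with vectorization enabled.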
> Improve the performance for FastHiveDecimalImpl.fastDivide
> ----------------------------------------------------------
>
> Key: HIVE-16311
> URL: https://issues.apache.org/jira/browse/HIVE-16311
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 2.2.0
> Reporter: Colin Ma
> Assignee: Colin Ma
> Fix For: 2.2.0
>
> Attachments: HIVE-16311.001.patch
>
>
> FastHiveDecimalImpl.fastDivide performs poorly when evaluating an expression
> such as 12345.67/123.45.
> There are two points that can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as the scale when calling
> BigDecimal.divide.
> 2. Get the precision of a BigInteger in a faster way where possible.
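The two points above can be illustrated with a small, self-contained sketch. This is not the code in HIVE-16311.001.patch. The helpers chooseScale, quotientIntegerDigitsBound and fastPrecision are hypothetical. The constants are assumed to mirror HiveDecimal.MAX_PRECISION and HiveDecimal.MAX_SCALE (both 38), and HALF_UP rounding is an assumption. For point 1, the sketch sizes the quotient's scale from the operands' magnitudes, so it does not compute fractional digits that a 38-digit result could not hold. For point 2, it counts the decimal digits of a BigInteger with plain long arithmetic when the value fits in a long, and falls back to BigDecimal.precision() otherwise.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

class FastDivideSketch {
    // Assumed to mirror HiveDecimal.MAX_PRECISION and HiveDecimal.MAX_SCALE.
    static final int MAX_PRECISION = 38;
    static final int MAX_SCALE = 38;

    // Upper bound on the integer digits of |a / b| (b != 0), from the adjusted
    // exponents e(x) = precision - scale - 1, where 10^e(x) <= |x| < 10^(e(x)+1).
    // The bound can overestimate the true digit count by one.
    static int quotientIntegerDigitsBound(BigDecimal a, BigDecimal b) {
        int aIntDigits = a.precision() - a.scale();
        int bIntDigits = b.precision() - b.scale();
        return aIntDigits - bIntDigits + 1;
    }

    // Point 1 (hypothetical heuristic): request only as many fractional digits
    // as fit in a 38-digit result, instead of always dividing at MAX_SCALE.
    static int chooseScale(BigDecimal a, BigDecimal b) {
        int intDigits = Math.max(0, quotientIntegerDigitsBound(a, b));
        return Math.max(0, Math.min(MAX_SCALE, MAX_PRECISION - intDigits));
    }

    static BigDecimal divide(BigDecimal a, BigDecimal b) {
        // HALF_UP is an assumed rounding mode for this sketch.
        return a.divide(b, chooseScale(a, b), RoundingMode.HALF_UP);
    }

    // Point 2 (hypothetical helper): number of decimal digits in |v|; zero counts
    // as one digit, matching BigDecimal.precision().
    static int fastPrecision(BigInteger v) {
        if (v.bitLength() < 63) {
            // Here |v| <= 2^62, so Math.abs cannot overflow. Count digits by
            // repeated division instead of building a decimal string.
            long x = Math.abs(v.longValue());
            int digits = 1;
            while (x >= 10) {
                x /= 10;
                digits++;
            }
            return digits;
        }
        // Slow path for large magnitudes: let BigDecimal count the digits.
        return new BigDecimal(v).precision();
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("12345.67");
        BigDecimal b = new BigDecimal("123.45");
        BigDecimal q = divide(a, b);
        System.out.println("scale=" + q.scale() + " precision=" + q.precision() + " q=" + q);
        System.out.println("digits of unscaled q: " + fastPrecision(q.unscaledValue()));
    }
}
```

For 12345.67/123.45 the quotient has at most 3 integer digits, so the sketch divides at scale 35 rather than 38. Because the bound can overestimate by one digit, a quotient such as 100/999 may get one fewer fractional digit than the maximum. A real implementation would need to reconcile that with Hive's own result-type rules.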
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)