[
https://issues.apache.org/jira/browse/HIVE-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965421#comment-15965421
]
Colin Ma commented on HIVE-16311:
---------------------------------
[~mmccline], [~xuefuz], I found unnecessary code in
FastHiveDecimalImpl.fastDivide() that causes the poor performance.
The call to BigDecimal.stripTrailingZeros() is slow and redundant, because
fastTrailingDecimalZeroCount() and doFastScaleDown() do the same work
and are faster than BigDecimal.stripTrailingZeros(). Please refer to the patch
for details; here is the [Review board link|https://reviews.apache.org/r/58377]
for easier review.
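To illustrate the idea of counting trailing decimal zeros and then scaling down, here is a minimal sketch. It is not the Hive code: it works on a plain long unscaled value, and the class name and method signatures are hypothetical stand-ins chosen only to mirror the names mentioned above.

```java
import java.math.BigDecimal;

class TrailingZeroSketch {

    // Counts trailing decimal zeros of the unscaled value, but never more
    // than `scale`, so that only fractional zeros are counted.
    static int trailingDecimalZeroCount(long unscaled, int scale) {
        int count = 0;
        while (count < scale && unscaled % 10 == 0) {
            unscaled /= 10;
            count++;
        }
        return count;
    }

    // Drops `zeros` trailing digits by repeated division by 10.
    static long scaleDown(long unscaled, int zeros) {
        for (int i = 0; i < zeros; i++) {
            unscaled /= 10;
        }
        return unscaled;
    }

    public static void main(String[] args) {
        long unscaled = 3140000L;   // represents 3.140000
        int scale = 6;

        int zeros = trailingDecimalZeroCount(unscaled, scale);
        BigDecimal manual = BigDecimal.valueOf(scaleDown(unscaled, zeros), scale - zeros);
        BigDecimal stripped = new BigDecimal("3.140000").stripTrailingZeros();

        System.out.println(manual);    // 3.14
        System.out.println(stripped);  // 3.14
    }
}
```

Because the count is capped at the scale, an integer result such as 500 with scale 0 stays 500, whereas BigDecimal.stripTrailingZeros() would turn it into 5E+2 with a negative scale.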
The following are the micro-benchmark results for the patch; every expression
is evaluated 500000 times:
||expression||without patch (s)||with patch (s)||improvement||
|15 / 3|1.78|0.43|75.84%|
|0.001 / 810|0.56|0.33|41.07%|
|1 / 3|0.74|0.36|51.35%|
|10000000 / 10|2.4|0.6|75%|
|1234567890000000123456789 / 1234567891|1.21|0.66|45.45%|
|1234500000000000123450000000001234.567 / 123.45|1.73|0.94|45.66%|
|3.140 / 1.00|1.84|0.45|75.54%|
|31401234567 / 112.3|0.9|0.53|41.11%|
|12345612345678901234561234567890123456 / 987654321|1.7|0.96|43.53%|
|12345612345678901234561234567890123456 / 9876543210123456|1.63|1|38.65%|
|0.00123456 / 0.098765|0.68|0.39|42.65%|
|0.000000000088 / 1000000000000000|1.32|0.25|81.06%|
|0.000000000088 / 9876543210123456|0.28|0.18|35.71%|
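The improvement column appears to be the relative reduction in elapsed time, (without - with) / without. A small sketch of that arithmetic, with a hypothetical class and method name:

```java
import java.util.Locale;

class ImprovementSketch {

    // Relative reduction in elapsed time, as a percentage.
    static double improvementPercent(double withoutPatch, double withPatch) {
        return (withoutPatch - withPatch) / withoutPatch * 100.0;
    }

    public static void main(String[] args) {
        // 15 / 3 row: 1.78 s without the patch, 0.43 s with it.
        System.out.println(String.format(Locale.ROOT, "%.2f%%",
                improvementPercent(1.78, 0.43)));   // 75.84%
    }
}
```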
Expressions such as *3.140 / 1.00* and *15 / 3* involve many trailing zeros, so
they gain the most from the patch. The other expressions now get their precision
from *precision = bigDecimal.precision()* instead of *precision =
bigInteger.toString().length()*, and they also improve by about 40%.
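The two ways of obtaining the precision can be compared with a short sketch. The class and method names are hypothetical, and the abs() call is an addition here so that the sign character of a negative value is not counted as a digit.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class PrecisionSketch {

    // Digit count via string conversion: builds a String holding every digit.
    static int precisionViaString(BigInteger unscaled) {
        return unscaled.abs().toString().length();
    }

    // Digit count via BigDecimal.precision(), which lets BigDecimal compute
    // the number of digits of the unscaled value itself.
    static int precisionViaBigDecimal(BigInteger unscaled) {
        return new BigDecimal(unscaled).precision();
    }

    public static void main(String[] args) {
        BigInteger value = new BigInteger("12345612345678901234561234567890123456");
        // Both calls are expected to report the same digit count.
        System.out.println(precisionViaString(value));
        System.out.println(precisionViaBigDecimal(value));
    }
}
```

Both approaches report a precision of 1 for zero, so they agree on that edge case as well.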
The following is the benchmark with q06 of TPCx-BB.
The cluster has 6 nodes with 128 GB of memory per node, Intel(R) Xeon(R)
E5-2680 CPUs and a 1G network, with a 1 TB data scale and Spark as the execution engine.
|| ||without patch||with patch||improvement||
|vectorization disabled|214s|178s|16.82%|
|vectorization enabled (Parquet file format)|252s|140s|44.44%|
> Improve the performance for FastHiveDecimalImpl.fastDivide
> ----------------------------------------------------------
>
> Key: HIVE-16311
> URL: https://issues.apache.org/jira/browse/HIVE-16311
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 2.2.0
> Reporter: Colin Ma
> Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16311.001.patch, HIVE-16311.002.patch,
> HIVE-16311.003.patch, HIVE-16311.004.patch, HIVE-16311.005.patch,
> HIVE-16311.006.patch, HIVE-16311.withTrailingZero.patch
>
>
> FastHiveDecimalImpl.fastDivide performs poorly when evaluating an
> expression such as 12345.67/123.45.
> There are 2 points that can be improved:
> 1. Don't always use HiveDecimal.MAX_SCALE as the scale when calling
> BigDecimal.divide.
> 2. Get the precision of a BigInteger in a fast way if possible.
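The cost behind point 1 can be seen in a small sketch that divides at the maximum scale and at a smaller one. It assumes HiveDecimal.MAX_SCALE is 38 and uses RoundingMode.HALF_UP; the class name, the helper, and the choice of scale 0 for comparison are illustrative only and are not taken from the patch.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class DivideScaleSketch {

    // Assumed value of HiveDecimal.MAX_SCALE.
    static final int MAX_SCALE = 38;

    // Divides a by b, keeping `scale` fractional digits in the result.
    static BigDecimal divideAtScale(String a, String b, int scale) {
        return new BigDecimal(a).divide(new BigDecimal(b), scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal wide = divideAtScale("15", "3", MAX_SCALE);
        BigDecimal narrow = divideAtScale("15", "3", 0);

        // At scale 38 the unscaled value is 5 followed by 38 zeros.
        System.out.println(wide.precision());         // 39
        System.out.println(narrow.precision());       // 1
        System.out.println(wide.compareTo(narrow));   // 0, same numeric value
    }
}
```

Both quotients are numerically equal, but the wide one carries 38 trailing zeros that have to be removed afterwards.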