[
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768058#comment-15768058
]
Matt McCline commented on HIVE-15335:
-------------------------------------
A query benchmark on V1 showed very high cost in HiveDecimalWritable
(serialization/deserialization, and creation of a HiveDecimal for
getHiveDecimal) and in ORC decimal deserialization (BigInteger).
The cost of a V1 decimal add turns out not to be the add itself but the cost of
HiveDecimalWritable.getHiveDecimal() and then serializing the result back into
BigInteger bytes for HiveDecimalWritable.set. Code everywhere was calling
getHiveDecimal to pass values around between components.
Making HiveDecimalWritable a fast, first-class citizen was a major part of this
change. That included making HiveDecimalWritable the object of choice to pass
around or operate on directly. E.g., vectorized SUM aggregation eliminated
almost all calls to HiveDecimalWritable.getHiveDecimal() for its summing.
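To illustrate the difference between the two accumulation patterns, here is a
minimal, self-contained Java sketch. It does not use the Hive classes: the
ByteBackedWritable and MutableWritable types, and the sumViaRoundTrip and
sumInPlace helpers, are hypothetical stand-ins invented for this example, and
the single-long representation is a simplification rather than the layout used
by the patch.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalRoundTripSketch {

    /** Hypothetical V1-style holder: state is a serialized BigInteger plus a scale. */
    static final class ByteBackedWritable {
        private byte[] unscaledBytes = BigInteger.ZERO.toByteArray();
        private int scale = 0;

        // Every read deserializes the bytes and allocates a BigInteger and a BigDecimal.
        BigDecimal get() {
            return new BigDecimal(new BigInteger(unscaledBytes), scale);
        }

        // Every write serializes the unscaled value into a fresh byte array.
        void set(BigDecimal value) {
            unscaledBytes = value.unscaledValue().toByteArray();
            scale = value.scale();
        }
    }

    /** Hypothetical mutable holder: unscaled value kept in a long at a fixed scale. */
    static final class MutableWritable {
        private long unscaled;
        private final int scale;

        MutableWritable(int scale) {
            this.scale = scale;
        }

        // Adds in place without allocating; the operand must already be at this scale.
        void mutableAdd(long otherUnscaled) {
            unscaled = Math.addExact(unscaled, otherUnscaled);
        }

        BigDecimal toBigDecimal() {
            return BigDecimal.valueOf(unscaled, scale);
        }
    }

    // Pattern described above for V1: get, add, then serialize back on every row.
    static BigDecimal sumViaRoundTrip(long[] unscaledValues, int scale) {
        ByteBackedWritable acc = new ByteBackedWritable();
        for (long v : unscaledValues) {
            BigDecimal current = acc.get();
            acc.set(current.add(BigDecimal.valueOf(v, scale)));
        }
        return acc.get();
    }

    // Pattern of operating on the holder directly: no per-row conversion.
    static BigDecimal sumInPlace(long[] unscaledValues, int scale) {
        MutableWritable acc = new MutableWritable(scale);
        for (long v : unscaledValues) {
            acc.mutableAdd(v);
        }
        return acc.toBigDecimal();
    }

    public static void main(String[] args) {
        long[] cents = {1999, 501, 250}; // 19.99 + 5.01 + 2.50
        System.out.println(sumViaRoundTrip(cents, 2)); // prints 27.50
        System.out.println(sumInPlace(cents, 2));      // prints 27.50
    }
}
```

Both helpers return the same sum. The round-trip version allocates several
objects per row purely for conversion, while the in-place version allocates
nothing inside the loop, which is the kind of overhead the comment attributes
to the V1 code.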
One query benchmark on the new code showed a 3X improvement, and the add method
cost was in the noise. So storing decimals in 1 long instead of 3 (i.e. the
so-called fast path) isn't the place to look. Microbenchmarks on add cost miss
the point. The real fast path is using HiveDecimalWritable.mutableAdd and the
fast V2 serialization/deserialization methods, including the HiveDecimal.create
family and the HiveDecimalWritable.set family.
> Fast Decimal
> ------------
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch,
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch,
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch,
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch,
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch,
> HIVE-15335.096.patch, HIVE-15335.097.patch, HIVE-15335.098.patch,
> HIVE-15335.099.patch, HIVE-15335.0991.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the
> decimal internally as a BigDecimal, with a faster version that does not
> allocate extra objects.
> Replace the HiveDecimalWritable implementation with a faster version that has
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and
> stores the result as a fast decimal instead of a slow byte array containing a
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.