[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768058#comment-15768058
 ] 

Matt McCline edited comment on HIVE-15335 at 12/21/16 8:22 PM:
---------------------------------------------------------------

A query benchmark on V1 showed very high cost in HiveDecimalWritable 
(serialization/deserialization, creation of a HiveDecimal for getHiveDecimal) 
and in ORC decimal deserialization (BigInteger).

The cost of V1 decimal add turns out to be not the add itself but the cost of 
HiveDecimalWritable.getHiveDecimal() and then serializing the result back into 
BigInteger bytes for HiveDecimalWritable.set.  Code everywhere was calling 
getHiveDecimal to pass decimals around between components.
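
To make that round trip concrete, here is a minimal sketch.  It is not Hive's 
code: SlowDecimalWritableSketch and its methods are hypothetical stand-ins, 
assuming only what is described above (a writable that stores serialized 
BigInteger bytes and rebuilds a decimal object on every get).

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical stand-in for a V1-style writable; not Hive's actual class. */
final class SlowDecimalWritableSketch {
    private byte[] unscaledBytes; // serialized BigInteger (two's complement)
    private int scale;

    SlowDecimalWritableSketch(BigDecimal value) {
        set(value);
    }

    /** Deserialize: allocates a BigInteger and a BigDecimal on every call. */
    BigDecimal get() {
        return new BigDecimal(new BigInteger(unscaledBytes), scale);
    }

    /** Serialize: allocates a fresh byte array on every call. */
    void set(BigDecimal value) {
        unscaledBytes = value.unscaledValue().toByteArray();
        scale = value.scale();
    }

    /** One logical add = two deserializations + the add + one serialization. */
    static void addInto(SlowDecimalWritableSketch acc,
                        SlowDecimalWritableSketch operand) {
        acc.set(acc.get().add(operand.get()));
    }

    public static void main(String[] args) {
        SlowDecimalWritableSketch acc =
            new SlowDecimalWritableSketch(new BigDecimal("1.25"));
        addInto(acc, new SlowDecimalWritableSketch(new BigDecimal("2.5")));
        System.out.println(acc.get());
    }
}
```

In this sketch the arithmetic is a single BigDecimal.add call; everything else 
in addInto is conversion and allocation, which is the overhead the paragraph 
above describes.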

Making HiveDecimalWritable a fast, first-class citizen was a major part of this 
change.  That included making HiveDecimalWritable the object of choice to pass 
around or operate on directly.  E.g. vectorized SUM aggregation eliminated 
almost all calls to HiveDecimalWritable.getHiveDecimal() for its summing.

One query benchmark on the new code showed a 3X improvement, and the add method 
cost was in the noise.  So storing decimals in 1 long instead of 3 (i.e. the 
so-called fast path) isn't the place to look.  Microbenchmarks on add cost miss 
the boat.  The fast path is using HiveDecimalWritable.mutableAdd and the fast 
V2 serialization/deserialization methods, including the HiveDecimal.create 
family and the HiveDecimalWritable.set family.  Another way of thinking about 
the fast path is that it does not use BigInteger / BigDecimal.
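
A rough sketch of the in-place style, for contrast.  MutableDecimalSketch is a 
hypothetical class, not HiveDecimalWritable, and its mutableAdd only borrows 
the name; the real signature is not reproduced here.  For brevity it assumes a 
single long of unscaled digits and a fixed scale, which is a simplification 
and not the representation used by the patch.  The point is only that the 
summing loop touches no BigInteger / BigDecimal and allocates nothing per row.

```java
/** Hypothetical mutable decimal; a stand-in, not Hive's HiveDecimalWritable. */
final class MutableDecimalSketch {
    private long unscaled;   // value * 10^scale
    private final int scale; // fixed for the lifetime of the object

    MutableDecimalSketch(long unscaled, int scale) {
        this.unscaled = unscaled;
        this.scale = scale;
    }

    void setUnscaled(long unscaled) { this.unscaled = unscaled; }
    long unscaledValue() { return unscaled; }
    int scale() { return scale; }

    /** In-place add: no allocation; throws ArithmeticException on overflow. */
    void mutableAdd(MutableDecimalSketch other) {
        if (other.scale != scale) {
            throw new IllegalArgumentException("scale mismatch");
        }
        unscaled = Math.addExact(unscaled, other.unscaled);
    }

    /** SUM-style loop that reuses one accumulator and one scratch object. */
    static MutableDecimalSketch sum(long[] unscaledValues, int scale) {
        MutableDecimalSketch acc = new MutableDecimalSketch(0L, scale);
        MutableDecimalSketch scratch = new MutableDecimalSketch(0L, scale);
        for (long v : unscaledValues) {
            scratch.setUnscaled(v);  // stands in for a fast deserialize
            acc.mutableAdd(scratch); // operate on the writable directly
        }
        return acc;
    }

    public static void main(String[] args) {
        // 1.25 + 2.50 + 3.75, held as unscaled longs at scale 2
        MutableDecimalSketch total = sum(new long[] {125L, 250L, 375L}, 2);
        System.out.println(total.unscaledValue() + " at scale " + total.scale());
    }
}
```

A real implementation would also have to handle differing scales, precision 
enforcement, and values wider than one long; the sketch omits all of that.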



> Fast Decimal
> ------------
>
>                 Key: HIVE-15335
>                 URL: https://issues.apache.org/jira/browse/HIVE-15335
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch, HIVE-15335.097.patch, HIVE-15335.098.patch, 
> HIVE-15335.099.patch, HIVE-15335.0991.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the 
> decimal internally as a BigDecimal, with a faster version that does not 
> allocate extra objects.
> Replace the HiveDecimalWritable implementation with a faster version that 
> has new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) 
> and stores the result as a fast decimal instead of a slow byte array 
> containing a serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.



