[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749966#comment-15749966
 ] 

Matt McCline edited comment on HIVE-15335 at 12/15/16 1:08 AM:
---------------------------------------------------------------

I have great difficulty accepting that DecimalColumnVector is now a public API. 
 I haven't even begun to think of all the problems this will create.

I made quite a number of changes to HiveDecimal and HiveDecimalWritable, not 
just to the internals but also to the interfaces.  For example, the current 
HiveDecimalWritable is very slow because it internally represents decimals as 
BigInteger binary bytes, and it exposes those bytes through 
getInternalStorage().  I zapped that immediately.  The compatibility I 
designed for was serialization/deserialization of binary bits and text, plus 
decimal execution behavior -- not code compatibility.  Binary bit compatibility 
ensures ORC will be able to read/write the same information.  The 
TestHiveDecimal class verifies binary bit compatibility with 
SerializationUtils (ORC’s serialization), BigInteger binary bit 
compatibility (LazyBinary, Avro, Parquet), and the same behavior as 
OldHiveDecimal/OldHiveDecimalWritable (the original 
HiveDecimal/HiveDecimalWritable, renamed).  I needed to be able to make major 
code changes (the core fast decimal implementation class is 9,000 lines) to get 
good performance with ORC serialization/deserialization of decimals and with 
all other decimal operations (except division/remainder).  Matching the 
semantics of Hive decimals and BigDecimal while still executing quickly is 
quite challenging.
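
To make "binary bit compatibility" with BigInteger bytes concrete, here is a 
minimal sketch using only java.math.  It is not the Hive code: the class 
DecimalBytesSketch and its methods are hypothetical names, and the "fast" path 
assumes the unscaled value fits in a long, which the real implementation does 
not require.  The idea is that a faster encoder is acceptable as long as it 
emits exactly the bytes that BigInteger.toByteArray() would.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical illustration only; not part of Hive or ORC.
final class DecimalBytesSketch {

    // Reference encoding: the unscaled value as minimal big-endian
    // two's-complement bytes, as produced by BigInteger.
    static byte[] referenceBytes(BigDecimal d) {
        return d.unscaledValue().toByteArray();
    }

    // Assumed "fast" encoder for unscaled values that fit in a long.
    // It produces the same minimal encoding without creating a BigInteger.
    static byte[] fastBytes(long unscaled) {
        int len = 8;
        // Drop leading bytes that only repeat the sign bit of the next byte.
        while (len > 1) {
            int top = (int) (unscaled >> ((len - 1) * 8)) & 0xFF;
            int next = (int) (unscaled >> ((len - 2) * 8)) & 0xFF;
            boolean redundant = (top == 0x00 && (next & 0x80) == 0)
                    || (top == 0xFF && (next & 0x80) != 0);
            if (!redundant) {
                break;
            }
            len--;
        }
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            out[i] = (byte) (unscaled >> ((len - 1 - i) * 8));
        }
        return out;
    }

    // Rebuild the decimal from the stored bytes plus a separately stored scale.
    static BigDecimal decode(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        BigDecimal d = new BigDecimal("123.45"); // unscaled 12345, scale 2
        byte[] fast = fastBytes(12345L);
        System.out.println(Arrays.equals(fast, referenceBytes(d)));
        System.out.println(decode(fast, 2));
    }
}
```

A compatibility test in this style compares the two encoders over many values, 
including boundary cases such as -129, -128, 127 and 128 where the byte length 
changes.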

I need to be able to take a hammer to the code in the future to get good 
performance.  I've done some experimenting with improving the performance of 
HiveChar/HiveVarchar and their writables.  Very little of the original code 
will survive -- just like with fast decimals.


was (Author: mmccline):
I have great difficulty accepting that DecimalColumnVector is now a public API. 
 Gunther will need to take that up with you.

> Fast Decimal
> ------------
>
>                 Key: HIVE-15335
>                 URL: https://issues.apache.org/jira/browse/HIVE-15335
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the 
> decimal internally as a BigDecimal, with a faster version that does not 
> allocate extra objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
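
The point of the mutable* calls described above can be sketched as follows.  
This is an assumption-laden toy, not the HiveDecimalWritable API: the class 
MutableDecimalSketch is hypothetical, it keeps the unscaled value in a single 
long, and it only supports operands with the same scale.  It shows an add that 
updates the receiver in place rather than returning a newly allocated object.

```java
import java.math.BigDecimal;

// Hypothetical illustration only; not the Hive implementation.
final class MutableDecimalSketch {
    private long unscaled;   // value multiplied by 10^scale
    private final int scale;

    MutableDecimalSketch(long unscaled, int scale) {
        this.unscaled = unscaled;
        this.scale = scale;
    }

    // In-place add: no new decimal object is created for the result.
    // Simplification: both operands must already have the same scale.
    void mutableAdd(MutableDecimalSketch other) {
        if (other.scale != scale) {
            throw new IllegalArgumentException("scale mismatch");
        }
        // addExact throws ArithmeticException on long overflow.
        unscaled = Math.addExact(unscaled, other.unscaled);
    }

    // Convert to BigDecimal only at the boundary, e.g. for comparison in tests.
    BigDecimal toBigDecimal() {
        return BigDecimal.valueOf(unscaled, scale);
    }
}
```

By contrast, BigDecimal.add returns a new immutable object on every call, which 
is the kind of extra allocation the description wants to avoid.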


