[ https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749966#comment-15749966 ]
Matt McCline commented on HIVE-15335:
-------------------------------------
I have great difficulty accepting that DecimalColumnVector is now a public API.
Gunther will need to take that up with you.
I made quite a number of changes to HiveDecimal and HiveDecimalWritable, not
just to the internals but to the interfaces. For example, the current
HiveDecimalWritable is very slow because it internally represents decimals as
BigInteger binary bytes, and it exposes those bytes through
getInternalStorage(). I zapped that immediately. The compatibility I
designed for was in serialization/deserialization of binary bits and text and
in decimal execution behavior, not code compatibility. Binary bit compatibility
ensures ORC will be able to read/write the same information. The
TestHiveDecimal class verifies binary bit compatibility with
SerializationUtils (ORC's serialization) and with BigInteger binary bytes
(LazyBinary, Avro, Parquet), and verifies the same behavior as
OldHiveDecimal/OldHiveDecimalWritable (the original
HiveDecimal/HiveDecimalWritable, renamed). I needed to be able to make major
code changes (the core fast decimal implementation class is 9,000 lines) to get
good performance for ORC serialization/deserialization of decimals and for
all other decimal operations (except division/remainder). Matching the
semantics of Hive decimals and BigDecimal with code that executes quickly is
quite challenging.
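For readers unfamiliar with the "BigInteger binary bytes" form mentioned above, here is a minimal sketch using only java.math. It assumes that form is the unscaled value as big-endian two's-complement bytes (what BigInteger.toByteArray() produces), with the scale carried separately. BigIntegerBytesDemo, toBytes and fromBytes are hypothetical names, not Hive APIs. Note that every decode allocates a new BigInteger and BigDecimal, which is the kind of per-value cost a fast decimal is meant to avoid.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

/**
 * Hypothetical helper showing the "BigInteger binary bytes" form:
 * the unscaled value as big-endian two's-complement bytes, with the
 * scale stored separately. Not a Hive API.
 */
final class BigIntegerBytesDemo {

    /** Encodes the unscaled value; always allocates a new byte[]. */
    static byte[] toBytes(BigDecimal d) {
        return d.unscaledValue().toByteArray();
    }

    /** Decodes bytes plus scale; allocates a BigInteger and a BigDecimal. */
    static BigDecimal fromBytes(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        BigDecimal d = new BigDecimal("-123.45"); // unscaled -12345, scale 2
        byte[] bytes = toBytes(d);
        // -12345 is 0xCFC7 in two's complement, i.e. signed bytes -49, -57
        System.out.println(Arrays.toString(bytes) + " scale=" + d.scale());
        // prints [-49, -57] scale=2
        BigDecimal back = fromBytes(bytes, d.scale());
        System.out.println(back); // prints -123.45
    }
}
```

A binary bit compatibility test in this spirit would check that a new implementation serializes a given value to exactly the bytes toBytes(...) yields, and decodes those bytes back to an equal value.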
I need to be able to take a hammer to the code in the future to get good
performance. I've done some experiments on improving the performance of
HiveChar/HiveVarchar and their writables. Very little of the original code will
survive, just as with fast decimals.
> Fast Decimal
> ------------
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch,
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch,
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch,
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal
> internally as a BigDecimal with a faster version that does not allocate extra
> objects.
> Replace HiveDecimalWritable implementation with a faster version that has new
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and
> stores the result as a fast decimal instead of a slow byte array containing a
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
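The mutable* idea in the description above can be illustrated with a much-simplified, hypothetical sketch: a reusable decimal object whose mutableAdd aligns scales and adds in place without allocating. It assumes the unscaled value fits in a single long (at most 18 digits); Hive decimals allow precision up to 38, so a real fast decimal needs a wider multi-word representation. MutableSmallDecimal and its methods are invented for illustration and are not the HiveDecimalWritable API.

```java
import java.math.BigDecimal;

/**
 * Hypothetical sketch of a mutable decimal with an in-place add, in the
 * spirit of the mutable* calls described for HiveDecimalWritable.
 * NOT the Hive API. Assumes the unscaled value fits in one long
 * (at most 18 digits); Hive decimals allow precision up to 38.
 */
final class MutableSmallDecimal {
    // value = unscaled * 10^(-scale)
    private long unscaled;
    private int scale;

    // Powers of ten 10^0 .. 10^18, all of which fit in a long.
    private static final long[] POW10 = new long[19];
    static {
        POW10[0] = 1L;
        for (int i = 1; i < POW10.length; i++) {
            POW10[i] = POW10[i - 1] * 10L;
        }
    }

    /** Overwrites this object's value; scale must be in [0, 18]. */
    void set(long unscaled, int scale) {
        if (scale < 0 || scale > 18) {
            throw new IllegalArgumentException("scale out of range: " + scale);
        }
        this.unscaled = unscaled;
        this.scale = scale;
    }

    /**
     * Adds other into this object in place. On the normal path no objects
     * are allocated; overflow of the long throws ArithmeticException
     * (from Math.multiplyExact / Math.addExact) instead of wrapping.
     */
    void mutableAdd(MutableSmallDecimal other) {
        int s = Math.max(scale, other.scale);
        // Rescale both operands to the larger scale before adding.
        long a = Math.multiplyExact(unscaled, POW10[s - scale]);
        long b = Math.multiplyExact(other.unscaled, POW10[s - other.scale]);
        unscaled = Math.addExact(a, b);
        scale = s;
    }

    long unscaled() { return unscaled; }
    int scale() { return scale; }

    /** For display only; this allocates a BigDecimal. */
    @Override
    public String toString() {
        return BigDecimal.valueOf(unscaled, scale).toPlainString();
    }

    public static void main(String[] args) {
        MutableSmallDecimal acc = new MutableSmallDecimal();
        MutableSmallDecimal x = new MutableSmallDecimal();
        acc.set(0L, 0);
        long[] unscaledValues = {12345L, -500L, 7L}; // 123.45, -50.0, 7
        int[] scales = {2, 1, 0};
        for (int i = 0; i < unscaledValues.length; i++) {
            x.set(unscaledValues[i], scales[i]); // reuse the same input object
            acc.mutableAdd(x);
        }
        System.out.println(acc); // prints 80.45
    }
}
```

Reusing one accumulator and one input object across values is the point of this design: the loop in main allocates nothing per value, whereas summing with BigDecimal.add creates a new object on every call.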