[ https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749966#comment-15749966 ]
Matt McCline edited comment on HIVE-15335 at 12/15/16 1:08 AM:
---------------------------------------------------------------

I have great difficulty accepting that DecimalColumnVector is now a public API. I haven't even begun to think of all the problems this will create.

I made quite a number of changes to HiveDecimal and HiveDecimalWritable, not just to the internals but to the interfaces. For example, the current HiveDecimalWritable is very slow because it internally represents decimals as BigInteger binary bytes. It exposes the binary bytes through getInternalStorage(). I zapped that immediately.

The compatibility I designed for was serialization/deserialization of binary bits and text, and decimal execution behavior -- not code compatibility. Binary bit compatibility ensures ORC will be able to read/write the same information. The TestHiveDecimal class verifies binary bit compatibility with SerializationUtils (ORC's serialization), BigInteger binary bit compatibility (LazyBinary, Avro, Parquet), and the same behavior as OldHiveDecimal/OldHiveDecimalWritable (the original HiveDecimal/HiveDecimalWritable renamed).

I needed to be able to make major code changes (the core fast decimal implementation class is 9,000 lines) to get good performance with ORC serialization/deserialization of decimals and with all other decimal operations (except division/remainder). Matching the semantics of Hive decimals and BigDecimal while still executing quickly is quite challenging. I need to be able to take a hammer to the code in the future to get good performance.

I've done some experimenting with improving the performance of HiveChar/HiveVarchar and their writables. Very little of the original code will survive -- just like with fast decimals.


was (Author: mmccline):
I have great difficulty accepting that DecimalColumnVector is now a public API. Gunther will need to take that up with you.

I made quite a number of changes to HiveDecimal and HiveDecimalWritable, not just to the internals but to the interfaces. For example, the current HiveDecimalWritable is very slow because it internally represents decimals as BigInteger binary bytes. It exposes the binary bytes through getInternalStorage(). I zapped that immediately.

The compatibility I designed for was serialization/deserialization of binary bits and text, and decimal execution behavior -- not code compatibility. Binary bit compatibility ensures ORC will be able to read/write the same information. The TestHiveDecimal class verifies binary bit compatibility with SerializationUtils (ORC's serialization), BigInteger binary bit compatibility (LazyBinary, Avro, Parquet), and the same behavior as OldHiveDecimal/OldHiveDecimalWritable (the original HiveDecimal/HiveDecimalWritable renamed).

I needed to be able to make major code changes (the core fast decimal implementation class is 9,000 lines) to get good performance with ORC serialization/deserialization of decimals and with all other decimal operations (except division/remainder). Matching the semantics of Hive decimals and BigDecimal while still executing quickly is quite challenging. I need to be able to take a hammer to the code in the future to get good performance.

I've done some experimenting with improving the performance of HiveChar/HiveVarchar and their writables. Very little of the original code will survive -- just like with fast decimals.

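To illustrate what "BigInteger binary bit compatibility" refers to, here is a minimal sketch that uses only java.math and is not Hive code. It assumes the serialized form is the two's-complement bytes of the unscaled value, with the scale carried separately; the class and method names (DecimalBytesSketch, toUnscaledBytes, fromUnscaledBytes) are hypothetical and exist only for this example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration only; this is NOT the HiveDecimal/HiveDecimalWritable code.
final class DecimalBytesSketch {

    // Serialize the unscaled value as big-endian two's-complement bytes,
    // which is what BigInteger.toByteArray() produces.
    static byte[] toUnscaledBytes(BigDecimal d) {
        return d.unscaledValue().toByteArray();
    }

    // Rebuild the decimal from those bytes plus a separately stored scale.
    static BigDecimal fromUnscaledBytes(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        BigDecimal original = new BigDecimal("-12345.6789");
        byte[] bytes = toUnscaledBytes(original);
        BigDecimal restored = fromUnscaledBytes(bytes, original.scale());

        // A faster implementation can change its in-memory representation freely
        // as long as it still produces and accepts these exact bytes.
        if (!restored.equals(original)) {
            throw new AssertionError("round trip changed the value");
        }
        System.out.println("round trip ok, " + bytes.length + " bytes");
    }
}
```

A test in the spirit described above would compare the bytes produced by the old and new implementations for the same value, rather than comparing their classes or method signatures.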
> Fast Decimal
> ------------
>
>                 Key: HIVE-15335
>                 URL: https://issues.apache.org/jira/browse/HIVE-15335
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch,
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch,
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch,
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal
> internally as a BigDecimal with a faster version that does not allocate extra
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and
> stores the result as a fast decimal instead of a slow byte array containing a
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)