[ https://issues.apache.org/jira/browse/HIVE-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874144#comment-15874144 ]
Gopal V commented on HIVE-15987:
--------------------------------
-1 for the Hive-2.x branch storage-api implementation. We can consider this for
the Hive-3.0 branch, since it breaks external interfaces to ORC and third-party
vectorized UDFs.
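
To illustrate the compatibility concern, here is a minimal Java sketch. It is
not Hive code: OldColumnVector, NewColumnVector and IsNullApiSketch are
hypothetical stand-ins that only mirror the idea of a public isNull field
changing type. Caller code written against the boolean[] field has to be
rewritten, and classes compiled against the old field type would no longer
link.

```java
import java.util.BitSet;

// Hypothetical stand-in for a vector exposing nulls as a public boolean[].
final class OldColumnVector {
    public final boolean[] isNull;
    OldColumnVector(int size) { isNull = new boolean[size]; }
}

// Hypothetical stand-in for the same vector after switching to BitSet.
final class NewColumnVector {
    public final BitSet isNull;
    NewColumnVector(int size) { isNull = new BitSet(size); }
}

final class IsNullApiSketch {
    // Typical external caller today: indexes the array directly.
    static int countNulls(OldColumnVector v, int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (v.isNull[i]) {
                count++;
            }
        }
        return count;
    }

    // The same caller after the change: v.isNull[i] no longer compiles,
    // so every access has to become v.isNull.get(i).
    static int countNulls(NewColumnVector v, int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (v.isNull.get(i)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        OldColumnVector oldVec = new OldColumnVector(8);
        NewColumnVector newVec = new NewColumnVector(8);
        oldVec.isNull[2] = true;
        newVec.isNull.set(2);
        System.out.println(countNulls(oldVec, 8) == countNulls(newVec, 8));
    }
}
```

Any external reader, writer or UDF that touches the field directly would need
the same kind of source change and a recompile.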
> Replace ColumnVector.isNull boolean[] impl. with BitSet
> -------------------------------------------------------
>
> Key: HIVE-15987
> URL: https://issues.apache.org/jira/browse/HIVE-15987
> Project: Hive
> Issue Type: Improvement
> Components: Vectorization
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Labels: incompatibleChange
>
> Most data operations in Hive involve null checks. The current
> implementation of ColumnVector.isNull uses a boolean array, which takes 8 bits
> per boolean. BitSet is a more compact representation, as it uses 1 bit per
> boolean with a backing long array. Logical operations on longs are also
> much faster than the same operations on bytes, since they need fewer
> instructions per null flag. So it could bring an 8x or greater speedup for
> bulk null operations.
> However, there are also several cases where this change will be slower.
> For example, simple reads will require more instructions per row. So the
> change should include benchmark tests to show its performance impact.
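
The trade-off described above can be sketched in plain Java. This is an
illustration only, not Hive code and not a benchmark: NullMaskSketch and its
methods are hypothetical names, and only java.util.BitSet is a real API. It
contrasts a bulk OR of two null masks with a single-row read in each
representation.

```java
import java.util.BitSet;

final class NullMaskSketch {
    // Bulk OR over boolean[] masks: one flag is combined per loop step.
    static boolean[] orBooleans(boolean[] a, boolean[] b) {
        boolean[] out = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] | b[i];
        }
        return out;
    }

    // Bulk OR over BitSet masks: BitSet keeps its bits in a long[], so
    // or() can combine many flags per step (64 per word in OpenJDK).
    static BitSet orBitSets(BitSet a, BitSet b) {
        BitSet out = (BitSet) a.clone();
        out.or(b);
        return out;
    }

    // Helper that converts a boolean[] mask into an equivalent BitSet.
    static BitSet toBitSet(boolean[] flags) {
        BitSet bits = new BitSet(flags.length);
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Single-row read: a plain array load.
    static boolean readBoolean(boolean[] mask, int row) {
        return mask[row];
    }

    // Single-row read: get() must locate the word, then shift and mask,
    // which is the extra per-row work the description warns about.
    static boolean readBit(BitSet mask, int row) {
        return mask.get(row);
    }

    public static void main(String[] args) {
        boolean[] a = new boolean[1024];
        boolean[] b = new boolean[1024];
        a[3] = true;
        b[700] = true;
        boolean[] viaArray = orBooleans(a, b);
        BitSet viaBits = orBitSets(toBitSet(a), toBitSet(b));
        System.out.println(readBoolean(viaArray, 700) == readBit(viaBits, 700));
    }
}
```

Whether the bulk gain outweighs the slower single-row reads depends on the
workload, which is why the benchmark tests requested above matter.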
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)