[
https://issues.apache.org/jira/browse/HIVE-24354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor updated HIVE-24354:
--------------------------------
Description:
While writing HIVE-24245 I found that ColumnVector doesn't have any methods for
getting a value from the vector, like:
{code}
ColumnVector.getValue(n) // get nth element...mutable?, not mutable? copy?
ColumnVector.getHash(n) // get the murmur hash for the nth element
{code}
Because of this, I ended up writing separate vectorized UDAFs for different
data types, where the only difference was the single line that obtains a value
from the vector. In the current vector expressions I can see a pattern where we
copy the whole expression, including the abstract logic and the loops (this is
something I was already thinking about in the scope of HIVE-21465), but I don't
like that approach. When I create an abstract vectorized UDAF and extend it for
certain data types, I admittedly bring in the overhead of a function call for
every single value, but I don't think this violates basic vectorization
principles: we still operate on vectors, so e.g. the object inspection overhead
is already eliminated.
I propose some convenience methods like the ones above, which define a strict
contract for retrieving data from a ColumnVector, in particular the nth element
of the vector. The first patch (this jira) should contain an implementation for
all ColumnVector subclasses.
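To make the proposal concrete, here is a minimal, self-contained sketch of what
such a contract could look like. Everything in it is an assumption for
illustration: the Sketch* classes are hypothetical stand-ins and not Hive's
actual ColumnVector hierarchy, the "return a copy" choice for getValue is just
one possible answer to the mutability question above, and Java's built-in
hashes stand in for the murmur hash. Null and repeating-value handling is left
out entirely.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for ColumnVector; NOT the real Hive class.
abstract class SketchColumnVector {
    // Assumed contract: returns the nth element; callers may freely modify the result.
    abstract Object getValue(int n);

    // Assumed contract: returns a hash of the nth element (placeholder for murmur).
    abstract int getHash(int n);
}

// Hypothetical long-backed vector.
final class SketchLongColumnVector extends SketchColumnVector {
    final long[] vector;

    SketchLongColumnVector(long... values) {
        this.vector = values;
    }

    @Override
    Object getValue(int n) {
        return vector[n]; // boxed, so the caller cannot touch the backing array
    }

    @Override
    int getHash(int n) {
        return Long.hashCode(vector[n]);
    }
}

// Hypothetical byte[]-backed vector.
final class SketchBytesColumnVector extends SketchColumnVector {
    final byte[][] vector;

    SketchBytesColumnVector(byte[]... values) {
        this.vector = values;
    }

    @Override
    Object getValue(int n) {
        return vector[n].clone(); // defensive copy keeps the contract strict
    }

    @Override
    int getHash(int n) {
        return Arrays.hashCode(vector[n]);
    }
}

// Type-agnostic logic written once against the abstract methods.
final class SketchAggregates {
    private SketchAggregates() {
    }

    // Counts distinct hash values among the first `size` elements.
    static int countDistinctHashes(SketchColumnVector v, int size) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < size; i++) {
            seen.add(v.getHash(i)); // one virtual call per value
        }
        return seen.size();
    }
}
```

With something like this, the per-type subclasses only differ in how they read
the nth element, and the aggregation loop itself does not need to be copied
per data type, at the cost of one virtual call per value.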
was:
While writing HIVE-24245 I found that ColumnVector doesn't have any methods for
getting a value from the vector, like:
{code}
ColumnVector.getValue(n) // get nth element...mutable?, not mutable? copy?
ColumnVector.getHash(n) // get the murmur hash for the nth element
{code}
Because of this, I ended up writing different vectorized UDAFs for different
data types, and the only difference was a single line which was about obtaining
a value from the vector. In the current vector expressions I can see a pattern
where we copy the whole expression with an abstract logic and the loops (this
is something I was thinking about in the scope of HIVE-21465 already), but I
don't like that way. When I create an abstract vectorized udaf, and extend it
for certain data types, I'm already allowed to bring in the overhead of
function calls for every single value, but I don't think I violate basic
vectorization principles, as we have vectors, so e.g. the object inspection
overhead is already eliminated.
I propose some convenience methods like above, which can define a strict
contract about how to retrieve data from a ColumnVector, I mean the nth elment
of the vector in particular.
> ColumnVector should declare abstract convenience methods for getting values
> ---------------------------------------------------------------------------
>
> Key: HIVE-24354
> URL: https://issues.apache.org/jira/browse/HIVE-24354
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
>
> While writing HIVE-24245 I found that ColumnVector doesn't have any methods
> for getting a value from the vector, like:
> {code}
> ColumnVector.getValue(n) // get nth element...mutable?, not mutable? copy?
> ColumnVector.getHash(n) // get the murmur hash for the nth element
> {code}
> Because of this, I ended up writing separate vectorized UDAFs for different
> data types, where the only difference was the single line that obtains a
> value from the vector. In the current vector expressions I can see a pattern
> where we copy the whole expression, including the abstract logic and the
> loops (this is something I was already thinking about in the scope of
> HIVE-21465), but I don't like that approach. When I create an abstract
> vectorized UDAF and extend it for certain data types, I admittedly bring in
> the overhead of a function call for every single value, but I don't think
> this violates basic vectorization principles: we still operate on vectors,
> so e.g. the object inspection overhead is already eliminated.
> I propose some convenience methods like the ones above, which define a
> strict contract for retrieving data from a ColumnVector, in particular the
> nth element of the vector. The first patch (this jira) should contain an
> implementation for all ColumnVector subclasses.