[ 
https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837097#comment-13837097
 ] 

Eric Hanson commented on HIVE-5761:
-----------------------------------

We should just use the number of days since the epoch as the representation of 
DATE in a vector. This will allow you to re-use all the existing 
VectorExpressions for <, >, <=, >=, =, and !=, such as LongColEqualLongScalar, 
FilterLongColEqualLongScalar, and dozens of others, rather than implementing 
new ones. If you play tricks and try to cache data inside the vector elements, 
you will have to re-implement all the comparison operations, and that is too 
much work.
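
To illustrate the point above, here is a minimal sketch, not Hive's actual
code. It assumes DATE values are stored in a plain long[] as days since
1970-01-01. The class DateAsDaysSketch and the method filterEqualScalar are
hypothetical stand-ins for what an existing long comparison expression does
conceptually; the real VectorExpression classes have a different API.

```java
import java.time.LocalDate;

// Hypothetical sketch: DATE stored as days since the epoch in a long[] vector.
final class DateAsDaysSketch {

    /** Converts a calendar date to days since 1970-01-01 (negative before it). */
    static long toEpochDays(LocalDate date) {
        return date.toEpochDay();
    }

    /**
     * Generic "long column == long scalar" filter (assumed shape, not Hive's API).
     * Writes the indexes of matching rows into selected and returns the count.
     * Nothing here is specific to dates: the same loop serves any long column.
     */
    static int filterEqualScalar(long[] vector, int size, long scalar, int[] selected) {
        int newSize = 0;
        for (int i = 0; i < size; i++) {
            if (vector[i] == scalar) {
                selected[newSize++] = i;
            }
        }
        return newSize;
    }

    public static void main(String[] args) {
        long[] dates = {
            toEpochDays(LocalDate.of(2013, 11, 5)),
            toEpochDays(LocalDate.of(1970, 1, 1)),
            toEpochDays(LocalDate.of(2013, 11, 5)),
        };
        int[] selected = new int[dates.length];
        long wanted = toEpochDays(LocalDate.of(2013, 11, 5));
        int matches = filterEqualScalar(dates, dates.length, wanted, selected);
        System.out.println(matches + " matching rows");
    }
}
```

Because day numbers sort in the same order as the dates they represent, the
ordering comparisons (<, >, <=, >=) carry over in the same way.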

But using an external cache to accelerate operations like getWeek, getMonth, 
and getYear is a good idea. You could implement the cache as a separate data 
structure held in a member variable of the VectorExpression class that 
implements an operation such as getYear. For example, an array of about 8000 
elements, one per day, could hold the precomputed results of a date function 
for a span of roughly 22 years (+ or - 11 years). You don't even need to hash; 
just compute the cache array index from the day integer with a direct formula. 
For outliers that fall outside the range of your cache array, you could fall 
back on a slower path that does the full computation. You could rely on the 
fact that almost all date values in user data will be between, say, 
today - 18 years and today + 2 years.
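
The cache idea above could look roughly like the following sketch. It rests on
several assumptions: the class name YearCacheSketch, its methods, and the
choice to start the window at today - 18 years are all hypothetical, and
java.time.LocalDate stands in for whatever slow path the real expression would
use. With 8000 daily slots the window ends a little under four years after
"today".

```java
import java.time.LocalDate;

// Hypothetical sketch of a direct-indexed year cache keyed by epoch day.
final class YearCacheSketch {
    private static final int CACHE_SIZE = 8000;   // about 22 years of days

    private final long baseDay;                   // epoch day held in slot 0
    private final int[] yearByOffset = new int[CACHE_SIZE];

    /** Precomputes years for CACHE_SIZE days starting at today - 18 years. */
    YearCacheSketch(LocalDate today) {
        baseDay = today.minusYears(18).toEpochDay();
        for (int i = 0; i < CACHE_SIZE; i++) {
            yearByOffset[i] = slowYear(baseDay + i);
        }
    }

    /** Slow path: full calendar computation for a single value. */
    static int slowYear(long epochDay) {
        return LocalDate.ofEpochDay(epochDay).getYear();
    }

    /** Fast path: direct array lookup, no hashing; outliers use the slow path. */
    int year(long epochDay) {
        long offset = epochDay - baseDay;
        if (offset >= 0 && offset < CACHE_SIZE) {
            return yearByOffset[(int) offset];
        }
        return slowYear(epochDay);
    }

    /** Applies year() to the first size entries of a vector of epoch days. */
    void evaluate(long[] input, long[] output, int size) {
        for (int i = 0; i < size; i++) {
            output[i] = year(input[i]);
        }
    }

    public static void main(String[] args) {
        YearCacheSketch cache = new YearCacheSketch(LocalDate.now());
        long[] days = {
            LocalDate.now().toEpochDay(),             // inside the cached window
            LocalDate.of(1900, 1, 1).toEpochDay(),    // outlier, slow path
        };
        long[] years = new long[days.length];
        cache.evaluate(days, years, days.length);
        System.out.println(years[0] + " " + years[1]);
    }
}
```

The input vector still holds plain day numbers, so the comparison expressions
are unaffected; the cache lives only inside the expression that needs it.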





> Implement vectorized support for the DATE data type
> ---------------------------------------------------
>
>                 Key: HIVE-5761
>                 URL: https://issues.apache.org/jira/browse/HIVE-5761
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>            Assignee: Teddy Choi
>
> Add support to allow queries referencing DATE columns and expression results 
> to run efficiently in vectorized mode. This should re-use the code for the 
> integer/timestamp types to the extent possible and beneficial. Include 
> unit tests and end-to-end tests. Consider re-using or extending existing 
> end-to-end tests for vectorized integer and/or timestamp operations.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
