[ https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837097#comment-13837097 ]
Eric Hanson commented on HIVE-5761:
-----------------------------------

We should just use the number of days since the epoch as the representation of DATE in a vector. This will allow you to re-use all the existing VectorExpressions for <, >, <=, >=, =, and !=, such as LongColEqualLongScalar, FilterLongColEqualLongScalar, and dozens of others, rather than implement new ones. If you play tricks and try to cache data inside the vector elements, you will have to re-implement all the comparison operations, which is too much work.

But using an external cache to accelerate operations like getWeek, getMonth, getYear, etc. is a good idea. You could implement the cache as a separate data structure that is a member variable of the VectorExpression class used to implement an operation like getYear. E.g. an array of about 8000 elements could hold the results of a date function for every day within + or - 11 years. You don't even need to hash; just use the day integer to compute the cache array index with a direct formula. For outliers that fall outside the range of your cache array, you could just fall back on a slower path that does the full computation. You could rely on the fact that almost all date values in the user data will be between, say, today - 18 years and today + 2 years.

> Implement vectorized support for the DATE data type
> ---------------------------------------------------
>
>                 Key: HIVE-5761
>                 URL: https://issues.apache.org/jira/browse/HIVE-5761
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>            Assignee: Teddy Choi
>
> Add support to allow queries referencing DATE columns and expression results
> to run efficiently in vectorized mode. This should re-use the code for the
> integer/timestamp types to the extent possible and beneficial. Include
> unit tests and end-to-end tests. Consider re-using or extending existing
> end-to-end tests for vectorized integer and/or timestamp operations.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
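
For illustration, the direct-indexed cache suggested in the comment above might look roughly like the sketch below. It is not Hive code: the class name YearCache, its methods, and the use of java.time.LocalDate (Java 8+) are all assumptions made for the example. It only shows the idea of storing DATE as days since the epoch, precomputing getYear for a window of days around a chosen center date, and falling back to a full computation for outliers.

```java
import java.time.LocalDate;

/**
 * Hypothetical sketch of a direct-indexed year cache for DATE values
 * stored as days since the epoch. Names are illustrative, not Hive APIs.
 */
final class YearCache {
  private final long baseDay; // epoch day that maps to index 0
  private final int[] years;  // years[i] = calendar year of day (baseDay + i)

  /** Precompute years for [center - yearsBack, center + yearsForward]. */
  YearCache(LocalDate center, int yearsBack, int yearsForward) {
    long first = center.minusYears(yearsBack).toEpochDay();
    long last = center.plusYears(yearsForward).toEpochDay();
    this.baseDay = first;
    this.years = new int[(int) (last - first) + 1];
    for (int i = 0; i < years.length; i++) {
      years[i] = LocalDate.ofEpochDay(first + i).getYear();
    }
  }

  /** Number of cached days. */
  int size() {
    return years.length;
  }

  /** Year for one epoch-day value; assumes a valid day count. */
  int year(long epochDay) {
    long idx = epochDay - baseDay; // direct formula, no hashing
    if (idx >= 0 && idx < years.length) {
      return years[(int) idx];     // fast path: array lookup
    }
    // Slow path for outliers outside the cached window.
    return LocalDate.ofEpochDay(epochDay).getYear();
  }

  /** Batch form, in the spirit of a vectorized expression over a long[] column. */
  void evaluate(long[] days, int[] out, int n) {
    for (int i = 0; i < n; i++) {
      out[i] = year(days[i]);
    }
  }
}
```

With a center date of 2013-12-02 and 11 years on either side, the array holds a little over 8000 entries, in line with the estimate in the comment. Because the vector itself still contains plain day counts, the existing long comparison expressions would apply to it unchanged; the cache lives only inside the expression that needs it.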