Ådne Brunborg created HIVE-9988:
-----------------------------------
Summary: Evaluating UDF before query is run
Key: HIVE-9988
URL: https://issues.apache.org/jira/browse/HIVE-9988
Project: Hive
Issue Type: Improvement
Reporter: Ådne Brunborg
When a UDF is used in a filter on a partition column in Hive, all partitions
are scanned before the UDF is resolved.
If the UDF could be evaluated before the query is run, Hive could prune
partitions up front, which would greatly improve performance in cases like this.
Example: the table is partitioned by datestamp (bigint).
The following WHERE clause touches all 82 partitions:
{{WHERE datestamp=cast(from_unixtime(unix_timestamp(),'yyyyMMdd') as bigint)}}
{{15/03/16 09:21:53 INFO mapred.FileInputFormat: Total input paths to process : 82}}
…whereas the following only touches the one partition:
{{WHERE datestamp=20150316}}
{{15/03/16 09:23:06 INFO input.FileInputFormat: Total input paths to process : 1}}
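
Until something like this is supported, one possible workaround is to compute the value on the client and submit the query with a plain literal, which is the form that touched only one partition above. The following is a minimal sketch, not part of Hive: the function name `build_partition_query` and the table name `events` are hypothetical, and it assumes the client's local date matches the date the Hive server would have produced.

```python
from datetime import date
from typing import Optional


def build_partition_query(table: str, day: Optional[date] = None) -> str:
    """Build a query whose partition filter is a literal, not a UDF call.

    `table` is a hypothetical table partitioned by a bigint `datestamp`
    column, as in the example above.
    """
    # Default to the client's local date. Assumption: this matches what
    # from_unixtime(unix_timestamp(), 'yyyyMMdd') would return on the server;
    # that may not hold if client and server time zones differ.
    day = day or date.today()

    # Client-side equivalent of
    # cast(from_unixtime(unix_timestamp(),'yyyyMMdd') as bigint)
    datestamp = int(day.strftime("%Y%m%d"))

    return f"SELECT * FROM {table} WHERE datestamp={datestamp}"


# Reproduces the literal form of the filter from the example above.
print(build_partition_query("events", date(2015, 3, 16)))
# prints: SELECT * FROM events WHERE datestamp=20150316
```

This only moves the evaluation out of Hive; it does not address the request itself, which is for Hive to resolve the UDF before partitions are selected.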