Ability to access DistributedCache from UDFs
--------------------------------------------
Key: HIVE-1016
URL: https://issues.apache.org/jira/browse/HIVE-1016
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Carl Steinbach
Assignee: Carl Steinbach

There have been several requests on the mailing list for information about how to access the DistributedCache from UDFs, e.g.:

http://www.mail-archive.com/hive-u...@hadoop.apache.org/msg01650.html
http://www.mail-archive.com/hive-u...@hadoop.apache.org/msg01926.html

While the responses to these emails suggested several workarounds, the only correct way to access the distributed cache is through the static methods of Hadoop's DistributedCache class, and all of these methods require the JobConf to be passed in as a parameter. Giving UDFs access to the distributed cache therefore reduces to giving UDFs access to the JobConf.

I propose the following changes to GenericUDF/UDAF/UDTF:

* Add an exec_init(Configuration conf) method that is called during Operator initialization at runtime.
* Rename the "initialize" method to "compile_init" to make it clear that this method is called at compile time.
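The proposed split between a compile-time hook and a runtime hook can be illustrated with a small, self-contained Java sketch. It deliberately avoids Hive's and Hadoop's real classes so that it runs on its own: `SketchConf`, `ProposedUDF`, `LookupUDF`, and the configuration key `sketch.lookup.file` are all hypothetical, with `SketchConf` standing in for Hadoop's Configuration/JobConf. `compileInit` plays the role of the renamed `initialize` (compile_init), and `execInit` plays the role of the proposed exec_init(Configuration conf). In a real UDF, the local path of a cached file would instead be obtained by passing the job configuration to a DistributedCache static method such as `getLocalCacheFiles`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// HYPOTHETICAL stand-in for Hadoop's Configuration/JobConf so the sketch runs without Hadoop.
class SketchConf {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
}

// HYPOTHETICAL base class modelling the two-phase lifecycle proposed in HIVE-1016.
abstract class ProposedUDF {
    // Counterpart of the renamed "initialize" (compile_init): runs at query compile time.
    abstract void compileInit(int argCount);

    // Counterpart of the proposed exec_init(Configuration conf): runs during
    // Operator initialization at runtime and receives the job configuration.
    abstract void execInit(SketchConf conf);

    abstract Object evaluate(Object... args);
}

// HYPOTHETICAL UDF: maps a code to a name using a tab-separated lookup file
// that would be shipped to each task via the distributed cache.
class LookupUDF extends ProposedUDF {
    // HYPOTHETICAL configuration key naming the localized lookup file.
    static final String LOOKUP_FILE_KEY = "sketch.lookup.file";
    private final Map<String, String> table = new HashMap<>();

    @Override
    void compileInit(int argCount) {
        // Only static checks are possible here; no job configuration exists yet.
        if (argCount != 1) {
            throw new IllegalArgumentException("lookup() takes exactly one argument");
        }
    }

    @Override
    void execInit(SketchConf conf) {
        // With real Hadoop, this path would come from
        // DistributedCache.getLocalCacheFiles(jobConf) rather than a config key.
        Path file = Paths.get(conf.get(LOOKUP_FILE_KEY));
        try {
            for (String line : Files.readAllLines(file)) {
                String[] kv = line.split("\t", 2);
                if (kv.length == 2) {
                    table.put(kv[0], kv[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    Object evaluate(Object... args) {
        // Returns null for a null input or an unknown code.
        return args[0] == null ? null : table.get(args[0].toString());
    }
}
```

In this sketch, argument checking happens once when the query is compiled, while the lookup file is read in execInit, i.e. inside each task where the cached file has actually been localized and the job configuration is available.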