Hi All,

I've been working with UDFs in Hive a lot lately, usually to implement
some manner of small lookup which isn't worth the overhead of a join,
or which for some other reason is preferable as a function rather than
as a join.
This gets me into situations where I end up wanting one UDF to have
multiple return types - for example something like a geo IP look-up
would return an integer for an area code look-up or a string for a
country name look-up.  It seems the two ways to handle this are either
to write a different UDF for each return type (or potentially for each
look-up), or to always return a String and use the Hive built-in cast
function "cast(expr as <type>)" on the return value.  So far I've been
favoring the second as the first seems to lead to a proliferation of
nearly identical classes, but I'm wondering if someone with more
experience in this might have a suggestion as to why one might be better
than the other, or indeed if there is a third solution that I have
overlooked.
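
To make the two options concrete, here is a minimal sketch in plain Java
with no Hive dependency.  Every name in it (GeoLookupSketch, lookup,
country, areaCode, the sample table) is hypothetical and made up for
illustration; in a real UDF the wrappers would presumably become
evaluate() methods on classes extending org.apache.hadoop.hive.ql.exec.UDF.

```java
import java.util.HashMap;
import java.util.Map;

public class GeoLookupSketch {
    // Hypothetical in-memory table standing in for a real geo IP database.
    // Each row holds {country name, area code}, both stored as strings.
    private static final Map<String, String[]> TABLE = new HashMap<>();
    static {
        TABLE.put("192.0.2.1", new String[] {"United States", "212"});
    }

    // Second option: one look-up that always returns a String (or null),
    // leaving any type conversion to the caller.
    public static String lookup(String ip, String field) {
        String[] row = TABLE.get(ip);
        if (row == null || field == null) {
            return null;
        }
        switch (field) {
            case "country":   return row[0];
            case "area_code": return row[1];
            default:          return null;
        }
    }

    // First option: one thin typed wrapper per return type. The shared
    // lookup() keeps the near-duplicate code down to a line or two each.
    public static String country(String ip) {
        return lookup(ip, "country");
    }

    public static Integer areaCode(String ip) {
        String raw = lookup(ip, "area_code");
        return raw == null ? null : Integer.valueOf(raw);
    }

    public static void main(String[] args) {
        System.out.println(country("192.0.2.1"));
        System.out.println(areaCode("192.0.2.1"));
    }
}
```

With the String-only version, the query side would look something like
"cast(geo_lookup(ip, 'area_code') as int)", where geo_lookup is again a
made-up function name.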

Thanks,

--Mark Tozzi
