[
https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai updated PIG-3294:
----------------------------
Attachment: PIG-3294-before-refactory.patch
PIG-3294-1.patch
To use it, define HiveUDF/HiveUDTF/HiveUDAF in Pig:
define sin HiveUDF('sin'); -- alias in FunctionRegistry
define sin HiveUDF('org.apache.hadoop.hive.ql.udf.UDFSin'); -- full class name
define explode HiveUDTF('explode'); -- UDTF maps to a Pig UDF that returns a bag
define avg HiveUDAF('avg'); -- UDAF maps to a Pig Algebraic UDF
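As a rough sketch, the definitions above might then be used in a script as
follows. The input names ('student', 'mydata') and their schemas are made up
for illustration, and the exact behavior depends on the attached patch:

```
-- Hypothetical inputs and schemas, for illustration only.
define sin HiveUDF('sin');
define explode HiveUDTF('explode');
define avg HiveUDAF('avg');

A = load 'student' as (name:chararray, age:int, gpa:double);
B = foreach A generate sin(gpa);              -- scalar Hive UDF

C = load 'mydata' as (a0:{(b0:chararray)});
D = foreach C generate flatten(explode(a0));  -- UDTF returns a bag, so flatten it

E = group A by name;
F = foreach E generate group, avg(A.age);     -- UDAF applied to the grouped bag
```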
Some Hive UDFs require constant parameters. Hive uses ObjectInspector to
communicate the schema to a UDF, and ObjectInspector is richer than Pig's Schema
in that it can express whether a field is a constant. To support this, HiveUDF
takes an optional constant tuple. A null item in the tuple means the
corresponding argument is not a constant:
define in_file HiveUDF('in_file', '(null, "names.txt")');
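A minimal sketch of calling such a UDF, assuming a hypothetical 'student' input
with a name field. The second argument at the call site is expected to match
the constant declared in the tuple:

```
-- Hypothetical input and schema, for illustration only.
define in_file HiveUDF('in_file', '(null, "names.txt")');
A = load 'student' as (name:chararray, age:int, gpa:double);
-- First argument varies per record (null in the tuple); the second is the constant.
B = foreach A generate in_file(name, 'names.txt');
```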
The patch contains the following changes:
1. Allow a UDF to produce a last record in close(). HiveUDTF uses this to
process all the records as input and then produce the output in close().
2. Add the input schema to the Initial, Intermed, and Final classes of an
Algebraic UDF. This is the original input schema of the UDF. The actual input
schema of each stage is internal knowledge of the Algebraic UDF, and Pig does
not know it.
3. Several minor fixes in the combiner:
* the Tez combiner conf does not have UDFContext
* parentPlan is not set for combiner plan operators
* resultType of FINAL is not set properly
4. Refactor OrcUtils -> HiveUtils (the patch from before the refactoring is
also included to ease review)
> Allow Pig use Hive UDFs
> -----------------------
>
> Key: PIG-3294
> URL: https://issues.apache.org/jira/browse/PIG-3294
> Project: Pig
> Issue Type: New Feature
> Reporter: Daniel Dai
> Labels: gsoc2013, java
> Attachments: PIG-3294-1.patch, PIG-3294-before-refactory.patch
>
>
> It would be nice if Pig provided some interoperability with Hive. We can wrap
> Hive UDFs in Pig so that they can be used from Pig.
> This is a candidate project for Google Summer of Code 2013. More information
> about the program can be found at
> https://cwiki.apache.org/confluence/display/PIG/GSoc2013