[
https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786709#comment-13786709
]
Aniket Mokashi commented on PIG-3082:
-------------------------------------
But we should document this as an incompatible change so that there are no
surprises?
> outputSchema of a UDF allows two usages when describing a Tuple schema
> ----------------------------------------------------------------------
>
> Key: PIG-3082
> URL: https://issues.apache.org/jira/browse/PIG-3082
> Project: Pig
> Issue Type: Bug
> Reporter: Julien Le Dem
> Assignee: Jonathan Coveney
> Fix For: 0.12.0
>
> Attachments: PIG-3082-0.patch, PIG-3082-1.patch
>
>
> When defining an EvalFunc that returns a Tuple, there are two ways you can
> implement outputSchema():
> - The right way: return a schema that contains one Field, which carries the
> type and schema of the UDF's return type.
> - The unreliable way: return a schema that contains more than one field; it
> will be understood as a tuple schema even though there is no type (which
> lives in the Field class) to specify that. This is particularly deceptive
> when the output schema is derived from the input schema and the output Tuple
> sometimes contains only one field. In such cases Pig understands the output
> schema as a tuple only if there is more than one field, so sometimes it
> works and sometimes it does not.
> We should at least issue a warning (for backward compatibility), if not
> throw an exception outright, when the output schema contains more than one
> Field.
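
The ambiguity described above can be sketched in a few lines of Java. This is
only a toy model: Type, Field, Schema and every method below are hypothetical
stand-ins written for this illustration, not the org.apache.pig classes, and
inferredReturnType() merely mimics the "more than one field means tuple"
guess as the issue describes it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; this is NOT the org.apache.pig API.
final class OutputSchemaSketch {
    enum Type { INT, CHARARRAY, TUPLE }

    // A field carries an alias, a type and, for TUPLE, a nested schema.
    static final class Field {
        final String alias;
        final Type type;
        final Schema inner; // null unless type == TUPLE

        Field(String alias, Type type, Schema inner) {
            this.alias = alias;
            this.type = type;
            this.inner = inner;
        }
    }

    // A schema is only a list of fields; it has no type of its own.
    static final class Schema {
        final List<Field> fields;

        Schema(Field... fields) {
            this.fields = Arrays.asList(fields);
        }
    }

    // Reliable form: exactly one TUPLE-typed field wrapping the tuple's fields.
    static Schema wrapped(Field... tupleFields) {
        return new Schema(new Field("result", Type.TUPLE, new Schema(tupleFields)));
    }

    // Unreliable form: the tuple's fields returned directly, with no wrapper.
    static Schema flat(Field... tupleFields) {
        return new Schema(tupleFields);
    }

    // Models the guess described in the issue: a schema with several fields is
    // read as a tuple, while a single field is read as the return type itself.
    // Assumes the schema has at least one field.
    static Type inferredReturnType(Schema s) {
        if (s.fields.size() > 1) {
            return Type.TUPLE;
        }
        return s.fields.get(0).type;
    }

    // One possible strict check (an assumption, not the committed patch):
    // reject any output schema that contains more than one field.
    static void requireSingleField(Schema s) {
        if (s.fields.size() > 1) {
            throw new IllegalArgumentException(
                "outputSchema contains " + s.fields.size() + " fields; wrap them in one TUPLE field");
        }
    }

    public static void main(String[] args) {
        Field a = new Field("a", Type.INT, null);
        Field b = new Field("b", Type.CHARARRAY, null);
        System.out.println(inferredReturnType(flat(a, b)));  // two flat fields
        System.out.println(inferredReturnType(flat(a)));     // one flat field
        System.out.println(inferredReturnType(wrapped(a)));  // one wrapped field
    }
}
```

In this model the flat form yields a tuple for two fields but a plain int for
one, while the wrapped form yields a tuple in both cases, which is why a UDF
whose output schema is derived from its input behaves inconsistently with the
flat form.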