[
https://issues.apache.org/jira/browse/PIG-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171440#comment-13171440
]
Thejas M Nair commented on PIG-2421:
------------------------------------
There is one problem with the annotation-based approach for the output schema:
you lose the benefit of having a function! Take the case of the builtin.TOBAG
udf, where the output schema is computed based on the input type. To overcome
this, we can either continue to support getOutputSchema or add annotation
support for specifying the equivalent function.
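
To make the point concrete, here is a minimal sketch in plain Java of a schema
that depends on the argument types. The class and method names are
hypothetical, schemas are modeled as strings, and the rule shown is a
simplified stand-in rather than the actual TOBAG code. A fixed annotation
value cannot express the branch; a function can.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: models a schema-computing function with plain strings.
// It does not use Pig's Schema classes and is not the real TOBAG logic.
final class ToBagSchemaSketch {

    // Derives a textual bag schema from the types of the arguments.
    static String outputSchema(List<String> inputTypes) {
        if (inputTypes.isEmpty()) {
            return "bag{}";
        }
        String first = inputTypes.get(0);
        for (String type : inputTypes) {
            if (!type.equals(first)) {
                // Mixed argument types: the inner type cannot be declared.
                return "bag{}";
            }
        }
        // All arguments share one type, so the bag's tuples can carry it.
        return "bag{tuple(" + first + ")}";
    }

    public static void main(String[] args) {
        System.out.println(outputSchema(Arrays.asList("int", "int")));       // bag{tuple(int)}
        System.out.println(outputSchema(Arrays.asList("int", "chararray"))); // bag{}
    }
}
```

An annotation that names such a function, instead of holding a literal
schema, would keep this flexibility.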
I like Dmitriy's idea of letting the udf function return an iterator. It is
the equivalent of an accumulator interface, but for udf output. Support for
that can also be added as a second step. The output can be treated as a bag if
the iterator is an iterator of tuples. In other cases, I think we would need
to force the user to use a flatten on the udf. Doing an implicit flatten in
the udf is likely to be confusing.
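
A rough sketch of what an iterator-returning udf could look like, in plain
Java. The IteratorFunc interface, the TOKENIZE example, and the flatten helper
are all hypothetical names made up for illustration; none of them is part of
Pig or of the attached patches. The udf yields non-tuple values, so the caller
flattens explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of a udf whose output is pulled through an iterator.
final class IteratorUdfSketch {

    // Assumed udf shape: results are produced one at a time, on demand.
    interface IteratorFunc<I, O> {
        Iterator<O> exec(I input);
    }

    // Example udf: yields the whitespace-separated words of a line.
    static final IteratorFunc<String, String> TOKENIZE =
        new IteratorFunc<String, String>() {
            @Override
            public Iterator<String> exec(String line) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    return Collections.<String>emptyIterator();
                }
                final String[] words = trimmed.split("\\s+");
                return new Iterator<String>() {
                    private int pos = 0;

                    @Override
                    public boolean hasNext() {
                        return pos < words.length;
                    }

                    @Override
                    public String next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        return words[pos++];
                    }
                };
            }
        };

    // Explicit flatten: one output row per yielded element, prefixed by key.
    static <O> List<List<Object>> flatten(Object key, Iterator<O> results) {
        List<List<Object>> rows = new ArrayList<List<Object>>();
        while (results.hasNext()) {
            rows.add(Arrays.<Object>asList(key, results.next()));
        }
        return rows;
    }

    public static void main(String[] args) {
        // prints [[line1, a], [line1, b]]
        System.out.println(flatten("line1", TOKENIZE.exec("a b")));
    }
}
```

Because the flatten step is a separate call here, the one-row-per-element
behavior is visible at the call site rather than hidden inside the udf.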
I wonder if we should first make a decision about supporting a new list type
that acts as a list of any type (unlike bag, which is always a list of
tuples). That would have an impact on what we decide the semantics of a udf
returning an Iterator should be.
> EvalFuncs need redesigned
> -------------------------
>
> Key: PIG-2421
> URL: https://issues.apache.org/jira/browse/PIG-2421
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Affects Versions: 0.11
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: PIG-newudf.patch, examples.patch
>
>
> The current EvalFunc interface (and associated Algebraic and Accumulator
> interfaces) has grown unwieldy. In particular, people have noted the
> following issues:
> # Writing a UDF requires a lot of boilerplate code.
> # Since UDFs always pass a tuple, users are required to manage their own type
> checking for input.
> # Declaring schemas for output data is confusing.
> # Writing a UDF that accepts multiple different parameters (using
> getArgToFuncMapping) is confusing.
> # Using Algebraic and Accumulator interfaces often entails duplicating code
> from the initial implementation.
> # UDF implementors are exposed to the internals of Pig since they have to
> know when to return a tuple (Initial, Intermediate) and when not to (exec,
> Final).
> # The separation of Initial, Intermediate, and Final into separate classes
> forces code duplication and makes it hard for UDFs in other languages to use
> those interfaces.
> # There is unused code in the current interface that occasionally causes
> confusion (e.g. isAsynchronous)
> Any change must be done in a way that allows existing UDFs to continue
> working essentially forever.