[
https://issues.apache.org/jira/browse/PIG-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alan Gates updated PIG-2421:
----------------------------
Attachment: PIG-newudf.patch
Attached is a first pass at this. This is not complete at all. In particular
it does not deal with how these changes would affect the scripting language
UDFs, which it needs to.
This is just to give an idea of how we might approach this. I have tried to
address the issues I noted in the description. In particular I have moved to
an annotation based scheme that allows users to have multiple instances of exec
and other functions, but also to use a single function for multiple purposes
(such as exec and initial). I have also tried to hide the internal
wrapping/unwrapping so details developers could more easily share code between
functions (e.g. initial and intermediate.
The idea is that the current EvalFunc class would remain as is, with these
methods being placed in a separate package. Any new functionality would only
be added to these new classes.
> EvalFuncs need redesigned
> -------------------------
>
> Key: PIG-2421
> URL: https://issues.apache.org/jira/browse/PIG-2421
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Affects Versions: 0.11
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: PIG-newudf.patch
>
>
> The current EvalFunc interface (and associated Algebraic and Accumulator
> interfaces) have grown unwieldy. In particular, people have noted the
> following issues:
> # Writing a UDF requires a lot of boiler plate code.
> # Since UDFs always pass a tuple, users are required to manage their own type
> checking for input.
> # Declaring schemas for output data is confusing.
> # Writing a UDF that accepts multiple different parameters (using
> getArgToFuncMapping) is confusing.
> # Using Algebraic and Accumulator interfaces often entails duplicating code
> from the initial implementation.
> # UDF implementors are exposed to the internals of Pig since they have to
> know when to return a tuple (Initial, Intermediate) and when not to (exec,
> Final).
> # The separation of Initial, Intermediate, and Final into separate classes
> forces code duplication and makes it hard for UDFs in other languages to use
> those interfaces.
> # There is unused code in the current interface that occasionally causes
> confusion (e.g. isAsynchronous)
> Any change must be done in a way that allows existing UDFs to continue
> working essentially forever.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira