[jira] [Commented] (PIG-2361) Update builtin UDFs to use @OutputSchema

Jonathan Coveney (Commented) (JIRA) Mon, 14 Nov 2011 22:46:18 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150262#comment-13150262
 ]


Jonathan Coveney commented on PIG-2361:
---------------------------------------

Some food for thought that came up while converting some of the builtin 
functions (and something it might be nice to have Pig standardize on). Take a 
look at DoubleAbs, for example. It's an EvalFunc<Double>, so it really 
shouldn't need an outputSchema declaration, but without one you have to 
explicitly name the output or else it is just an unnamed double. Currently, the 
outputSchema implementation in some of the builtin functions (but not 
all... this is something we should probably standardize?) names the output in 
such a way that the generated field names won't collide. For example:

{code}
a = load 'data' as (thing:double);
b = foreach a generate DoubleAbs(thing), DoubleAbs(thing);
describe b;
{code}
will yield
{code}
b: {org.apache.pig.builtin.doubleabs_thing_22: double,org.apache.pig.builtin.doubleabs_thing_23: double}
{code}
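
To make the naming pattern concrete, here is a rough sketch of how aliases like these could be built: lowercased class name, then the input alias, then a running counter. This is only an illustration inferred from the output above, not Pig's actual code; the class UniqueAliasSketch, its method, and the starting counter value are all made up.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Pig. It only mimics the shape of the
// aliases shown in the describe output above.
final class UniqueAliasSketch {
    // Running counter; the starting value is an assumption supplied by the caller.
    private final AtomicInteger nextId;

    UniqueAliasSketch(int firstId) {
        this.nextId = new AtomicInteger(firstId);
    }

    // Builds "<lowercased class name>_<input alias>_<counter>".
    // The counter makes two calls with identical arguments yield different names.
    String uniqueAlias(String className, String inputAlias) {
        return className.toLowerCase(Locale.ROOT)
                + "_" + inputAlias
                + "_" + nextId.getAndIncrement();
    }
}
```

The counter is the only thing that keeps the two DoubleAbs(thing) columns apart; everything before it is identical for both.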

Now, nobody is ever going to actually use that schema, BUT it is unique. It 
might be nice to think of a clean way to allow schema writers to get unique 
schemas? I mean, in the above case, the ideal to me would be that, given it's 
an EvalFunc<Double>, you get something reasonable like:

{code}
b: {x1: double, x2: double}
{code}

It would be super awesome if functions could do this easily. Then, for cases 
where you do annotate, you could allow

@OutputSchema("double") and it would give you a decent default.
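
As a sketch of what that default could look like (purely hypothetical, nothing like this exists in Pig as far as this comment goes): a small namer, scoped to one foreach, that fills in x1, x2, ... whenever the declared schema carries a type but no alias. The class DefaultAliasNamer and the "x" prefix are assumptions, and the colon check only makes sense for a single simple-typed field.

```java
// Hypothetical sketch, not Pig's API. One instance would be used per
// foreach statement so that numbering restarts for each relation.
final class DefaultAliasNamer {
    private int next = 1;

    // "double"      -> "x1: double" (alias filled in)
    // "abs: double" -> "abs: double" (explicit alias kept)
    // Simplistic: treats any colon as "an alias is present", so it is only
    // meant for single fields of a simple type.
    String withDefaultAlias(String fieldSchema) {
        String trimmed = fieldSchema.trim();
        if (trimmed.contains(":")) {
            return trimmed;
        }
        return "x" + (next++) + ": " + trimmed;
    }
}
```

With this, two unnamed double outputs in the same statement would come out as x1 and x2, while a UDF that does name its output keeps that name.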

Perhaps this deserves its own JIRA if it is plausible. For the time being 
though, is it better to have the cleanliness of the default, or the 
"flexibility" of non-colliding names?
                
> Update builtin UDFs to use @OutputSchema
> ----------------------------------------
>
>                 Key: PIG-2361
>                 URL: https://issues.apache.org/jira/browse/PIG-2361
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>
> We can reduce the amount of code in our codebase and simultaneously provide 
> examples of how to use the feature introduced in PIG-2151 by replacing 
> implementations of outputSchema in built-in UDFs by @OutputSchema annotations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
