[ 
https://issues.apache.org/jira/browse/PIG-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045910#comment-14045910
 ] 

Abhishek Agarwal commented on PIG-2490:
---------------------------------------

Is anyone working on this? At InMobi, we are trying to do something similar by 
implementing a custom ChainedUDF that takes multiple UDFs as arguments and 
executes them in a chain. It would be very nice to have native support for this 
in Pig itself. 
+ [~sriksun]

> Add UDF function chaining syntax
> --------------------------------
>
>                 Key: PIG-2490
>                 URL: https://issues.apache.org/jira/browse/PIG-2490
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: David Ciemiewicz
>
> Nested function/UDF calls can make for very convoluted data transformations.
> For example, given the following sample data:
> {code}
> business1     9:00 AM - 4:00 PM
> {code}
> Normalizing the hours to "9:00a-4:00p" with Pig UDFs might look like the 
> following:
> {code}
> B = foreach A generate
>     REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(hours,' AM','a'), ' PM', 'p'), ' *- *', '-')
>         as hours_normalized;
> {code}
> Yes, you could recast this as a nested foreach block, but it's still rather 
> convoluted.
> {code}
> B = foreach A {
>     hours1 = REGEXREPLACE(hours,' AM\\b','a');
>     hours2 = REGEXREPLACE(hours1,' PM\\b','p');
>     hours3 = REGEXREPLACE(hours2,' *- *','-');
>     generate
>     hours3 as hours_normalized;
>     };
> {code}
> I suggest an "object-style" function chaining enhancement to the grammar a la 
> Java, JavaScript, etc.
> {code}
> B = foreach A generate
>     REGEXREPLACE(hours,' AM\\b','a').REGEXREPLACE(' PM\\b','p').REGEXREPLACE(' *- *','-')
>         as hours_normalized;
> {code}
> This chaining notation makes it much clearer as to the sequence of actions 
> without the convoluted nesting.
> In the case of the "object-method" style dot (.) notation, the result of the 
> prior expression is just used as the first value in the tuple passed to the 
> function call.
> In other words, the following two expressions would be equivalent:
> {code}
> f(a,b)
> a.f(b)
> {code}
> As such, I don't think there are any requirements to modify existing UDFs.
> I think this is just a syntactic "sugar" enhancement that should be fairly 
> trivial to implement, yet would make coding complex data transformations with 
> Pig UDFs "cleaner".
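
The equivalence between f(a,b) and a.f(b) described in the issue can be 
illustrated outside of Pig. What follows is only a rough Python sketch of the 
desugaring idea, not Pig code and not how the Pig grammar would implement it. 
The Chained wrapper, the udfs registry, and the regex_replace stand-in for 
REGEXREPLACE are all hypothetical names introduced for illustration.

```python
import re


def regex_replace(text, pattern, replacement):
    """Hypothetical stand-in for the REGEXREPLACE UDF used in the issue."""
    return re.sub(pattern, replacement, text)


class Chained:
    """Wraps a value so that value.f(b) is evaluated as f(value, b)."""

    def __init__(self, value, functions):
        self.value = value
        self._functions = functions

    def __getattr__(self, name):
        # Only called for attributes that are not found normally.
        if name.startswith("_") or name not in self._functions:
            raise AttributeError(name)
        func = self._functions[name]

        def call(*args):
            # The prior result becomes the first argument of the next call.
            return Chained(func(self.value, *args), self._functions)

        return call


# Assumed registry mapping UDF names to functions.
udfs = {"REGEXREPLACE": regex_replace}
hours = "9:00 AM - 4:00 PM"

# Nested form: f(f(f(a, ...), ...), ...)
nested = regex_replace(
    regex_replace(regex_replace(hours, r" AM\b", "a"), r" PM\b", "p"),
    r" *- *",
    "-",
)

# Chained form: a.f(...).f(...).f(...)
chained = (
    Chained(hours, udfs)
    .REGEXREPLACE(r" AM\b", "a")
    .REGEXREPLACE(r" PM\b", "p")
    .REGEXREPLACE(r" *- *", "-")
    .value
)

print(nested, chained)
```

Both forms apply the same three replacements in the same order, which is the 
sense in which the proposal is purely syntactic and should not require 
changes to existing UDFs.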



--
This message was sent by Atlassian JIRA
(v6.2#6252)
