[ 
https://issues.apache.org/jira/browse/PIG-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251336#comment-13251336
 ] 

Jonathan Coveney commented on PIG-2643:
---------------------------------------

I don't know that I like including all of the methods for types supported by 
Pig...if anything, I think we should try and move towards a solution that 
requires less code like that. Why cruft up the code base with a bunch of 
wrappers?

In that vein, I really like your first proposal. I think it's definitely the 
direction we should go. As far as a syntax to allow people to call methods that 
require an object, how about:

{code}
a = load 'thing' as (a:chararray, b:chararray);
b = foreach a generate a:sqrt(b) as sqrt;
{code}

I'd love to use $, but I see the potential for namespace collision to be too 
great, not to mention ambiguity in the parser. It doesn't have to be : of 
course, but I don't think . will work. But perhaps I'm wrong? Either way, I 
think this is one of those cases where we should shoot for a really usable 
syntax. In both of these cases, we could use the InvokerGenerator and 
reflection to figure out all of the necessary code.

As far as the definition of the UDF, I think that the mentioned approach is not 
a bad one. Another possibility:
{code}
c = load 'thing' as (x:bag{t:(v:chararray)});
DEFINE joiner NEW com.google.common.base.Joiner.on('-').skipNulls();
d = foreach c generate joiner:join(x);
{code}

This would introduce a new keyword which would allow us to more succinctly 
reference one object with various methods we want to use. The define would 
register this name in the namespace, and then when we see a :, we first check 
whether what is to the left is a relation, then whether it is in the object 
space. If it is, then we can build up the UDF.
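
To make the lookup order concrete, here is a rough Java sketch of it. 
Everything in it (ColonResolutionSketch, Kind, addRelation, defineObject, 
classify) is a name made up for this example and not anything in Pig's parser, 
and a plain Object stands in for the Joiner so the sketch has no Guava 
dependency. It only shows the precedence: relation first, then object space.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of resolving the name to the left of ':'.
// None of these names exist in Pig; they only illustrate the lookup order.
final class ColonResolutionSketch {
    enum Kind { RELATION, OBJECT, UNKNOWN }

    private final Set<String> relations = new HashSet<>();
    private final Map<String, Object> objects = new HashMap<>();

    // Called when the script assigns a relation alias, e.g. "c = load ...".
    void addRelation(String alias) {
        relations.add(alias);
    }

    // Called for a "DEFINE name NEW ..." statement (the proposed keyword).
    void defineObject(String name, Object instance) {
        objects.put(name, instance);
    }

    // Relations win over defined objects, matching the order in the text.
    Kind classify(String left) {
        if (relations.contains(left)) {
            return Kind.RELATION;
        }
        if (objects.containsKey(left)) {
            return Kind.OBJECT;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        ColonResolutionSketch scope = new ColonResolutionSketch();
        scope.addRelation("c");
        scope.defineObject("joiner", new Object()); // stand-in for the Joiner

        System.out.println("c -> " + scope.classify("c"));
        System.out.println("joiner -> " + scope.classify("joiner"));
        System.out.println("nope -> " + scope.classify("nope"));
    }
}
```

Only when classify returns OBJECT would the method name to the right of the : 
be looked up on the stored instance to build the UDF.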

I think we're heading in the right direction, whatever we choose.
                
> Use bytecode generation to make a performance replacement for InvokeForLong, 
> InvokeForString, etc
> -------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2643
>                 URL: https://issues.apache.org/jira/browse/PIG-2643
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>            Priority: Minor
>              Labels: codegen
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2643-0.patch
>
>
> This is basically to cut my teeth for much more ambitious code generation 
> down the line, but I think it could be performant and useful.
> the new syntax is:
> {code}a = load 'thing' as (x:chararray);
> define concat InvokerGenerator('java.lang.String','concat','String');
> define valueOf InvokerGenerator('java.lang.Integer','valueOf','String');
> define valueOfRadix 
> InvokerGenerator('java.lang.Integer','valueOf','String,int');
> b = foreach a generate x, valueOf(x) as vOf;
> c = foreach b generate x, vOf, valueOfRadix(x, 16) as vOfR;
> d = foreach c generate x, vOf, vOfR, concat(concat(x, (chararray)vOf), 
> (chararray)vOfR);
> dump d;
> {code}
> There are some differences between this version and Dmitriy's implementation:
> - it is no longer necessary to declare whether the method is static or not. 
> This is gleaned via reflection.
> - as per the above, it is no longer necessary to make the first argument be 
> the type of the object to invoke the method on. If it is not a static method, 
> then the type will implicitly be the type you need. So in the case of concat, 
> it would need to be passed a tuple of two inputs: one for the method to be 
> called against (as it is not static), and then the 'String' that was 
> specified. In the case of valueOf, because it IS static, then the 'String' is 
> the only value.
> - The arguments are type sensitive. Integer means the Object Integer, whereas 
> int (or long, or float, or boolean, etc) refer to the primitive. This is 
> necessary to properly reflect the arguments. Values passed in WILL, however, 
> be properly unboxed as necessary.
> - The return type will be reflected.
> This uses the ASM API to generate the bytecode, and then a custom classloader 
> to load it in. I will add caching of the generated code based on the input 
> strings, etc, but I wanted to get eyes and opinions on this. I also need to 
> benchmark, but it should be native speed (excluding a little startup time to 
> make the bytecode, but ASM is really fast).
> Another nice benefit is that this bypasses the need for the JDK, though it 
> adds a dependency on ASM (which is a super tiny dependency).
> Patch incoming.
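
As a rough illustration of the reflection step in the quoted description (not 
the code in the patch, which generates bytecode with ASM), the Java sketch 
below resolves a method from the three define arguments, treats "int" as the 
primitive and "Integer" as the object type, and reads static-ness and the 
return type from the resolved method. MethodResolverSketch, typeFor, resolve 
and isStatic are hypothetical names, and mapping unqualified names such as 
"String" to java.lang is an assumption of this sketch.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the code in the patch: it only shows how the
// define arguments could be resolved with plain java.lang.reflect.
final class MethodResolverSketch {
    private static final Map<String, Class<?>> PRIMITIVES = new HashMap<>();
    static {
        PRIMITIVES.put("int", int.class);
        PRIMITIVES.put("long", long.class);
        PRIMITIVES.put("float", float.class);
        PRIMITIVES.put("double", double.class);
        PRIMITIVES.put("boolean", boolean.class);
    }

    // "int" means the primitive; "Integer" or "String" mean the object types.
    static Class<?> typeFor(String name) {
        String trimmed = name.trim();
        Class<?> primitive = PRIMITIVES.get(trimmed);
        if (primitive != null) {
            return primitive;
        }
        // Assumption: unqualified names are looked up in java.lang.
        String qualified = trimmed.contains(".") ? trimmed : "java.lang." + trimmed;
        try {
            return Class.forName(qualified);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown type: " + name, e);
        }
    }

    // Resolves e.g. ("java.lang.Integer", "valueOf", "String,int").
    static Method resolve(String className, String methodName, String argTypes) {
        String[] names = argTypes.trim().isEmpty() ? new String[0] : argTypes.split(",");
        Class<?>[] params = new Class<?>[names.length];
        for (int i = 0; i < names.length; i++) {
            params[i] = typeFor(names[i]);
        }
        try {
            return Class.forName(className).getMethod(methodName, params);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Cannot resolve " + className + "." + methodName, e);
        }
    }

    // Static-ness is read from the method itself, so it need not be declared.
    static boolean isStatic(Method method) {
        return Modifier.isStatic(method.getModifiers());
    }

    public static void main(String[] args) throws Exception {
        Method concat = resolve("java.lang.String", "concat", "String");
        Method valueOfRadix = resolve("java.lang.Integer", "valueOf", "String,int");

        System.out.println("concat static? " + isStatic(concat));
        System.out.println("valueOf static? " + isStatic(valueOfRadix));
        System.out.println("valueOf returns " + valueOfRadix.getReturnType().getName());

        // Instance method: the first input is the receiver, the rest are arguments.
        System.out.println(concat.invoke("ab", "cd"));
        // Static method: no receiver; reflection unboxes the Integer 16 to int.
        System.out.println(valueOfRadix.invoke(null, "ff", 16));
    }
}
```

The sketch calls Method.invoke only to keep the example short; the approach 
described in the issue replaces that reflective call with generated bytecode, 
so reflection is needed once at setup time rather than on every row.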

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
