[jira] [Commented] (PIG-2317) Ruby/Jruby UDFs

Jonathan Coveney (Commented) (JIRA) Tue, 20 Mar 2012 14:04:05 -0700

    [ https://issues.apache.org/jira/browse/PIG-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233743#comment-13233743 ]


Jonathan Coveney commented on PIG-2317:
---------------------------------------

OK! I just uploaded a new diff (which incorporates Daniel's changes). It may 
actually be possible to undo some of that. Below I explain the biggest new 
changes, but first, a todo list:
- Need to add more e2e tests
- Need to add more traditional tests
- Need to flesh out the Javadocs with @param tags and the like
- Need to add varargs support (AFAIK, this is the only missing feature)
- I have some TODOs scattered around the code that need cleaning up

In general, there is a LOT more commenting, and I tried to be super explicit on 
the Ruby side of things.

I significantly cleaned up pigudf.rb, taking Julien's comments into account, 
and simplified the underlying mechanisms as far as I could.

pigudf.rb is in src/main/jruby/

Now, to get access to the Pig library, all you have to do is "require 'pig'", 
which imho is awesome: you just require pig, and you get everything! It's super 
clean. The unclean part is how it works. If you do "require 'name.jar'", JRuby 
looks for a NameService class at the root of the jar. If you do 
"require 'path/to/name.jar'", it looks for path.to.NameService. Either way, this 
is why I had to add src/PigService.java. IMHO the win is worth it, as the result 
is super clean. For JRuby 1.7.0 there is a proposal to use the jar manifest to 
handle this; I've brought it up with the JRuby developers, and it will happen. 
1.7 should also remove the need for the hack described below.
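To make the lookup rule above concrete, here is a small pure-Ruby sketch that 
maps a require path to the service class name JRuby would look for, following 
the convention exactly as described in this comment. The helper 
service_class_for is hypothetical: it only mirrors the described naming rule 
and is not JRuby's actual loader code or part of Pig.

```ruby
# Hypothetical helper: mirrors the naming rule described above.
# It is NOT JRuby's actual loader code.
def service_class_for(require_path)
  # Drop a trailing ".jar" and split the path into its components.
  parts = require_path.sub(/\.jar\z/, "").split("/")
  base = parts.pop
  # The last component becomes "<Capitalized>Service"; any leading
  # directories become the Java package.
  klass = base[0].upcase + base[1..-1] + "Service"
  (parts + [klass]).join(".")
end

puts service_class_for("name.jar")         # prints NameService
puts service_class_for("path/to/name.jar") # prints path.to.NameService
puts service_class_for("pig")              # prints PigService
```

This is why "require 'pig'" needs a PigService class to exist at the root of 
the jar.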

I got rid of BagIterator, as it didn't make much sense. In this implementation 
it's simpler to iterate over the DataBag object directly in Ruby, which hides 
the pain (the same pattern is used for Schema).
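As a rough illustration of why iterating over the bag directly is nicer than 
going through a separate iterator class, here is a pure-Ruby sketch. 
FakeJavaIterator and FakeDataBag are hypothetical stand-ins for Pig's Java 
objects (they only imitate the hasNext/next protocol of java.util.Iterator); 
the point is that a single each method hides that protocol, so UDF code can use 
ordinary block iteration.

```ruby
# Hypothetical stand-in for a Java-style iterator (hasNext/next protocol).
class FakeJavaIterator
  def initialize(items)
    @items = items
    @pos = 0
  end

  def hasNext
    @pos < @items.size
  end

  def next
    item = @items[@pos]
    @pos += 1
    item
  end
end

# Hypothetical stand-in for a DataBag: it only exposes a Java-style iterator,
# but defines each so callers never touch that iterator directly.
class FakeDataBag
  def initialize(tuples)
    @tuples = tuples
  end

  def iterator
    FakeJavaIterator.new(@tuples)
  end

  # Drive the hasNext/next loop here, once, instead of in every UDF.
  def each
    it = iterator
    yield it.next while it.hasNext
    self
  end
end

bag = FakeDataBag.new([["a", 1], ["b", 2], ["c", 3]])
total = 0
bag.each { |_name, count| total += count }
puts total # prints 6
```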

HACK ALERT: for people who know Ruby, the convention is that if you include 
Enumerable and implement each, calling "obj.each" without a block gives you an 
Enumerator object. This is useful for chaining together functions that enumerate 
over the object and transform it in some way. JRuby 1.6.7 has a method that 
provides exactly this functionality... but they forgot to make it public (it's 
just static enumeratorize(Blahblahblah)). I worked hard to avoid needing it, but 
it does the job so cleanly, and every other way is such a pain (I haven't found 
a good one), that I used reflection to get around the permissions. I felt OK 
doing this because the 1.7.0 branch makes the method explicitly public; it was 
just an oversight.
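In plain Ruby, the behaviour described above is usually written with to_enum 
(or its alias enum_for): each returns an Enumerator when called without a 
block. The sketch below shows that idiom on a hypothetical TupleList class; it 
is not the patch's code, just the pure-Ruby equivalent of what the reflection 
hack enables on the Java side.

```ruby
# Hypothetical collection class showing the standard Ruby idiom:
# each yields elements when given a block, and returns an Enumerator otherwise.
class TupleList
  include Enumerable

  def initialize(tuples)
    @tuples = tuples
  end

  def each
    # Without a block, hand back an Enumerator so calls can be chained.
    return to_enum(:each) unless block_given?
    @tuples.each { |t| yield t }
    self
  end
end

list = TupleList.new([["a", 1], ["b", 2], ["c", 3]])
puts list.each.class # prints Enumerator

# Chaining works because the blockless each returned an Enumerator.
pairs = list.each.with_index.map { |(name, _count), i| "#{i}:#{name}" }
p pairs # prints ["0:a", "1:b", "2:c"]

# Enumerable methods such as select also work, since they are built on each.
p list.select { |_name, count| count.odd? } # prints [["a", 1], ["c", 3]]
```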

Accumulator now uses outputSchema, as it always should have.

One (surprisingly long) addition is a Ruby interface for Schema objects! It 
shields the user from the Schema/FieldSchema divide and makes it really easy to 
mix String schema declarations with an input Schema object. I will go into more 
depth about this later, but for now I think my time is better spent fixing the 
Javadocs and the tests.
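The Schema interface itself isn't shown here, so the following is only a 
hypothetical pure-Ruby sketch of the idea of freely mixing String declarations 
with existing schema objects. SchemaSketch, its methods, and the flat 
"name:type" string format are all assumptions for illustration; they are not 
the patch's API, and real Pig schema strings (nested bags and tuples, for 
example) would need a proper parser.

```ruby
# Hypothetical illustration only: NOT the Schema API from the patch.
# Fields are kept as [name, type] pairs, so callers never deal with a separate
# "field schema" type.
class SchemaSketch
  attr_reader :fields

  # Accepts either a flat "name:type, name:type" string or another SchemaSketch.
  def initialize(spec)
    @fields =
      case spec
      when SchemaSketch then spec.fields.dup
      when String then parse(spec)
      else raise ArgumentError, "unsupported schema spec: #{spec.inspect}"
      end
  end

  # Appending works the same way whether the argument is a String or a schema.
  def +(other)
    combined = SchemaSketch.new(self)
    combined.fields.concat(SchemaSketch.new(other).fields)
    combined
  end

  def to_s
    fields.map { |name, type| "#{name}:#{type}" }.join(", ")
  end

  private

  # Only handles flat, comma-separated name:type pairs.
  def parse(str)
    str.split(",").map do |field|
      name, type = field.strip.split(":", 2)
      raise ArgumentError, "bad field: #{field.inspect}" if type.nil?
      [name.strip, type.strip]
    end
  end
end

input = SchemaSketch.new("word:chararray")
output = input + "count:long"
puts output # prints word:chararray, count:long
```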
                
> Ruby/Jruby UDFs
> ---------------
>
>                 Key: PIG-2317
>                 URL: https://issues.apache.org/jira/browse/PIG-2317
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Jacob Perkins
>            Assignee: Jonathan Coveney
>            Priority: Minor
>         Attachments: PIG-2317-8.patch, PIG-2317-8_plus.patch, 
> PIG-2317-9.patch, PigUdf.rb, PigUdf.rb, jruby_scripting.patch, 
> jruby_scripting_2_real.patch, jruby_scripting_3.patch, 
> jruby_scripting_4.patch, jruby_scripting_5.patch, jruby_scripting_6.patch, 
> jruby_scripting_7.patch, pigjruby.rb, pigjruby.rb, pigjruby.rb, pigudf.rb
>
>
> It should be possible to write UDFs in Ruby. These UDFs will be registered in 
> the same way as python and javascript UDFs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
