[ https://issues.apache.org/jira/browse/PIG-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233743#comment-13233743 ]
Jonathan Coveney commented on PIG-2317:
---------------------------------------

OK! I just uploaded a new diff (which incorporates Daniel's changes). It may be possible to undo some of that, actually... I'll explain some of the big new changes.

First, a todo list:
- Need to add more e2e tests
- Need to add more traditional tests
- Need to make the Javadocs more robust with @params and whatnot
- Need to add varargs support (this is the only feature that is missing, AFAIK)
- I have some TODOs littered about...need to clean those up

In general, there is a LOT more commenting, and I tried to be super explicit on the Ruby side of things.

I significantly cleaned up and simplified pigudf.rb, taking into account comments from Julien. I simplified the mechanisms at play as far as I could. pigudf.rb is in src/main/jruby/

Now, in order to get access to the Pig library, all you have to do is "require 'pig'", which imho is awesome: you just require pig, and you get everything! It's super clean. The unclean part of it is the way it works. If you do "require 'name.jar'", then JRuby looks for NameService.java in the base of the jar. If you do "require 'path/to/name.jar'", it'll look for path.to.NameService.java. Either way, this is the reason why I had to add src/PigService.java. IMHO the win is worth it, as it is super clean. In JRuby 1.7.0 there is a proposal to use the jar manifest to deal with this, and it's something I've brought up with them and something that will happen. 1.7 should also remove the need for a hack described below.

I got rid of the BagIterator, as it didn't make much sense. In this implementation, it makes more sense just to iterate on the DataBag object in Ruby directly, as it hides the pain (this pattern is repeated in Schema).

HACK ALERT: for people who know Ruby, generally if you include 'Enumerable' and implement each, you can do "obj.each" and it will give you an enumerator object.
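
As a rough illustration of the Enumerable pattern just described, here is a minimal plain-Ruby sketch. TinyBag is a hypothetical stand-in invented for this example, not the DataBag wrapper from the patch. It also assumes the usual pure-Ruby idiom in which each returns an enumerator explicitly (via to_enum) when no block is given.

```ruby
# Hypothetical stand-in for a bag-like container; NOT Pig's DataBag wrapper.
class TinyBag
  include Enumerable

  def initialize(*items)
    @items = items
  end

  # With a block: yield every item. Without one: hand back an Enumerator,
  # which is what allows chained calls such as each.with_index.
  def each
    return to_enum(:each) unless block_given?
    @items.each { |item| yield item }
    self
  end
end

bag = TinyBag.new(3, 1, 2)

enum   = bag.each                                        # no block given, so an Enumerator comes back
pairs  = bag.each.with_index.map { |item, i| [i, item] } # chaining on the Enumerator
sorted = bag.sort                                        # Enumerable methods are built on top of each
```

Only each has to be written by hand; map, sort, select and the rest come from Enumerable.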
This is useful for chaining together functions that enumerate over the object and change it in some way. Either way, JRuby 1.6.7 has a method that provides exactly this functionality...but they forgot to give it public permissions (it's just static enumeratorize(Blahblahblah)). I worked hard to try and get around the need for this, but it does it so cleanly and doing it any other way is such a pain (I haven't found a good one), that I used reflection to get around the permissions. I felt ok doing this because the 1.7.0 branch makes this explicitly public -- it was just an oversight.

Accumulator now uses outputSchema, as it always should have.

One (surprisingly long) addition is a Ruby interface for Schema objects! It protects the user from the Schema/FieldSchema divide, and makes it really easy to mix String schema declarations and a Schema object that is input.

I will post more depth about this later, but I think my time would be better served fixing the javadocs and the tests atm.

> Ruby/Jruby UDFs
> ---------------
>
>                 Key: PIG-2317
>                 URL: https://issues.apache.org/jira/browse/PIG-2317
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Jacob Perkins
>            Assignee: Jonathan Coveney
>            Priority: Minor
>         Attachments: PIG-2317-8.patch, PIG-2317-8_plus.patch,
> PIG-2317-9.patch, PigUdf.rb, PigUdf.rb, jruby_scripting.patch,
> jruby_scripting_2_real.patch, jruby_scripting_3.patch,
> jruby_scripting_4.patch, jruby_scripting_5.patch, jruby_scripting_6.patch,
> jruby_scripting_7.patch, pigjruby.rb, pigjruby.rb, pigjruby.rb, pigudf.rb
>
>
> It should be possible to write UDFs in Ruby. These UDFs will be registered in
> the same way as python and javascript UDFs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira