[ 
https://issues.apache.org/jira/browse/PIG-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479567#comment-13479567
 ] 

Cheolsoo Park commented on PIG-2927:
------------------------------------

Although I am no Ruby expert, I think that Jonathan's patch works well. Here is 
my test.

1) installed a non-trivial rubygem library (rubygem-json) on the client only 
and confirmed that it is not installed on any datanode on the cluster.
{code}
/usr/lib/ruby/gems/1.8/gems/json-1.4.6/
{code}
2) wrote a Ruby UDF that parses a JSON string:
{code}
require 'rubygems'
require 'pigudf'
require 'json'

class Myudfs < PigUdf
   outputSchema "result:chararray"
   def parseJson input
      result = JSON.parse(input)
   end
end
{code}
3) wrote a short Pig script that loads a JSON string and calls my Ruby UDF:
{code}
register 'test.rb' using jruby as myfuncs;
a = load 'json.txt' using PigStorage() as (i:chararray);
b = foreach a generate myfuncs.parseJson(i);
dump b;
{code}
4) got the expected result as follows:
{code:title=input}
{"id":1,"nested":{"value1":"first1","next":{"complex_record":{"id":2,"nested":{"value1":"second1","next":null,"value2":"second2"}}},"value2":"first2"}}
{code}
{code:title=result}
([id#1,nested#{value1=first1, value2=first2, next={complex_record={id=2, nested={value1=second1, value2=second2, next=null}}}}])
{code}
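For reference, the map in the result has the same shape as the Hash that JSON.parse returns in plain Ruby. The following is only a minimal sketch, assuming the json library is available locally; the variable names are illustrative and not part of the test above.

```ruby
require 'json'

# Same record as the input shown above.
line = '{"id":1,"nested":{"value1":"first1","next":{"complex_record":{"id":2,"nested":{"value1":"second1","next":null,"value2":"second2"}}},"value2":"first2"}}'

# JSON.parse returns a Hash with String keys; nested JSON objects become
# nested Hashes and JSON null becomes nil.
record = JSON.parse(line)
inner = record["nested"]["next"]["complex_record"]

puts record["id"]
puts inner["nested"]["value1"]
```

The nested Hash is presumably what Pig renders as the map in the result block, with nil shown as null.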

Without Jonathan's patch, I get the following error in the front-end as 
expected:
{code}
LoadError: no such file to load -- json
  require at org/jruby/RubyKernel.java:1042
  require at file:/home/cheolsoo/pig-ruby/build/ivy/lib/Pig/jruby-complete-1.6.7.jar!/META-INF/jruby.home/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36
   (root) at test.rb:3
2012-10-18 17:09:24,323 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. (LoadError) no such file to load -- json
{code}
I also ran the "Scripting" e2e test cases with the patch on a Hadoop-1.0.x 
cluster, and they all passed, so the patch looks good to commit.

Btw, I wanted to write an e2e test case using rubygem-json, but I realized 
that rubygem-json is under the GPL and can't be included in Pig. We should 
either find another rubygem library that is under the Apache License or make 
the test configurable so that it runs only if rubygem-json is installed.
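
One way the "run only if installed" option could look on the Ruby side, as a rough sketch: library_available? is a hypothetical helper, not something in Pig or its e2e harness, and the messages are placeholders.

```ruby
# Hypothetical helper: returns true if `name` can be required,
# false if Ruby raises LoadError (the same error shown in the trace above).
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

if library_available?('json')
  puts 'json found: run the rubygem test'
else
  puts 'json missing: skip the rubygem test'
end
```

A check like this keeps the GPL library out of the source tree while still exercising the gem path on machines that have it installed.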

Thanks!
                
> SHIP and use JRuby gems in JRuby UDFs
> -------------------------------------
>
>                 Key: PIG-2927
>                 URL: https://issues.apache.org/jira/browse/PIG-2927
>             Project: Pig
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 0.11
>         Environment: JRuby UDFs
>            Reporter: Russell Jurney
>            Assignee: Jonathan Coveney
>            Priority: Minor
>             Fix For: 0.11
>
>         Attachments: PIG-2927-0.patch, PIG-2927-1.patch, PIG-2927-2.patch, 
> PIG-2927-3.patch
>
>
> It would be great to use JRuby gems in JRuby UDFs without installing them on 
> all machines on the cluster. Some way to SHIP them automatically with the job 
> would be great.
