> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > Thanks Jonathan for this contribution. A lot of good stuff in there.
> > Please add comments when using Ruby specific mechanisms in there (child 
> > classes notification, Class naming after the variable it is affected to, 
> > ...) so that less Ruby savvy contributors can follow.

I'm going to update the JIRA with some of the broader comments.

My todo at this point is:
- Polish up the Javadocs (I should probably add @params and whatnot)
- Clean up the naming convention in the Ruby API defined in Java
- Add more test coverage (I added a couple big features, they need to be used 
more in the e2e tests especially)
- Add varargs support

That said, the tests pass and it seems to work. The API feels quite clean to me.


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java, 
> > line 41
> > <https://reviews.apache.org/r/4091/diff/1/?file=86324#file86324line41>
> >
> >     ruby.getClass("DataBag") could be done only once

It could, but there is a caveat: the Ruby runtime HAS to be identical within 
calls, or you're going to run into huge issues. The getClass call isn't 
particularly heinous, and I think it's cleaner to leave in. For the time being, 
all of the Ruby runtimes are identical, but in the future there may be multiple 
(a different runtime per UDF or per script registered), so this is safer.


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/pigudf.rb, line 110
> > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line110>
> >
> >     You could call it RegisterDescendentsUDF or a similar name as this is a 
> > based class for UDFs.
> >     The descendents registration is common to the PigUDF as well. Do you 
> > want to unify the class hierarchy?

Agreed, I renamed it


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/pigudf.rb, line 87
> > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line87>
> >
> >     I would add a separate field for schema func instead of this convention.

I split it out into another class like you suggested


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/pigudf.rb, line 41
> > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line41>
> >
> >     - please add a comment explain the mechanism that names the class after 
> > the variable it is assigned to.

added a ton of comments


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/pigudf.rb, line 26
> > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line26>
> >
> >     you could define a class for this

Agreed


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/pigudf.rb, line 6
> > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line6>
> >
> >     maybe you need to separate the top level ruby script to be used outside 
> > of pig and the one used inside using includes?

I was able to clean this up significantly in multiple ways. Will document how 
in the JIRA.


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java, 
> > line 65
> > <https://reviews.apache.org/r/4091/diff/1/?file=86324#file86324line65>
> >
> >     I would override OutputSchema instead. In the case of a bag you are 
> > losing the schema of the Tuple inside the bag.

Wisdom of the ages. Done :)


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java, line 
> > 86
> > <https://reviews.apache.org/r/4091/diff/1/?file=86325#file86325line86>
> >
> >     some of the ruby calling logic and exception catching could be factored 
> > in this class

Agreed...the win felt small, I will take this into account, but it's not a 
priority


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java, line 60
> > <https://reviews.apache.org/r/4091/diff/1/?file=86326#file86326line60>
> >
> >     check for the varargs case

I'll make varargs a top priority for the next round of changes... didn't seem 
super pressing atm, but maybe it is.


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java, line 15
> > <https://reviews.apache.org/r/4091/diff/1/?file=86327#file86327line15>
> >
> >     It is very similar to JrubyEvalFunc. You can factor out more of this 
> > code.

I suppose this begs the question of why we even have FilterFunc anymore, given 
that it just extends EvalFunc<Boolean> now. I treated it separately because I 
wasn't sure about that (I could just make FilterFuncs EvalFunc<Boolean>s and be 
done with it)... it seems like something we should change in Pig (ie abolish 
FilterFuncs). But if FilterFunc is treater separately, I want the two 
classes... I suppose I could pull out the functionality, and should, but I'd 
like to have some resolution on that, because it'll guide how I do so.


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java, line 107
> > <https://reviews.apache.org/r/4091/diff/1/?file=86328#file86328line107>
> >
> >     this will strip the inner schema of bags and tuples. See comments about 
> > overriding outputSchema

Can you avoid this with Algebraic UDFs? Fixed for Accumulators


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/src/org/apache/pig/scripting/jruby/RubyBagIterator.java, line 67
> > <https://reviews.apache.org/r/4091/diff/1/?file=86331#file86331line67>
> >
> >     please make it private and at the top

I don't even ned it, I think


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java, line 40
> > <https://reviews.apache.org/r/4091/diff/1/?file=86332#file86332line40>
> >
> >     make it private
> >     1 is a perfectly good value. it does not need to be unique.
> >     Changing the serialVersionUID value will make the serialized data 
> > format incompatible.

you're correct. I think this was an artifact from a previous implementation and 
isn't actually necessary. good to know, though...


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java, line 104
> > <https://reviews.apache.org/r/4091/diff/1/?file=86332#file86332line104>
> >
> >     in this case addAll would replace the current content. Is it intended ?

Good call. It was a bug. I removed addAll and just made the functionality a 
part of add.


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java, line 149
> > <https://reviews.apache.org/r/4091/diff/1/?file=86332#file86332line149>
> >
> >     you could call runtime.getClass("DataBag").getClass("BagIterator") once 
> > at the beginning

I eliminated BagIterator, it didn't seem necessary. That said, re: the getClass 
calls, see above.


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java, line 98
> > <https://reviews.apache.org/r/4091/diff/1/?file=86333#file86333line98>
> >
> >     give meaningful names to parameters

the issue with ruby parameters is that a lot of the functions can take multiple 
types, etc, and the meaning changes depending. I could handle this a couple of 
ways.

1) in the case where there is only one option, use that
2) make better use of @param javadoc
3) in the case where there are multiple possibilities, make sure to explicitly 
create an object that is named to reflect what it is supposed to be

Would appreciate your thoughts on the matter


> On 2012-03-03 22:26:58, Julien Le Dem wrote:
> > trunk/test/e2e/pig/udfs/ruby/scriptingudfs.rb, line 32
> > <https://reviews.apache.org/r/4091/diff/1/?file=86341#file86341line32>
> >
> >     What about keeping only the one parameter ouptutSchema and 
> > outputSchemaFunctions?
> >     Users would just use the function right before defining the UDF.
> >     
> >       outputSchema "word:chararray"
> >       def concat input, input2
> >         return input + input2
> >       end

it'd be good to test both, agreed


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4091/#review5601
-----------------------------------------------------------


On 2012-02-28 22:02:52, Jonathan Coveney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4091/
> -----------------------------------------------------------
> 
> (Updated 2012-02-28 22:02:52)
> 
> 
> Review request for pig, Julien Le Dem and Alan Gates.
> 
> 
> Summary
> -------
> 
> This is the reviewboard for the following:
> 
> https://issues.apache.org/jira/browse/PIG-2317
> 
> 
> This addresses bug PIG-2317.
>     https://issues.apache.org/jira/browse/PIG-2317
> 
> 
> Diffs
> -----
> 
>   trunk/src/org/apache/pig/AccumulatorEvalFunc.java PRE-CREATION 
>   trunk/src/org/apache/pig/AlgebraicEvalFunc.java PRE-CREATION 
>   trunk/src/org/apache/pig/scripting/ScriptEngine.java 1294756 
>   trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java 
> PRE-CREATION 
>   trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java 
> PRE-CREATION 
>   trunk/bin/pig 1294756 
>   trunk/ivy.xml 1294756 
>   trunk/ivy/libraries.properties 1294756 
>   trunk/pigudf.rb PRE-CREATION 
>   trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java PRE-CREATION 
>   trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java PRE-CREATION 
>   trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java 
> PRE-CREATION 
>   trunk/src/org/apache/pig/scripting/jruby/JrubyUtils.java PRE-CREATION 
>   trunk/src/org/apache/pig/scripting/jruby/PigJrubyLibrary.java PRE-CREATION 
>   trunk/src/org/apache/pig/scripting/jruby/RubyBagIterator.java PRE-CREATION 
>   trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java PRE-CREATION 
>   trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java 
> PRE-CREATION 
>   trunk/src/org/apache/pig/scripting/jruby/RubyExampleA.java PRE-CREATION 
>   trunk/src/org/apache/pig/scripting/jruby/RubyExampleB.java PRE-CREATION 
>   trunk/test/e2e/pig/build.xml 1294756 
>   trunk/test/e2e/pig/conf/default.conf 1294756 
>   trunk/test/e2e/pig/drivers/TestDriverPig.pm 1294756 
>   trunk/test/e2e/pig/tests/nightly.conf 1294756 
>   trunk/test/e2e/pig/udfs/ruby/morerubyudfs.rb PRE-CREATION 
>   trunk/test/e2e/pig/udfs/ruby/scriptingudfs.rb PRE-CREATION 
>   trunk/test/org/apache/pig/test/TestUDF.java 1294756 
>   trunk/test/org/apache/pig/test/utils/HelperEvalFuncUtils.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/4091/diff
> 
> 
> Testing
> -------
> 
> e2e tests can be run as follows:
> 
> ant -Dharness.hadoop.home=<path_to_hadoop> 
> -Dharness.old.pig=<path_to_old_pig> test-e2e-deploy-local
> ant -Dharness.hadoop.home=<path_to_hadoop> 
> -Dharness.old.pig=<path_to_old_pig> -Dtests.to.run="-t RubyUDFs" 
> test-e2e-local
> 
> 
> Thanks,
> 
> Jonathan
> 
>

Reply via email to