----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4651/#review6784 -----------------------------------------------------------
trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java <https://reviews.apache.org/r/4651/#comment15009> instead of checking here if the tuple is generatable, the factory could default to the regular TupleFactory if the generation fails. trunk/src/org/apache/pig/data/SchemaTuple.java <https://reviews.apache.org/r/4651/#comment15010> How complex is this class ? Not sure it is worth pulling the whole mahout just for this. trunk/src/org/apache/pig/data/SchemaTuple.java <https://reviews.apache.org/r/4651/#comment15011> append...() methods should not be null (part of your TODO list?) trunk/src/org/apache/pig/data/SchemaTuple.java <https://reviews.apache.org/r/4651/#comment15013> handle the case where other == null Here you get NullPointerException trunk/src/org/apache/pig/data/SchemaTuple.java <https://reviews.apache.org/r/4651/#comment15014> same comment trunk/src/org/apache/pig/data/SchemaTuple.java <https://reviews.apache.org/r/4651/#comment15015> If you override equals(), you should override hashCode(). 2 object that are equal must return the same hashcode trunk/src/org/apache/pig/data/TupleFactory.java <https://reviews.apache.org/r/4651/#comment15052> that's where we need a proper classloader trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15016> try avoiding accessing the PigContext statically. Add it as a parameter and see if it can be passed from somewhere it is actually known. trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15017> possibly we want to rename this and/or add something else for this file. trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15018> ? either get rid of the empty catch block or document why it is Ok. Here I don't see why it would be Ok that we can not instantiate that class trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15019> "org.apache.pig.generated." + classname ? trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15020> "org/apache/pig/generated/" + classname + ".class" ? trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15021> we should probably not compile in the current dir. Put it in the temp dir intead trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15022> this could be calling a method in the parent trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15023> this could be in the parent trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15024> stuff which is not calling a generated field directly should be pulled up. trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15025> same comment trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15026> same trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15027> this is wasteful. Maybe DataByteArray could provide a static compare(byte[], byte[]) ? trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15028> duplicated code trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15029> this can be pulled up trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15030> the exception handling can be pulled up trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15031> "setPos_" + fieldPos + "((Boolean)val);" trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15032> same with a mapping for the java object for the type trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15033> can be pulled up trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15034> you can just remove the call to box, it should work trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15035> this can be pulled up trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15036> can be pulled up trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15037> can be pulled up trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15038> pull up trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15039> pull up trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15040> pull up trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15041> pull up trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java <https://reviews.apache.org/r/4651/#comment15042> pull up trunk/src/org/apache/pig/data/SchemaTupleFactory.java <https://reviews.apache.org/r/4651/#comment15043> Ok, temporary trunk/src/org/apache/pig/data/SchemaTupleFactory.java <https://reviews.apache.org/r/4651/#comment15044> do we need this. Let's just fall back to regular tuple when generation fails, so that we don't need to keep those in sync trunk/src/org/apache/pig/data/SchemaTupleFactory.java <https://reviews.apache.org/r/4651/#comment15045> The factory could be generated as well so that we don't need to use reflection here. Reflection is slower to create new instances. trunk/src/org/apache/pig/data/SchemaTupleFactory.java <https://reviews.apache.org/r/4651/#comment15046> The parameter type is enough to differentiate. getSchemaTupleFactory(Schema schema) trunk/src/org/apache/pig/data/SchemaTupleFactory.java <https://reviews.apache.org/r/4651/#comment15047> same trunk/src/org/apache/pig/data/SchemaTupleFactory.java <https://reviews.apache.org/r/4651/#comment15048> getTupleClass() trunk/src/org/apache/pig/data/SchemaTupleFactory.java <https://reviews.apache.org/r/4651/#comment15049> getTupleClass() trunk/src/org/apache/pig/data/SchemaTupleFactory.java <https://reviews.apache.org/r/4651/#comment15050> getTupleClass() trunk/src/org/apache/pig/data/Tuple.java <https://reviews.apache.org/r/4651/#comment15051> typo - Julien On 2012-04-08 22:26:29, Jonathan Coveney wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/4651/ > ----------------------------------------------------------- > > (Updated 2012-04-08 22:26:29) > > > Review request for pig and Julien Le Dem. > > > Summary > ------- > > This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing > the Schema on the frontend, we can code generate Tuples which can be used for > fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, > and it's ~15% smaller serialized (heavily heavily depends on the data, > though). Need to do get/set tests, but assuming that it's on par (or even > faster) than Tuple, the memory gain is huge. > > Need to clean up the code and add tests. > > Right now, it generates a SchemaTuple for every inputSchema and outputSchema > given to UDF's. The next step is to make a SchemaBag, where I think the > serialization savings will be really huge. > > Needs tests and comments, but I want the code to settle a bit. > > > This addresses bug PIG-2632. > https://issues.apache.org/jira/browse/PIG-2632 > > > Diffs > ----- > > trunk/bin/pig 1310666 > trunk/build.xml 1310666 > trunk/ivy.xml 1310666 > trunk/ivy/libraries.properties 1310666 > > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java > 1310666 > trunk/src/org/apache/pig/data/BinInterSedes.java 1310666 > trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION > trunk/src/org/apache/pig/data/PrimitiveTuple.java 1310666 > trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION > trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java PRE-CREATION > trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION > trunk/src/org/apache/pig/data/Tuple.java 1310666 > trunk/src/org/apache/pig/data/TupleFactory.java 1310666 > trunk/src/org/apache/pig/data/TypeAwareTuple.java 1310666 > trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION > trunk/src/org/apache/pig/impl/PigContext.java 1310666 > trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java > 1310666 > > Diff: https://reviews.apache.org/r/4651/diff > > > Testing > ------- > > > Thanks, > > Jonathan > >
