pig-user  

question on load functions in the types branch

Kevin Weil
Sun, 12 Oct 2008 03:26:22 -0700

I am having issues with a custom load function that reads protocol buffers.
It worked with Pig 0.1, but after refactoring it to support 0.2/types,
I can't get it to do anything past the line

org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

My reader uses the same getNext() and bindTo() as before (modulo changes
such as using the now-required TupleFactory in getNext), and I now
implement determineSchema as well. There's no general way to turn a series
of bytes from a protobuf into, say, a tuple, since that depends on the
message, so I assert false in all the bytesTo* functions and instead return
a proper schema from determineSchema. Based on the document
<http://wiki.apache.org/pig/TrunkToTypesChanges> sent out a couple of days
ago, I think this is correct, but stop me here if I'm wrong.

A simplified version of my determineSchema is

@Override public Schema determineSchema(URL fileName) throws IOException {
    try {
        List<Schema.FieldSchema> schemaList = new ArrayList<Schema.FieldSchema>();
        schemaList.add(new FieldSchema("version", null, DataType.CHARARRAY));
        return new Schema(schemaList);
    } catch (...) {
        ...
    }
}

The constructor of my LogReader class takes a string argument, and so a
simple Pig script is

<register jars>
all_files = LOAD 'my_file' USING com.....logging.LogReader('client') AS (version: chararray);
dump all_files;
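To make the `LogReader('client')` syntax concrete: a function spec of this form names a class plus string arguments, and an object can be built from it by reflection through a matching String constructor. That is consistent with the Constructor.newInstance frame directly under PigContext.instantiateFuncFromSpec in the first stack trace in the p.s. What follows is a minimal sketch under that assumption and is not Pig's actual parser. SpecDemo, DemoReader, and instantiateFromSpec are hypothetical names, and quoting is simplified (no escaped quotes or commas inside arguments).

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpecDemo {
    // Hypothetical stand-in for a loader like LogReader: one String argument.
    public static class DemoReader {
        public final String client;

        public DemoReader(String client) {
            this.client = client;
        }
    }

    // Hypothetical: turn "pkg.Name('a', 'b')" into new pkg.Name("a", "b").
    // Simplified quoting: no escaped quotes or commas inside arguments.
    static Object instantiateFromSpec(String spec) throws ReflectiveOperationException {
        int open = spec.indexOf('(');
        String className = (open < 0) ? spec.trim() : spec.substring(0, open).trim();
        List<String> ctorArgs = new ArrayList<>();
        if (open >= 0) {
            String inner = spec.substring(open + 1, spec.lastIndexOf(')')).trim();
            if (!inner.isEmpty()) {
                for (String part : inner.split(",")) {
                    String p = part.trim();
                    // Strip the surrounding single quotes used for string literals.
                    if (p.length() >= 2 && p.startsWith("'") && p.endsWith("'")) {
                        p = p.substring(1, p.length() - 1);
                    }
                    ctorArgs.add(p);
                }
            }
        }
        // One String parameter per argument; an empty list selects the
        // zero-argument constructor.
        Class<?>[] paramTypes = new Class<?>[ctorArgs.size()];
        Arrays.fill(paramTypes, String.class);
        Constructor<?> ctor = Class.forName(className).getConstructor(paramTypes);
        return ctor.newInstance(ctorArgs.toArray());
    }

    public static void main(String[] args) throws Exception {
        String spec = DemoReader.class.getName() + "('client')";
        DemoReader reader = (DemoReader) instantiateFromSpec(spec);
        System.out.println(reader.client); // prints client
    }
}
```

Note that a spec with no parenthesized arguments resolves to the zero-argument constructor, so the argument list in the spec is the only thing that selects `DemoReader(String)`.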

It's at this point that I get the MapReduceLauncher at 0% complete and it
just sits there forever. Here is what I do see in grunt. When I type the
all_files = ... line, a LogReader instance is created with the correct
string argument. When I type the dump line, another is constructed with the
correct string argument, then two LogReader instances are created with the
no-argument constructor, and then one more is created with the correct
one-argument (String) constructor. This is all on a single-machine Linux
test setup running in Hadoop mode. I don't understand why the two
LogReaders are created with the zero-argument constructor here; that
constructor shouldn't need to be defined at all, right? Perhaps it's a
clue to what's going wrong.

By the way, other than the constructors, no methods in my LogReader class
seem to get called. In particular, bindTo, getNext, and determineSchema are
never called.

Thanks in advance,
Kevin

p.s. In case it helps, the stack trace at the point where the correct
one-argument constructor LogReader(String) is invoked is

    at com....logging.LogReader.<init>(LogReader.java:71)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:416)
    at org.apache.pig.impl.logicalLayer.LOLoad.<init>(LOLoad.java:64)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:889)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:748)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:549)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
    at org.apache.pig.PigServer.parseQuery(PigServer.java:295)
    at org.apache.pig.PigServer.clonePlan(PigServer.java:330)
    at org.apache.pig.PigServer.compileLp(PigServer.java:666)
    at org.apache.pig.PigServer.compileLp(PigServer.java:655)
    at org.apache.pig.PigServer.store(PigServer.java:433)
    at org.apache.pig.PigServer.store(PigServer.java:421)
    at org.apache.pig.PigServer.openIterator(PigServer.java:384)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
    at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:94)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
    at org.apache.pig.Main.main(Main.java:282)

while the stack trace at the point where the unexpected zero-argument
constructor gets called is

    at com.......logging.LogReader.<init>(LogReader.java:80)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at java.lang.Class.newInstance0(Class.java:355)
    at java.lang.Class.newInstance(Class.java:308)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:418)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:454)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.instantiateFunc(POCast.java:66)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.setLoadFSpec(POCast.java:71)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:1153)
    at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:58)
    at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:27)
    at org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:68)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:805)
    at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:105)
    at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:40)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:232)
    at org.apache.pig.PigServer.compilePp(PigServer.java:731)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:644)
    at org.apache.pig.PigServer.store(PigServer.java:452)
    at org.apache.pig.PigServer.store(PigServer.java:421)
    at org.apache.pig.PigServer.openIterator(PigServer.java:384)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
    at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:94)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
    at org.apache.pig.Main.main(Main.java:282)
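For reference, the two traces differ in how the object is built. The first reaches Constructor.newInstance directly from PigContext.instantiateFuncFromSpec. The second goes through java.lang.Class.newInstance, which can only call a zero-argument constructor and throws InstantiationException if there is none. The sketch below is plain Java reflection, not Pig code. WithDefault, OnlyStringCtor, and fromClassNameOnly are hypothetical stand-ins, and it uses getDeclaredConstructor().newInstance(), the modern replacement for Class.newInstance. It shows two things: creating an object from a class name alone invokes the no-argument constructor, so any constructor argument is lost, and it fails outright when no such constructor exists.

```java
public class NoArgDemo {
    // Hypothetical loader with both constructors, like LogReader.
    public static class WithDefault {
        public final String client;

        public WithDefault() {
            this.client = null; // no argument available on this path
        }

        public WithDefault(String client) {
            this.client = client;
        }
    }

    // Hypothetical loader that defines only the one-argument constructor.
    public static class OnlyStringCtor {
        public final String client;

        public OnlyStringCtor(String client) {
            this.client = client;
        }
    }

    // Instantiate from a bare class name: with no arguments to pass,
    // reflection can only select the zero-argument constructor.
    static Object fromClassNameOnly(String className) throws ReflectiveOperationException {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        WithDefault a = (WithDefault) fromClassNameOnly(WithDefault.class.getName());
        System.out.println(a.client); // prints null: the 'client' argument is lost
        try {
            fromClassNameOnly(OnlyStringCtor.class.getName());
        } catch (NoSuchMethodException e) {
            System.out.println("no zero-argument constructor");
        }
    }
}
```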