Kevin Weil
Sun, 12 Oct 2008 03:26:22 -0700
I am having issues with a custom load function that reads protocol buffers. It worked with pig 0.1, and now after the refactoring to support 0.2/types, I can't get it to do anything past the line

    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete

My reader uses the same getNext() and bindTo() that it used before (modulo things like using the required TupleFactory now in getNext), and I now implement determineSchema as well. There's no global way to take a series of bytes from a protobuf and turn it into, say, a tuple, since it depends on the message, so I assert false in all the bytesTo* functions and then return a proper schema in determineSchema. Based on the document <http://wiki.apache.org/pig/TrunkToTypesChanges> sent out a couple days ago, I think this is correct, but stop me here if I'm wrong.

A simplified version of my determineSchema is

    @Override
    public Schema determineSchema(URL fileName) throws IOException {
        try {
            List<Schema.FieldSchema> schemaList = new ArrayList<Schema.FieldSchema>();
            schemaList.add(new FieldSchema("version", null, DataType.CHARARRAY));
            return new Schema(schemaList);
        } catch(...)
    }

The constructor of my LogReader class takes a string argument, and so a simple Pig script is

    <register jars>
    all_files = LOAD 'my_file' USING com.....logging.LogReader('client') AS (version: chararray);
    dump all_files;

It's at this point that I get the MapReduceLauncher at 0% complete and it just sits there forever.

Here is what I do see, in grunt. When I type the all_files = ... line, one of my LogReader classes gets instantiated with the correct string argument. When I type the dump line, another gets constructed with the correct string argument, then two LogReader classes get instantiated with a no-argument constructor, and then one more gets created with the correct string 1-argument constructor. This is all on a Linux setup with just one machine running in Hadoop mode to test.
I don't understand why the two LogReaders get created with the zero-argument constructor here -- that shouldn't need to be defined, right? Perhaps it's a clue to what's going wrong. By the way, in my LogReader class, other than all the constructors getting called, no other function calls seem to happen. In particular, bindTo, getNext, and determineSchema are never called.

Thanks in advance,
Kevin

p.s. In case it helps, the stack at the point where the correct 1-argument constructor LogReader(String) is being invoked is

    at com....logging.LogReader.<init>(LogReader.java:71)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:416)
    at org.apache.pig.impl.logicalLayer.LOLoad.<init>(LOLoad.java:64)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:889)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:748)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:549)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
    at org.apache.pig.PigServer.parseQuery(PigServer.java:295)
    at org.apache.pig.PigServer.clonePlan(PigServer.java:330)
    at org.apache.pig.PigServer.compileLp(PigServer.java:666)
    at org.apache.pig.PigServer.compileLp(PigServer.java:655)
    at org.apache.pig.PigServer.store(PigServer.java:433)
    at org.apache.pig.PigServer.store(PigServer.java:421)
    at org.apache.pig.PigServer.openIterator(PigServer.java:384)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
    at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:94)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
    at org.apache.pig.Main.main(Main.java:282)

while the stack at the point where the incorrect zero-argument constructor gets called is

    at com.......logging.LogReader.<init>(LogReader.java:80)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at java.lang.Class.newInstance0(Class.java:355)
    at java.lang.Class.newInstance(Class.java:308)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:418)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:454)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.instantiateFunc(POCast.java:66)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.setLoadFSpec(POCast.java:71)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:1153)
    at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:58)
    at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:27)
    at org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:68)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:805)
    at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:105)
    at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:40)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:232)
    at org.apache.pig.PigServer.compilePp(PigServer.java:731)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:644)
    at org.apache.pig.PigServer.store(PigServer.java:452)
    at org.apache.pig.PigServer.store(PigServer.java:421)
    at org.apache.pig.PigServer.openIterator(PigServer.java:384)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
    at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:94)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
    at org.apache.pig.Main.main(Main.java:282)
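
One difference between the two traces may be worth noting: the first reaches LogReader through java.lang.reflect.Constructor.newInstance, while the second goes through java.lang.Class.newInstance, which can only ever invoke a no-argument constructor. A possible reading (a guess, not a confirmed diagnosis) is that the function spec reaching POCast carries only the class name and not the ('client') argument. The standalone sketch below reproduces just that reflective behavior. It is not Pig's code: DemoReader, instantiate, and CALLS are hypothetical names made up for illustration.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReflectiveInstantiationSketch {

    // Records which constructor ran, in call order.
    static final List<String> CALLS = new ArrayList<>();

    // Hypothetical stand-in for a loader class; not the real LogReader.
    public static class DemoReader {
        final String kind;

        public DemoReader() {
            this.kind = null;
            CALLS.add("DemoReader()");
        }

        public DemoReader(String kind) {
            this.kind = kind;
            CALLS.add("DemoReader(" + kind + ")");
        }
    }

    // Assumed behavior, for illustration only: when arguments are supplied,
    // look up the all-String constructor; when none are supplied, fall back
    // to the public no-argument constructor.
    static Object instantiate(Class<?> cls, String... args) throws Exception {
        if (args.length > 0) {
            Class<?>[] paramTypes = new Class<?>[args.length];
            Arrays.fill(paramTypes, String.class);
            Constructor<?> ctor = cls.getConstructor(paramTypes);
            return ctor.newInstance((Object[]) args);
        }
        return cls.getConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        instantiate(DemoReader.class, "client"); // spec includes the argument
        instantiate(DemoReader.class);           // spec is the bare class name
        System.out.println(CALLS);               // [DemoReader(client), DemoReader()]
    }
}
```

If the no-argument constructor were removed from DemoReader, the second call would throw NoSuchMethodException instead, which is one way to check where an argument-less instantiation is coming from.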