[ https://issues.apache.org/jira/browse/PIG-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Russell Jurney updated PIG-2440:
--------------------------------

    Description: 
I am creating Avro records according to the instructions/code at https://github.com/rjurney/Collecting-Data. They look like this:

{
    "namespace": "agile.data.avro",
    "name": "Email",
    "type": "record",
    "fields": [
        {"name":"message_id", "type": ["string", "null"]},
        {"name":"from", "type": ["string", "null"]},
        {"name":"to", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"cc", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"bcc", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"reply_to", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"subject", "type": ["string", "null"]},
        {"name":"body", "type": ["string", "null"]},
        {"name":"date", "type": ["string", "null"]}
    ]
}

I have applied the patch at PIG-2411 to get Pig to store bags in Avro arrays.

I am running pig in local mode via: pig -l /tmp -x local -v

The script is:

REGISTER /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
REGISTER /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
REGISTER /me/pig/contrib/piggybank/java/piggybank.jar
REGISTER /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
REGISTER /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar

DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

messages = LOAD '/tmp/10000_emails.avro' USING AvroStorage();
smaller = FOREACH messages GENERATE from, to;
pairs = FOREACH smaller GENERATE from, FLATTEN(smaller.to) AS to;
STORE pairs INTO '/tmp/mail_pairs.avro' USING AvroStorage();

The STORE fails with the following output:

2011-12-20 17:58:25,705 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,719 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,722 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,737 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,740 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,751 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,755 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,757 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,760 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,762 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,766 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-12-20 17:58:25,804 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,808 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-12-20 17:58:25,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2011-12-20 17:58:25,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees.
2011-12-20 17:58:25,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 out of total 3 MR operators.
2011-12-20 17:58:25,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
2011-12-20 17:58:25,813 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,817 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,817 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-12-20 17:58:25,818 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-12-20 17:58:25,822 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
2011-12-20 17:58:25,826 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,826 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-12-20 17:58:25,930 [Thread-22] WARN org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2011-12-20 17:58:26,327 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-12-20 17:58:26,330 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2117: Unexpected error when launching map reduce job.
2011-12-20 17:58:26,330 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias pairs
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1553)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:541)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:943)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:523)
        at org.apache.pig.Main.main(Main.java:148)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:311)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1271)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1256)
        at org.apache.pig.PigServer.execute(PigServer.java:1246)
        at org.apache.pig.PigServer.access$400(PigServer.java:127)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1548)
        ... 13 more
Caused by: java.lang.NullPointerException
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:193)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)

  was:
I am creating Avro records according to the instructions/code at https://github.com/rjurney/Collecting-Data. They look like this:

{
    "namespace": "agile.data.avro",
    "name": "Email",
    "type": "record",
    "fields": [
        {"name":"message_id", "type": ["string", "null"]},
        {"name":"from", "type": ["string", "null"]},
        {"name":"to", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"cc", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"bcc", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"reply_to", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"subject", "type": ["string", "null"]},
        {"name":"body", "type": ["string", "null"]},
        {"name":"date", "type": ["string", "null"]}
    ]
}

I have applied the patch at PIG-2411 to get Pig to store bags in Avro arrays.
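A side note on the script in the Description: inside a FOREACH over smaller, the bag column is normally referenced directly, so the flatten step would usually be written FLATTEN(to) rather than FLATTEN(smaller.to). Whether that projection style is related to the failure above is not clear from the trace; below is a minimal sketch of the pairing step under that assumption, reusing the paths from the original script.

-- Sketch only: same relations as the script above, but flattening the bag
-- column directly instead of projecting it through the relation name.
messages = LOAD '/tmp/10000_emails.avro' USING AvroStorage();
smaller = FOREACH messages GENERATE from, to;
-- 'to' loads as a bag with one tuple per recipient, so FLATTEN(to) emits one
-- (from, to) row per recipient.
pairs = FOREACH smaller GENERATE from, FLATTEN(to) AS to;
STORE pairs INTO '/tmp/mail_pairs.avro' USING AvroStorage();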
           Tags: pig avro storage  (was: pig)
         Labels: avro happy pants pig sad storage storefunc udf  (was: )

I can't use AvroStorage :(

> AvroStorage relations stop working after using DUMP
> ---------------------------------------------------
>
>                 Key: PIG-2440
>                 URL: https://issues.apache.org/jira/browse/PIG-2440
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.1, 0.10, 0.9.2
>         Environment: Mac OS X, running pig trunk
>            Reporter: Russell Jurney
>              Labels: avro, happy, pants, pig, sad, storage, storefunc, udf
>             Fix For: 0.9.1, 0.10, 0.9.2
>
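The summary above describes relations loaded through AvroStorage failing once they have been DUMPed. A minimal reproduction sketch, assuming that description and reusing the input path from the script above (the output path here is illustrative):

-- Reproduction sketch based on the issue summary: the STORE through
-- AvroStorage reportedly fails in the same Grunt session after the DUMP.
messages = LOAD '/tmp/10000_emails.avro' USING AvroStorage();
DUMP messages;
STORE messages INTO '/tmp/messages_copy.avro' USING AvroStorage();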
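Since the NullPointerException surfaces in PigOutputFormat.checkOutputSpecs, it may also be worth checking the schema Pig derives for the relation being stored. A quick diagnostic from Grunt, using the same load as above:

-- Diagnostic sketch: DESCRIBE prints the schema AvroStorage derived from the
-- .avro file, and again after the projection, before any STORE is attempted.
messages = LOAD '/tmp/10000_emails.avro' USING AvroStorage();
DESCRIBE messages;
smaller = FOREACH messages GENERATE from, to;
DESCRIBE smaller;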
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira