[ https://issues.apache.org/jira/browse/PIG-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Russell Jurney resolved PIG-2440.
---------------------------------
Resolution: Invalid
I need to play with AvroStorage more to better characterize this bug before
reporting. Sometimes it works, sometimes it does not.
> AvroStorage won't store :(
> --------------------------
>
> Key: PIG-2440
> URL: https://issues.apache.org/jira/browse/PIG-2440
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.9.1, 0.10, 0.9.2
> Environment: Mac OS X, running pig trunk
> Reporter: Russell Jurney
> Labels: avro, happy, pants, pig, sad, storage, storefunc, udf
> Fix For: 0.9.1, 0.10, 0.9.2
>
> Attachments: test.pig
>
>
> I am creating Avro records by following the instructions and code at
> https://github.com/rjurney/Collecting-Data
> Their schema looks like this:
> {
>   "namespace": "agile.data.avro",
>   "name": "Email",
>   "type": "record",
>   "fields": [
>     {"name":"message_id", "type": ["string", "null"]},
>     {"name":"from","type": ["string", "null"]},
>     {"name":"to","type": [{"type":"array", "items":"string"}, "null"]},
>     {"name":"cc","type": [{"type":"array", "items":"string"}, "null"]},
>     {"name":"bcc","type": [{"type":"array", "items":"string"}, "null"]},
>     {"name":"reply_to", "type": [{"type":"array", "items":"string"}, "null"]},
>     {"name":"subject", "type": ["string", "null"]},
>     {"name":"body", "type": ["string", "null"]},
>     {"name":"date", "type": ["string", "null"]}
>   ]
> }
> I have applied the patch at PIG-2411 to get Pig to store bags in Avro arrays.
> I am running Pig in local mode via: pig -l /tmp -x local -v
> The script is:
> REGISTER /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
> REGISTER /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
> REGISTER /me/pig/contrib/piggybank/java/piggybank.jar
> REGISTER /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
> REGISTER /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar
> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
> messages = LOAD '/tmp/10000_emails.avro' USING AvroStorage();
> smaller = FOREACH messages GENERATE from, to;
> pairs = FOREACH smaller GENERATE from, FLATTEN(smaller.to) AS to;
> STORE pairs INTO '/tmp/mail_pairs.avro' USING AvroStorage();
> 2011-12-20 17:58:25,705 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,719 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,722 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,737 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,740 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,751 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,755 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,757 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,760 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,762 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,766 [main] INFO
> org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script:
> UNKNOWN
> 2011-12-20 17:58:25,804 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,808 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,810 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
> File concatenation threshold: 100 optimistic? false
> 2011-12-20 17:58:25,812 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 3
> 2011-12-20 17:58:25,813 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Merged 1 map-only splittees.
> 2011-12-20 17:58:25,813 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - Merged 1 out of total 3 MR operators.
> 2011-12-20 17:58:25,813 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 2
> 2011-12-20 17:58:25,813 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,817 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,817 [main] INFO
> org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to
> the job
> 2011-12-20 17:58:25,818 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2011-12-20 17:58:25,822 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up multi store job
> 2011-12-20 17:58:25,826 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics
> - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
> already initialized
> 2011-12-20 17:58:25,826 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map-reduce job(s) waiting for submission.
> 2011-12-20 17:58:25,930 [Thread-22] WARN org.apache.hadoop.mapred.JobClient
> - No job jar file set. User classes may not be found. See JobConf(Class) or
> JobConf#setJar(String).
> 2011-12-20 17:58:26,327 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> 2011-12-20 17:58:26,330 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
> 2117: Unexpected error when launching map reduce job.
> 2011-12-20 17:58:26,330 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to
> store alias pairs
>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1553)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:541)
>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:943)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>     at org.apache.pig.Main.run(Main.java:523)
>     at org.apache.pig.Main.main(Main.java:148)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:311)
>     at org.apache.pig.PigServer.launchPlan(PigServer.java:1271)
>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1256)
>     at org.apache.pig.PigServer.execute(PigServer.java:1246)
>     at org.apache.pig.PigServer.access$400(PigServer.java:127)
>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1548)
>     ... 13 more
> Caused by: java.lang.NullPointerException
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:193)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>     at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
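
For anyone trying to narrow this down: the schema quoted above declares every field as a union with "null", and some of those unions wrap an array of strings. Those array fields are presumably the ones that depend on the bag-to-array support from PIG-2411. The following is only a rough sketch, in Python with the standard library (Python and the helper name nullable_array_fields are assumptions for illustration, not part of the report), of how to list such fields from the schema text:

```python
import json

# Schema copied from the report above, with the mail client's line wraps removed.
EMAIL_SCHEMA = """
{
  "namespace": "agile.data.avro",
  "name": "Email",
  "type": "record",
  "fields": [
    {"name":"message_id", "type": ["string", "null"]},
    {"name":"from","type": ["string", "null"]},
    {"name":"to","type": [{"type":"array", "items":"string"}, "null"]},
    {"name":"cc","type": [{"type":"array", "items":"string"}, "null"]},
    {"name":"bcc","type": [{"type":"array", "items":"string"}, "null"]},
    {"name":"reply_to", "type": [{"type":"array", "items":"string"}, "null"]},
    {"name":"subject", "type": ["string", "null"]},
    {"name":"body", "type": ["string", "null"]},
    {"name":"date", "type": ["string", "null"]}
  ]
}
"""


def nullable_array_fields(schema_json):
    """Return the names of fields typed as a union of an array and "null".

    Hypothetical helper for illustration; it only inspects the JSON text
    and does not use any Avro library.
    """
    schema = json.loads(schema_json)
    names = []
    for field in schema["fields"]:
        field_type = field["type"]
        if not isinstance(field_type, list):
            continue  # a plain type, not a union
        has_null = "null" in field_type
        has_array = any(
            isinstance(branch, dict) and branch.get("type") == "array"
            for branch in field_type
        )
        if has_null and has_array:
            names.append(field["name"])
    return names


if __name__ == "__main__":
    print(nullable_array_fields(EMAIL_SCHEMA))
```

The script in the report projects only from and to, so of the array-typed fields only to is actually exercised by the failing STORE.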