[ https://issues.apache.org/jira/browse/PIG-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Russell Jurney updated PIG-2440:
--------------------------------

    Description: 
I am creating Avro records according to the instructions/code at https://github.com/rjurney/Collecting-Data. They look like this:

{
    "namespace": "agile.data.avro",
    "name": "Email",
    "type": "record",
    "fields": [
        {"name":"message_id", "type": ["string", "null"]},
        {"name":"from", "type": ["string", "null"]},
        {"name":"to", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"cc", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"bcc", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"reply_to", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"subject", "type": ["string", "null"]},
        {"name":"body", "type": ["string", "null"]},
        {"name":"date", "type": ["string", "null"]}
    ]
}

I have applied the patch at PIG-2411 to get Pig to store bags in Avro arrays.

I am running pig in local mode via: pig -l /tmp -x local -v

The script is:

REGISTER /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
REGISTER /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
REGISTER /me/pig/contrib/piggybank/java/piggybank.jar
REGISTER /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
REGISTER /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar

DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

messages = LOAD '/tmp/10000_emails.avro' USING AvroStorage();
smaller = FOREACH messages GENERATE from, to;
pairs = FOREACH smaller GENERATE from, FLATTEN(smaller.to) AS to;
STORE pairs INTO '/tmp/mail_pairs.avro' USING AvroStorage();

The STORE fails with the following output:

2011-12-20 17:58:25,705 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,719 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,722 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,737 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,740 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,751 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,755 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,757 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,760 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,762 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,766 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-12-20 17:58:25,804 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,808 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-12-20 17:58:25,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2011-12-20 17:58:25,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees.
2011-12-20 17:58:25,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 out of total 3 MR operators.
2011-12-20 17:58:25,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
2011-12-20 17:58:25,813 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,817 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,817 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-12-20 17:58:25,818 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-12-20 17:58:25,822 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
2011-12-20 17:58:25,826 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-12-20 17:58:25,826 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-12-20 17:58:25,930 [Thread-22] WARN org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2011-12-20 17:58:26,327 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-12-20 17:58:26,330 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2117: Unexpected error when launching map reduce job.
2011-12-20 17:58:26,330 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias pairs
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1553)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:541)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:943)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:523)
        at org.apache.pig.Main.main(Main.java:148)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:311)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1271)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1256)
        at org.apache.pig.PigServer.execute(PigServer.java:1246)
        at org.apache.pig.PigServer.access$400(PigServer.java:127)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1548)
        ... 13 more
Caused by: java.lang.NullPointerException
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:193)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)

  was:
I am creating Avro records according to the instructions/code at https://github.com/rjurney/Collecting-Data. They look like this:

{
    "namespace": "agile.data.avro",
    "name": "Email",
    "type": "record",
    "fields": [
        {"name":"message_id", "type": ["string", "null"]},
        {"name":"from", "type": ["string", "null"]},
        {"name":"to", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"cc", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"bcc", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"reply_to", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"subject", "type": ["string", "null"]},
        {"name":"body", "type": ["string", "null"]},
        {"name":"date", "type": ["string", "null"]}
    ]
}

I have applied the patch at PIG-2411 to get Pig to store bags in Avro arrays.
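A side note on the script in the Description: inside a FOREACH over smaller, the bag column is normally referenced directly, so the flatten step would usually be written FLATTEN(to) rather than FLATTEN(smaller.to). Whether that projection style is related to the failure above is not clear from the trace; below is a minimal sketch of the pairing step under that assumption, reusing the paths from the original script.

-- Sketch only: same relations as the script above, but flattening the bag
-- column directly instead of projecting it through the relation name.
messages = LOAD '/tmp/10000_emails.avro' USING AvroStorage();
smaller = FOREACH messages GENERATE from, to;
-- 'to' loads as a bag with one tuple per recipient, so FLATTEN(to) emits one
-- (from, to) row per recipient.
pairs = FOREACH smaller GENERATE from, FLATTEN(to) AS to;
STORE pairs INTO '/tmp/mail_pairs.avro' USING AvroStorage();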
           Tags: pig avro storage  (was: pig)
         Labels: avro happy pants pig sad storage storefunc udf  (was: )

I can't use AvroStorage :(

> AvroStorage relations stop working after using DUMP
> ---------------------------------------------------
>
>                 Key: PIG-2440
>                 URL: https://issues.apache.org/jira/browse/PIG-2440
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.1, 0.10, 0.9.2
>         Environment: Mac OS X, running pig trunk
>            Reporter: Russell Jurney
>              Labels: avro, happy, pants, pig, sad, storage, storefunc, udf
>             Fix For: 0.9.1, 0.10, 0.9.2
>
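The summary above describes relations loaded through AvroStorage failing once they have been DUMPed. A minimal reproduction sketch, assuming that description and reusing the input path from the script above (the output path here is illustrative):

-- Reproduction sketch based on the issue summary: the STORE through
-- AvroStorage reportedly fails in the same Grunt session after the DUMP.
messages = LOAD '/tmp/10000_emails.avro' USING AvroStorage();
DUMP messages;
STORE messages INTO '/tmp/messages_copy.avro' USING AvroStorage();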
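Since the NullPointerException surfaces in PigOutputFormat.checkOutputSpecs, it may also be worth checking the schema Pig derives for the relation being stored. A quick diagnostic from Grunt, using the same load as above:

-- Diagnostic sketch: DESCRIBE prints the schema AvroStorage derived from the
-- .avro file, and again after the projection, before any STORE is attempted.
messages = LOAD '/tmp/10000_emails.avro' USING AvroStorage();
DESCRIBE messages;
smaller = FOREACH messages GENERATE from, to;
DESCRIBE smaller;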
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira