Russell Jurney created PIG-2792:
-----------------------------------
Summary: Wonderdog stopped working in Pig 0.10.0 (worked in 0.9.2)
Key: PIG-2792
URL: https://issues.apache.org/jira/browse/PIG-2792
Project: Pig
Issue Type: Bug
Components: piggybank
Affects Versions: 0.10.0, 0.11, 0.10.1
Environment: Pig with Wonderdog
https://github.com/infochimps-labs/wonderdog for elasticsearch integration.
Elasticsearch 0.18.6. Pig local mode.
Reporter: Russell Jurney
Priority: Blocker
Fix For: 0.10.1
The Pig UDFs in Wonderdog for ElasticSearch integration, which worked in 0.9.2,
stopped working in 0.10.0.
In 0.10.0 the job fails with an error because Wonderdog is unable to read its
configuration from the Hadoop cache.
If someone can help identify the issue, or advise how Wonderdog or Pig
can be modified so that Wonderdog works with Pig 0.10, it would be greatly
appreciated.
This issue is duplicated in the Wonderdog project here:
https://github.com/infochimps-labs/wonderdog/issues/6,
https://github.com/infochimps-labs/wonderdog/issues/5, and
https://github.com/infochimps-labs/wonderdog/issues/7
The error is below:
2012-07-06 16:50:51,501 [main] INFO org.apache.pig.Main - Apache Pig version
0.10.0-SNAPSHOT (rexported) compiled Jun 22 2012, 15:56:16
2012-07-06 16:50:51,502 [main] INFO org.apache.pig.Main - Logging error
messages to: /private/tmp/pig_1341618651472.log
2012-07-06 16:50:51,829 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to
hadoop file system at: file:///
{"ok":true}
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 11 100 11 0 0 647 0 --:--:-- --:--:-- --:--:-- 733
2012-07-06 16:50:53,206 [main] INFO org.apache.pig.tools.pigstats.ScriptState
- Pig features used in the script: UNKNOWN
2012-07-06 16:50:53,379 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File
concatenation threshold: 100 optimistic? false
2012-07-06 16:50:53,403 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2012-07-06 16:50:53,403 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2012-07-06 16:50:53,441 [main] INFO org.apache.pig.tools.pigstats.ScriptState
- Pig script settings are added to the job
2012-07-06 16:50:53,449 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 16:50:53,494 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2012-07-06 16:50:53,560 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2012-07-06 16:50:53,587 [Thread-7] WARN
org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
2012-07-06 16:50:53,597 [Thread-7] WARN org.apache.hadoop.mapred.JobClient -
No job jar file set. User classes may not be found. See JobConf(Class) or
JobConf#setJar(String).
****file:/tmp/emails.json
2012-07-06 16:50:53,711 [Thread-7] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to
process : 2
2012-07-06 16:50:53,711 [Thread-7] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 2
2012-07-06 16:50:53,734 [Thread-7] WARN
org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not
loaded
2012-07-06 16:50:53,737 [Thread-7] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 3
2012-07-06 16:50:54,008 [Thread-8] INFO org.apache.hadoop.mapred.Task - Using
ResourceCalculatorPlugin : null
2012-07-06 16:50:54,023 [Thread-8] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader -
Current split being processed file:/tmp/emails.json/part-m-00000:0+33554432
2012-07-06 16:50:54,029 [Thread-8] INFO
com.infochimps.elasticsearch.ElasticSearchOutputFormat - Using
field:[message_id] for document ids
2012-07-06 16:50:54,029 [Thread-8] INFO
com.infochimps.elasticsearch.ElasticSearchOutputFormat - Using [null] as
es.config
2012-07-06 16:50:54,029 [Thread-8] INFO
com.infochimps.elasticsearch.ElasticSearchOutputFormat - Using [null] as
es.plugins.dir
2012-07-06 16:50:54,033 [Thread-8] WARN
org.apache.hadoop.mapred.FileOutputCommitter - Output path is null in cleanup
2012-07-06 16:50:54,034 [Thread-8] WARN
org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: java.lang.NullPointerException
at
com.infochimps.elasticsearch.ElasticSearchOutputFormat$ElasticSearchRecordWriter.<init>(ElasticSearchOutputFormat.java:133)
at
com.infochimps.elasticsearch.ElasticSearchOutputFormat.getRecordWriter(ElasticSearchOutputFormat.java:262)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84)
at
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:628)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.NullPointerException
at java.util.Hashtable.put(Hashtable.java:394)
at java.util.Properties.setProperty(Properties.java:143)
at java.lang.System.setProperty(System.java:746)
at
com.infochimps.elasticsearch.ElasticSearchOutputFormat$ElasticSearchRecordWriter.<init>(ElasticSearchOutputFormat.java:130)
... 6 more
2012-07-06 16:50:54,506 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_local_0001
2012-07-06 16:50:54,506 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2012-07-06 16:50:59,022 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_local_0001 has failed! Stop running all dependent jobs
2012-07-06 16:50:59,023 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2012-07-06 16:50:59,024 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil
- 1 map reduce job(s) failed!
2012-07-06 16:50:59,024 [main] INFO
org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats
reported below may be incomplete
2012-07-06 16:50:59,025 [main] INFO
org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.2 0.10.0-SNAPSHOT rjurney 2012-07-06 16:50:53 2012-07-06 16:50:59
UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0001 json_emails MAP_ONLY Message: Job failed! Error - NA
es://email/email?id=message_id&json=true&size=1000,
Input(s):
Failed to read data from "/tmp/emails.json"
Output(s):
Failed to produce result in "es://email/email?id=message_id&json=true&size=1000"
Job DAG:
job_local_0001
2012-07-06 16:50:59,025 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2012-07-06 16:50:59,029 [main] ERROR org.apache.pig.tools.grunt.GruntParser -
ERROR 2244: Job failed, hadoop does not return any error message
2012-07-06 16:50:59,029 [main] ERROR org.apache.pig.tools.grunt.GruntParser -
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed,
hadoop does not return any error message
at
org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at
org.apache.pig.tools.grunt.GruntParser.processShCommand(GruntParser.java:1025)
at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:167)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Details also at logfile: /private/tmp/pig_1341618651472.log
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 193 100 193 0 0 2475 0 --:--:-- --:--:-- --:--:-- 2539
{
"took" : 75,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
2012-07-06 16:50:59,140 [main] ERROR org.apache.pig.tools.grunt.GruntParser -
ERROR 2244: Job failed, hadoop does not return any error message
2012-07-06 16:50:59,140 [main] ERROR org.apache.pig.tools.grunt.GruntParser -
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed,
hadoop does not return any error message
at
org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
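
A note on the first trace above: the "Caused by" frames show
System.setProperty ending in Hashtable.put with a NullPointerException, and the
log lines just before it report "Using [null] as es.config" and "Using [null] as
es.plugins.dir". That is consistent with a null value being passed to
System.setProperty. The sketch below reproduces only that failure mode outside
of Pig and Wonderdog. The class and method names (NullPropertyDemo,
failsOnNullValue, setOrFail) are hypothetical and are not Wonderdog code, and
the sketch does not explain why the value is null under Pig 0.10.0.

```java
final class NullPropertyDemo {

    // Mirrors the call chain in the trace: System.setProperty with a null
    // value is rejected by the underlying property table.
    static boolean failsOnNullValue(String key) {
        try {
            System.setProperty(key, null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Hypothetical guard (an assumption, not the Wonderdog implementation):
    // fail with a message that names the missing setting instead of a bare
    // NullPointerException.
    static void setOrFail(String key, String value) {
        if (value == null) {
            throw new IllegalStateException(
                "Required setting '" + key + "' was not found in the job configuration");
        }
        System.setProperty(key, value);
    }

    public static void main(String[] args) {
        System.out.println("null value rejected: " + failsOnNullValue("es.config"));
        try {
            setOrFail("es.config", null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A guard of this kind would not fix the underlying problem, but it would make
the failing setting visible in the job log rather than leaving only the
generic ERROR 2244 message.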