[jira] [Commented] (PIG-2792) Wonderdog stopped working in Pig 0.10.0 (worked in 0.9.2)

Russell Jurney (JIRA) Fri, 06 Jul 2012 17:23:36 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408478#comment-13408478
 ]


Russell Jurney commented on PIG-2792:
-------------------------------------

The properties that need to be set in the Hadoop configuration object are:

           Instantiates a new RecordWriter for Elasticsearch
           <p>
           The properties that <b>MUST</b> be set in the hadoop Configuration 
object
           are as follows:
           <ul>
           <li><b>elasticsearch.index.name</b> - The name of the elasticsearch 
index data will be written to. It does not have to exist ahead of time</li>
           <li><b>elasticsearch.bulk.size</b> - The number of records to be 
accumulated into a bulk request before writing to elasticsearch.</li>
           <li><b>elasticsearch.is_json</b> - A boolean indicating whether the 
records to be indexed are json records. If false the records are assumed to be 
tsv, in which case <b>elasticsearch.field.names</b> must be set and contain a 
comma separated list of field names</li>
           <li><b>elasticsearch.object.type</b> - The type of objects being 
indexed</li>
           <li><b>elasticsearch.config</b> - The full path to the 
elasticsearch.yml file. It is a local path and must exist on all machines in the 
hadoop cluster.</li>
           <li><b>elasticsearch.plugins.dir</b> - The full path to the 
elasticsearch plugins directory. It is a local path and must exist on all 
machines in the hadoop cluster.</li>
           </ul>
           <p>
           The following fields depend on whether <b>elasticsearch.is_json</b> 
is true or false.
           <ul>
           <li><b>elasticsearch.id.field.name</b> - When 
<b>elasticsearch.is_json</b> is true, this is the name of a field in the json 
document that contains the document's id. If -1 is used then the document is 
assumed to have no id and one is assigned to it by elasticsearch.</li>
           <li><b>elasticsearch.field.names</b> - When 
<b>elasticsearch.is_json</b> is false, this is a comma separated list of field 
names.</li>
           <li><b>elasticsearch.id.field</b> - When 
<b>elasticsearch.is_json</b> is false, this is the numeric index of the field 
to use as the document id. If -1 is used the document is assumed to have no id 
and one is assigned to it by elasticsearch.</li>
           </ul>       
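
As a rough illustration of the list above, here is a minimal Java sketch that checks whether the required properties are present before a writer would be constructed. It is not Wonderdog code: the class name WonderdogConfigCheck and its methods are hypothetical, a plain Map stands in for the Hadoop Configuration object so the example runs without Hadoop on the classpath, and the sample values are only my reading of the es:// store URI in the issue below. It also shows that System.setProperty rejects a null value, which is consistent with the NullPointerException in the quoted stack trace, where the log reports [null] for es.config and es.plugins.dir.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Wonderdog or Pig.
class WonderdogConfigCheck {

    // Property names copied from the Javadoc quoted above.
    static final String[] REQUIRED = {
        "elasticsearch.index.name",
        "elasticsearch.bulk.size",
        "elasticsearch.is_json",
        "elasticsearch.object.type",
        "elasticsearch.config",
        "elasticsearch.plugins.dir"
    };

    // Returns the names of the properties that are expected but not set.
    public static List<String> missingKeys(Map<String, String> conf) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            if (conf.get(key) == null) {
                missing.add(key);
            }
        }
        // For tsv input the Javadoc says elasticsearch.field.names must be set.
        String isJson = conf.get("elasticsearch.is_json");
        if (isJson != null && !Boolean.parseBoolean(isJson)
                && conf.get("elasticsearch.field.names") == null) {
            missing.add("elasticsearch.field.names");
        }
        return missing;
    }

    // System.setProperty throws NullPointerException when the value is null.
    public static boolean setPropertyRejectsNull() {
        try {
            System.setProperty("wonderdog.sketch.probe", null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumed values, loosely based on es://email/email?id=message_id&json=true&size=1000
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("elasticsearch.index.name", "email");
        conf.put("elasticsearch.object.type", "email");
        conf.put("elasticsearch.bulk.size", "1000");
        conf.put("elasticsearch.is_json", "true");
        conf.put("elasticsearch.id.field.name", "message_id");
        // elasticsearch.config and elasticsearch.plugins.dir are left unset on purpose.

        System.out.println("Missing: " + missingKeys(conf));
        System.out.println("setProperty rejects null: " + setPropertyRejectsNull());
    }
}
```

With the two path properties left unset, the sketch reports elasticsearch.config and elasticsearch.plugins.dir as missing. A check of this kind, run before any call to System.setProperty, would turn the bare NullPointerException into a message that names the absent property.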

                
> Wonderdog stopped working in Pig 0.10.0 (worked in 0.9.2)
> ---------------------------------------------------------
>
>                 Key: PIG-2792
>                 URL: https://issues.apache.org/jira/browse/PIG-2792
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10.0, 0.11, 0.10.1
>         Environment: Pig with Wonderdog 
> https://github.com/infochimps-labs/wonderdog for elasticsearch integration. 
> Elasticsearch 0.18.6. Pig local mode.
>            Reporter: Russell Jurney
>            Priority: Blocker
>              Labels: a, about, area, book, did, i, moving, of, omg, 
> technology, why, write
>             Fix For: 0.10.1
>
>
> The Pig UDFs in Wonderdog for ElasticSearch integration, which worked in 
> 0.9.2, stopped working in 0.10.0.
> Now in 0.10.0 there is an error, as Wonderdog is unable to read its 
> configuration from the hadoop cache.
> If someone can help identify what the issue is, or advise how Wonderdog or 
> Pig can be modified so that Wonderdog works with Pig 0.10, it would be 
> greatly appreciated.
> This issue is duped in the Wonderdog project here: 
> https://github.com/infochimps-labs/wonderdog/issues/6 
> https://github.com/infochimps-labs/wonderdog/issues/5 and 
> https://github.com/infochimps-labs/wonderdog/issues/7
> The error is below:
> 2012-07-06 16:50:51,501 [main] INFO  org.apache.pig.Main - Apache Pig version 
> 0.10.0-SNAPSHOT (rexported) compiled Jun 22 2012, 15:56:16
> 2012-07-06 16:50:51,502 [main] INFO  org.apache.pig.Main - Logging error 
> messages to: /private/tmp/pig_1341618651472.log
> 2012-07-06 16:50:51,829 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: file:///
> {"ok":true}
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  
> Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> 100    11  100    11    0     0    647      0 --:--:-- --:--:-- --:--:--   733
> 2012-07-06 16:50:53,206 [main] INFO  
> org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: 
> UNKNOWN
> 2012-07-06 16:50:53,379 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - 
> File concatenation threshold: 100 optimistic? false
> 2012-07-06 16:50:53,403 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>  - MR plan size before optimization: 1
> 2012-07-06 16:50:53,403 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>  - MR plan size after optimization: 1
> 2012-07-06 16:50:53,441 [main] INFO  
> org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to 
> the job
> 2012-07-06 16:50:53,449 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>  - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-07-06 16:50:53,494 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>  - Setting up single store job
> 2012-07-06 16:50:53,560 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 1 map-reduce job(s) waiting for submission.
> 2012-07-06 16:50:53,587 [Thread-7] WARN  
> org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2012-07-06 16:50:53,597 [Thread-7] WARN  org.apache.hadoop.mapred.JobClient - 
> No job jar file set.  User classes may not be found. See JobConf(Class) or 
> JobConf#setJar(String).
> ****file:/tmp/emails.json
> 2012-07-06 16:50:53,711 [Thread-7] INFO  
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to 
> process : 2
> 2012-07-06 16:50:53,711 [Thread-7] INFO  
> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input 
> paths to process : 2
> 2012-07-06 16:50:53,734 [Thread-7] WARN  
> org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not 
> loaded
> 2012-07-06 16:50:53,737 [Thread-7] INFO  
> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input 
> paths (combined) to process : 3
> 2012-07-06 16:50:54,008 [Thread-8] INFO  org.apache.hadoop.mapred.Task -  
> Using ResourceCalculatorPlugin : null
> 2012-07-06 16:50:54,023 [Thread-8] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader 
> - Current split being processed file:/tmp/emails.json/part-m-00000:0+33554432
> 2012-07-06 16:50:54,029 [Thread-8] INFO  
> com.infochimps.elasticsearch.ElasticSearchOutputFormat - Using 
> field:[message_id] for document ids
> 2012-07-06 16:50:54,029 [Thread-8] INFO  
> com.infochimps.elasticsearch.ElasticSearchOutputFormat - Using [null] as 
> es.config
> 2012-07-06 16:50:54,029 [Thread-8] INFO  
> com.infochimps.elasticsearch.ElasticSearchOutputFormat - Using [null] as 
> es.plugins.dir
> 2012-07-06 16:50:54,033 [Thread-8] WARN  
> org.apache.hadoop.mapred.FileOutputCommitter - Output path is null in cleanup
> 2012-07-06 16:50:54,034 [Thread-8] WARN  
> org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
> java.lang.RuntimeException: java.lang.NullPointerException
>       at 
> com.infochimps.elasticsearch.ElasticSearchOutputFormat$ElasticSearchRecordWriter.<init>(ElasticSearchOutputFormat.java:133)
>       at 
> com.infochimps.elasticsearch.ElasticSearchOutputFormat.getRecordWriter(ElasticSearchOutputFormat.java:262)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84)
>       at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:628)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>       at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Caused by: java.lang.NullPointerException
>       at java.util.Hashtable.put(Hashtable.java:394)
>       at java.util.Properties.setProperty(Properties.java:143)
>       at java.lang.System.setProperty(System.java:746)
>       at 
> com.infochimps.elasticsearch.ElasticSearchOutputFormat$ElasticSearchRecordWriter.<init>(ElasticSearchOutputFormat.java:130)
>       ... 6 more
> 2012-07-06 16:50:54,506 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - HadoopJobId: job_local_0001
> 2012-07-06 16:50:54,506 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 0% complete
> 2012-07-06 16:50:59,022 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - job job_local_0001 has failed! Stop running all dependent jobs
> 2012-07-06 16:50:59,023 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 100% complete
> 2012-07-06 16:50:59,024 [main] ERROR 
> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2012-07-06 16:50:59,024 [main] INFO  
> org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats 
> reported below may be incomplete
> 2012-07-06 16:50:59,025 [main] INFO  
> org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 
> HadoopVersion PigVersion      UserId  StartedAt       FinishedAt      Features
> 1.0.2 0.10.0-SNAPSHOT rjurney 2012-07-06 16:50:53     2012-07-06 16:50:59     
> UNKNOWN
> Failed!
> Failed Jobs:
> JobId Alias   Feature Message Outputs
> job_local_0001        json_emails     MAP_ONLY        Message: Job failed! 
> Error - NA es://email/email?id=message_id&json=true&size=1000,
> Input(s):
> Failed to read data from "/tmp/emails.json"
> Output(s):
> Failed to produce result in 
> "es://email/email?id=message_id&json=true&size=1000"
> Job DAG:
> job_local_0001
> 2012-07-06 16:50:59,025 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Failed!
> 2012-07-06 16:50:59,029 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
> ERROR 2244: Job failed, hadoop does not return any error message
> 2012-07-06 16:50:59,029 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
> org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, 
> hadoop does not return any error message
>       at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
>       at 
> org.apache.pig.tools.grunt.GruntParser.processShCommand(GruntParser.java:1025)
>       at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:167)
>       at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
>       at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>       at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>       at org.apache.pig.Main.run(Main.java:555)
>       at org.apache.pig.Main.main(Main.java:111)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Details also at logfile: /private/tmp/pig_1341618651472.log
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  
> Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> {
>   "took" : 75,
>   "timed_out" : false,
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     
> 0  "_shards" : {
>     "total" : 5,
>     "successful" : 5,
>     "failed" : 0
>   },
>   "hits" : {
>     "total" : 0,
>     "max_score" : null,
>     "hits" : [ ]
>   }
> }
> 100   193  100   193    0     0   2475      0 --:--:-- --:--:-- --:--:--  2539
> 2012-07-06 16:50:59,140 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
> ERROR 2244: Job failed, hadoop does not return any error message
> 2012-07-06 16:50:59,140 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
> org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, 
> hadoop does not return any error message
>       at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
>       at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
>       at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>       at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>       at org.apache.pig.Main.run(Main.java:555)
>       at org.apache.pig.Main.main(Main.java:111)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
