Since you are not specifying any network configuration for the Elasticsearch
node, es-hadoop defaults to localhost:9200. This works as long as you are
running Hadoop (Pig, Hive, Cascading, etc.) on the same machine as
Elasticsearch; judging by your exception, that is not the case here.
Try specifying the `es.nodes` parameter - see the documentation for more
information.
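If I recall the es-hadoop docs correctly, EsStorage accepts configuration as
constructor arguments, so you can point it at your cluster directly. A minimal
sketch (es-host:9200 is a placeholder - substitute the address of one of your
Elasticsearch nodes):

    STORE weblog_count INTO 'weblogs2/logs2' USING
        org.elasticsearch.hadoop.pig.EsStorage('es.nodes=es-host:9200');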

Additionally, you seem to be using the wrong es-hadoop jar - your script
registers elasticsearch-hadoop-1.2.0.jar (which does not support the
Pig/Hive/Cascading functionality), while the stacktrace indicates you are
running against es-hadoop-1.3.X.

Make sure you are using es-hadoop-1.3.0.M3.jar (which is released and
available in Maven Central) and no other version. I recommend starting with
the examples in the reference docs, which show how to easily load and store
data to/from Elasticsearch.
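For instance, a minimal load-and-store sketch along those lines (the file
path, host and index name below are placeholders - adjust them to your
environment):

    REGISTER /home/hduser/hadoop/lib/elasticsearch-hadoop-1.3.0.M3.jar;

    -- point es-hadoop at the cluster (placeholder host); Pig's SET puts
    -- the property into the job configuration, which es-hadoop picks up
    SET es.nodes 'es-host:9200';

    -- load a tab-separated file with an explicit schema
    logs = LOAD '/data/sample.tsv' USING PigStorage('\t')
           AS (client_ip:chararray, pageviews:long);

    -- index each tuple into the 'test/logs' index/type
    STORE logs INTO 'test/logs' USING org.elasticsearch.hadoop.pig.EsStorage();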
Once that works, consider extending your script.

Hope this helps,


On Mon, Apr 14, 2014 at 11:23 AM, hanine haninne <[email protected]> wrote:

> Hi,
>
> Here is my log and my script Pig
>
> log file :
> Backend error message
> ---------------------
> java.io.IOException: java.io.IOException: Out of nodes and retries; caught
> exception
>     at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:469)
>     at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
>     at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
>     at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>     at
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.IOException: Out of nodes and retries; caught exception
>     at
> org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
>     at
> org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
>     at
> org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
>     at
> org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
>     at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
>     at
> org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:85)
>     at
> org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:60)
>     at
> org.elasticsearch.hadoop.mr.EsOutputFormat$ESRecordWriter.init(EsOutputFormat.java:165)
>     at
> org.elasticsearch.hadoop.mr.EsOutputFormat$ESRecordWriter.write(EsOutputFormat.java:147)
>     at org.elasticsearch.hadoop.pig.EsStorage.putNext(EsStorage.java:188)
>     at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>     at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>     at
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:586)
>     at
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:467)
>     ... 11 more
> Caused by: java.net.ConnectException: Connection refused
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>     at
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>     at
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>     at java.net.Socket.connect(Socket.java:579)
>     at java.net.Socket.connect(Socket.java:528)
>     at java.net.Socket.<init>(Socket.java:425)
>     at java.net.Socket.<init>(Socket.java:280)
>     at
> org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:79)
>     at
> org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:121)
>     at
> org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:706)
>     at
> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:386)
>     at
> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
>     at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
>     at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
>     at
> org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
>     at
> org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
>     ... 25 more
>
> Pig script:
> REGISTER /home/hduser/hadoop/lib/elasticsearch-hadoop-1.2.0.jar;
> weblogs = LOAD '/user/hive/warehouse/weblogs' USING PigStorage('\t')
> AS (client_ip : chararray,
> full_request_date : chararray,
> day : int,
> month : chararray,
> month_num : int,
> year : int,
> hour : int,
> minute : int,
> second : int,
> timezone : chararray,
> http_verb : chararray,
> uri : chararray,
> http_status_code : chararray,
> bytes_returned : chararray,
> referrer : chararray,
> user_agent : chararray
> );
> weblog_group = GROUP weblogs by (client_ip, year, month_num);
> weblog_count = FOREACH weblog_group GENERATE group.client_ip, group.year,
> group.month_num,  COUNT_STAR(weblogs) as pageviews;
>
> STORE weblog_count INTO 'weblogs2/logs2' USING
> org.elasticsearch.hadoop.pig.EsStorage();
>
> And whatever I put in the LOAD it gives me the same result, even if I
> put the path of my desktop.
>
> Thx
>
> On Monday, April 14, 2014 at 03:11:23 UTC+1, Costin Leau wrote:
>>
>> Hi,
>>
>> That isn't a lot of information so it's hard to figure out what's
>> actually wrong - one can only guess. Can you post your
>> stacktrace/logs and your Pig script somewhere - like a gist?
>>
>> One thing that stands out is that you mention you are using Pig yet your
>> path points to a Hive warehouse:
>> > Failed to read data from "/user/hive/warehouse/books"
>>
>> I can infer from this that maybe the issue is that you are trying to
>> read a Hive-internal file, which Pig can't understand, leading to the
>> error that you see.
>>
>> Cheers,
>>
>>
>> On 4/14/14 1:23 AM, hanine haninne wrote:
>> > Hello ,
>> >
>> > I'm trying to store data in ES (head) using a Pig script and it gives me
>> >
>> > Input(s):
>> > Failed to read data from "/user/hive/warehouse/books"
>> >
>> > Output(s):
>> > Failed to produce result in "books/book"
>> >
>> > I'll be so thankful if someone would help me
>> >
>>
>> --
>> Costin
>>

