Hello,

I used elasticsearch-hadoop-1.3.0.M2 and it gave me:

Failed Jobs:
JobId: job_201404142111_0008
Alias: weblog_count,weblog_group,weblogs
Feature: GROUP_BY,COMBINER
Message: Job failed! Error - # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201404142111_0008_r_000000
Outputs: weblogs/logs2
Input(s):
Failed to read data from "/user/hive/warehouse/weblogs"

Output(s):
Failed to produce result in "weblogs/logs2"

Counters:
Total records written: 0
Total bytes written: 0
Spillable Memory Manager spill count: 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

I think it is better to understand how things work from the beginning, so could you please tell me what I have to do (and what I should start with): how do I configure elasticsearch (head) with Hadoop, and how can I work with elasticsearch-head?

Thank you so much; everything you say is really helpful. Thank you.

On Monday, April 14, 2014 at 09:33:12 UTC+1, Costin Leau wrote:
>
> Since you are not specifying the network configuration for an elasticsearch node, it will default to localhost:9200. This works as long as you are running Hadoop (Pig, Hive, Cascading, etc...) on the same machine as Elasticsearch - based on your exception that is unlikely the case.
> Try specifying the `es.nodes` parameter - see the documentation for more information.
>
> Additionally, you seem to be using the wrong jar of es-hadoop - in your script you are registering es-hadoop-1.2.0.jar (which does not support the pig/hive/cascading functionality) while the stacktrace indicates you are using es-hadoop-1.3.X.jar.
>
> Make sure you are using es-hadoop-1.3.0.M3.jar (which is released and available in Maven Central) and no other version. I recommend starting with the examples in the reference docs, which show how to easily load and store data to/from Elasticsearch.
> Once that works, consider extending your script.
>
> Hope this helps,
>
> On Mon, Apr 14, 2014 at 11:23 AM, hanine haninne <[email protected]> wrote:
>
>> Hi,
>>
>> Here are my log and my Pig script.
>>
>> Log file:
>> Backend error message
>> ---------------------
>> java.io.IOException: java.io.IOException: Out of nodes and retries; caught exception
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:469)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256)
>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> Caused by: java.io.IOException: Out of nodes and retries; caught exception
>>     at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
>>     at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
>>     at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
>>     at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
>>     at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
>>     at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:85)
>>     at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:60)
>>     at org.elasticsearch.hadoop.mr.EsOutputFormat$ESRecordWriter.init(EsOutputFormat.java:165)
>>     at org.elasticsearch.hadoop.mr.EsOutputFormat$ESRecordWriter.write(EsOutputFormat.java:147)
>>     at org.elasticsearch.hadoop.pig.EsStorage.putNext(EsStorage.java:188)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:586)
>>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:467)
>>     ... 11 more
>> Caused by: java.net.ConnectException: Connection refused
>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>     at java.net.Socket.connect(Socket.java:579)
>>     at java.net.Socket.connect(Socket.java:528)
>>     at java.net.Socket.<init>(Socket.java:425)
>>     at java.net.Socket.<init>(Socket.java:280)
>>     at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:79)
>>     at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:121)
>>     at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:706)
>>     at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:386)
>>     at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
>>     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
>>     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
>>     at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
>>     at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
>>     ... 25 more
>>
>> Pig script:
>>
>> REGISTER /home/hduser/hadoop/lib/elasticsearch-hadoop-1.2.0.jar;
>>
>> weblogs = LOAD '/user/hive/warehouse/weblogs' USING PigStorage('\t')
>>     AS (client_ip : chararray,
>>         full_request_date : chararray,
>>         day : int,
>>         month : chararray,
>>         month_num : int,
>>         year : int,
>>         hour : int,
>>         minute : int,
>>         second : int,
>>         timezone : chararray,
>>         http_verb : chararray,
>>         uri : chararray,
>>         http_status_code : chararray,
>>         bytes_returned : chararray,
>>         referrer : chararray,
>>         user_agent : chararray
>>     );
>>
>> weblog_group = GROUP weblogs by (client_ip, year, month_num);
>> weblog_count = FOREACH weblog_group GENERATE group.client_ip, group.year, group.month_num, COUNT_STAR(weblogs) as pageviews;
>>
>> STORE weblog_count INTO 'weblogs2/logs2' USING org.elasticsearch.hadoop.pig.EsStorage();
>>
>> And whatever I put in the LOAD it gives me the same result, even if I put the path of my desktop.
>>
>> Thanks
>>
>> On Monday, April 14, 2014 at 03:11:23 UTC+1, Costin Leau wrote:
>>>
>>> Hi,
>>>
>>> That isn't a lot of information so it's hard to figure out what's actually wrong - one can only guess. Can you post your stacktrace/logs and your Pig script somewhere - like a gist?
>>>
>>> One thing that stands out is that you mention you are using Pig yet your path points to a Hive warehouse:
>>> > Failed to read data from "/user/hive/warehouse/books"
>>>
>>> I can infer from this that maybe the issue is that you are trying to read a Hive internal file, which Pig can't understand, leading to the error that you see.
>>>
>>> Cheers,
>>>
>>> On 4/14/14 1:23 AM, hanine haninne wrote:
>>> > Hello,
>>> >
>>> > I'm trying to store data in ES (head) using a Pig script and it gives me:
>>> >
>>> > Input(s):
>>> > Failed to read data from "/user/hive/warehouse/books"
>>> >
>>> > Output(s):
>>> > Failed to produce result in "books/book"
>>> >
>>> > I'll be so thankful if someone would help me.
>>>
>>> --
>>> Costin
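
Putting Costin's two suggestions together - register the 1.3.0.M3 jar and point `es.nodes` at the machine actually running Elasticsearch - the script would look roughly like the sketch below. This is only a sketch, not a tested fix: 192.168.1.50:9200 is a placeholder for your node's real address, and 'weblogs/logs2' mirrors the index/type shown in the error output above.

REGISTER /home/hduser/hadoop/lib/elasticsearch-hadoop-1.3.0.M3.jar;

weblogs = LOAD '/user/hive/warehouse/weblogs' USING PigStorage('\t')
    AS (client_ip: chararray, full_request_date: chararray, day: int,
        month: chararray, month_num: int, year: int, hour: int, minute: int,
        second: int, timezone: chararray, http_verb: chararray, uri: chararray,
        http_status_code: chararray, bytes_returned: chararray,
        referrer: chararray, user_agent: chararray);

weblog_group = GROUP weblogs BY (client_ip, year, month_num);
weblog_count = FOREACH weblog_group
    GENERATE group.client_ip, group.year, group.month_num,
             COUNT_STAR(weblogs) AS pageviews;

-- EsStorage accepts configuration properties as constructor arguments;
-- replace the placeholder address with wherever Elasticsearch listens.
STORE weblog_count INTO 'weblogs/logs2'
    USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=192.168.1.50:9200');

Note this only addresses the write side ("Failed to produce result"); the "Failed to read data" part means Pig could not read the input either, which is Costin's earlier point about the Hive warehouse path.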
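
On that read-side point: if /user/hive/warehouse/weblogs is a Hive-managed table rather than a plain tab-delimited file, PigStorage cannot parse it no matter which path is given. One alternative, sketched here under the assumptions that HCatalog is installed and the Hive table is named weblogs (the HCatLoader package name also varies across HCatalog releases), is to load the table through HCatalog instead of reading the warehouse files directly:

-- Run with: pig -useHCatalog script.pig
-- The LOAD argument is the Hive table name, not an HDFS path.
weblogs = LOAD 'weblogs' USING org.apache.hcatalog.pig.HCatLoader();

Because HCatalog resolves the storage format and schema from the Hive metastore, the AS (...) schema clause is not needed in that case.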
