Hello,

I used "elasticsearch-hadoop-1.3.0.M2" and it gave me:

Failed Jobs:
JobId                  Alias                              Feature            Message                Outputs
job_201404142111_0008  weblog_count,weblog_group,weblogs  GROUP_BY,COMBINER  Message: Job failed! Error - # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201404142111_0008_r_000000  weblogs/logs2,
Input(s):
Failed to read data from "/user/hive/warehouse/weblogs"

Output(s):
Failed to produce result in "weblogs/logs2"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

I think it is better to understand how things work from the beginning, so could you please tell me what I have to do (what should I start with)? What should I do to configure Elasticsearch (head) with Hadoop, and how can I work with elasticsearch-head?

Thank you so much; really, everything you say is so helpful. Thank you.

2014-04-14 14:09 GMT+01:00 hanine haninne <[email protected]>:

> Ok, thank you so much
>
>
> 2014-04-14 9:33 GMT+01:00 Costin Leau <[email protected]>:
>
>> Since you are not specifying the network configuration for an
>> Elasticsearch node, it will default to localhost:9200. This works as long
>> as you are running Hadoop (Pig, Hive, Cascading, etc...) on the same
>> machine as Elasticsearch - based on your exception, that is unlikely to be
>> the case.
>> Try specifying the `es.nodes` parameter - see the documentation for more
>> information.
>>
>> Additionally, you seem to be using the wrong es-hadoop jar - in your
>> script you are registering es-hadoop-1.2.0.jar (which does not support the
>> Pig/Hive/Cascading functionality), while the stacktrace indicates you are
>> using es-hadoop-1.3.X.jar.
>>
>> Make sure you are using es-hadoop-1.3.0.M3.jar (which is released and
>> available in Maven Central) and no other version. I recommend starting with
>> the examples in the reference docs, which show how to easily load and store
>> data to/from Elasticsearch.
>> Once that works, consider extending your script.
>>
>> Hope this helps,
>>
>>
>> On Mon, Apr 14, 2014 at 11:23 AM, hanine haninne <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Here is my log and my Pig script.
>>>
>>> Log file:
>>>
>>> Backend error message
>>> ---------------------
>>> java.io.IOException: java.io.IOException: Out of nodes and retries; caught exception
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:469)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256)
>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>> Caused by: java.io.IOException: Out of nodes and retries; caught exception
>>>     at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:81)
>>>     at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:221)
>>>     at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:205)
>>>     at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:209)
>>>     at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:103)
>>>     at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:85)
>>>     at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:60)
>>>     at org.elasticsearch.hadoop.mr.EsOutputFormat$ESRecordWriter.init(EsOutputFormat.java:165)
>>>     at org.elasticsearch.hadoop.mr.EsOutputFormat$ESRecordWriter.write(EsOutputFormat.java:147)
>>>     at org.elasticsearch.hadoop.pig.EsStorage.putNext(EsStorage.java:188)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>>>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:586)
>>>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:467)
>>>     ... 11 more
>>> Caused by: java.net.ConnectException: Connection refused
>>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>     at java.net.Socket.connect(Socket.java:579)
>>>     at java.net.Socket.connect(Socket.java:528)
>>>     at java.net.Socket.<init>(Socket.java:425)
>>>     at java.net.Socket.<init>(Socket.java:280)
>>>     at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:79)
>>>     at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:121)
>>>     at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:706)
>>>     at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:386)
>>>     at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
>>>     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
>>>     at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
>>>     at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:160)
>>>     at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:74)
>>>     ... 25 more
>>>
>>> Pig script:
>>>
>>> REGISTER /home/hduser/hadoop/lib/elasticsearch-hadoop-1.2.0.jar;
>>>
>>> weblogs = LOAD '/user/hive/warehouse/weblogs' USING PigStorage('\t')
>>>     AS (client_ip : chararray,
>>>         full_request_date : chararray,
>>>         day : int,
>>>         month : chararray,
>>>         month_num : int,
>>>         year : int,
>>>         hour : int,
>>>         minute : int,
>>>         second : int,
>>>         timezone : chararray,
>>>         http_verb : chararray,
>>>         uri : chararray,
>>>         http_status_code : chararray,
>>>         bytes_returned : chararray,
>>>         referrer : chararray,
>>>         user_agent : chararray
>>>     );
>>>
>>> weblog_group = GROUP weblogs BY (client_ip, year, month_num);
>>> weblog_count = FOREACH weblog_group GENERATE group.client_ip,
>>>     group.year, group.month_num, COUNT_STAR(weblogs) AS pageviews;
>>>
>>> STORE weblog_count INTO 'weblogs2/logs2' USING
>>>     org.elasticsearch.hadoop.pig.EsStorage();
>>>
>>> And whatever I put in the LOAD, it gives me the same result, even if I
>>> put the path of my desktop.
>>>
>>> Thx
>>>
>>> On Monday, April 14, 2014 03:11:23 UTC+1, Costin Leau wrote:
>>>>
>>>> Hi,
>>>>
>>>> That isn't a lot of information, so it's hard to figure out what's
>>>> actually wrong - one can only guess. Can you post your
>>>> stacktrace/logs and your Pig script somewhere - like a gist?
>>>>
>>>> One thing that stands out is that you mention you are using Pig, yet
>>>> your path points to a Hive warehouse:
>>>> > Failed to read data from "/user/hive/warehouse/books"
>>>>
>>>> I can infer from this that maybe the issue is the fact that you are
>>>> trying to read a Hive internal file, which Pig
>>>> can't understand, leading to the error that you see.
>>>>
>>>> Cheers,
>>>>
>>>>
>>>> On 4/14/14 1:23 AM, hanine haninne wrote:
>>>> > Hello,
>>>> >
>>>> > I'm trying to store data in ES (head) using a Pig script and it gives me:
>>>> >
>>>> > Input(s):
>>>> > Failed to read data from "/user/hive/warehouse/books"
>>>> >
>>>> > Output(s):
>>>> > Failed to produce result in "books/book"
>>>> >
>>>> > I'll be very thankful if someone would help me.
>>>> >
>>>> > --
>>>> > You received this message because you are subscribed to the Google
>>>> > Groups "elasticsearch" group.
>>>> > To unsubscribe from this group and stop receiving emails from it,
>>>> > send an email to [email protected].
>>>> > To view this discussion on the web visit
>>>> > https://groups.google.com/d/msgid/elasticsearch/979f5688-bd53-4b76-a97a-5b0359c8be75%40googlegroups.com.
>>>> > For more options, visit https://groups.google.com/d/optout.
>>>>
>>>> --
>>>> Costin
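Putting Costin's two suggestions together, a corrected version of the script above might look like the following sketch. It assumes the released es-hadoop-1.3.0.M3 jar has been downloaded, that the input is a plain tab-separated file Pig can read (not a Hive-internal warehouse file), and that Elasticsearch runs on a remote machine; the jar path, input path, and the `es-host` hostname are hypothetical placeholders, while `es.nodes` is the connector setting Costin refers to:

```pig
-- Register the es-hadoop jar that actually contains the Pig support
-- (1.3.0.M3, not 1.2.0); adjust the path to wherever the jar lives.
REGISTER /home/hduser/hadoop/lib/elasticsearch-hadoop-1.3.0.M3.jar;

-- Load plain TSV data that Pig can parse directly
-- (a trimmed-down schema for illustration).
weblogs = LOAD '/user/hduser/weblogs.tsv' USING PigStorage('\t')
    AS (client_ip:chararray, year:int, month_num:int);

weblog_group = GROUP weblogs BY (client_ip, year, month_num);
weblog_count = FOREACH weblog_group GENERATE group.client_ip,
    group.year, group.month_num, COUNT_STAR(weblogs) AS pageviews;

-- es.nodes points the connector at the Elasticsearch node(s) instead of
-- the default localhost:9200; 'es-host' is a placeholder hostname.
STORE weblog_count INTO 'weblogs/logs2'
    USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=es-host:9200');
```

With `es.nodes` set, the `Connection refused` in the stack trace above should go away as long as the named node is reachable from every Hadoop task node, since the reducers (not the client machine) open the HTTP connections to Elasticsearch.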
