[ 
https://issues.apache.org/jira/browse/PIG-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mohan updated PIG-5018:
-----------------------
     Patch Info: Patch Available
    Description: 

I am trying to write a Hadoop Pig script that takes two files and filters one 
of them based on the strings in the other, i.e.

words.txt

google 
facebook 
twitter 
linkedin

tweets.json

{"created_time": "18:47:31 ", "text": "RT @Joey7Barton: ..give a facebook about 
whether the americans wins a Ryder cup. I mean surely he has slightly more 
important matters. #fami ...", "user_id": 450990391, "id": 252479809098223616, 
"created_date": "Sun Sep 30 2012"}

SCRIPT

twitter = LOAD 'Twitter.json' USING
    JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');
filtered = FILTER twitter BY (text MATCHES '.*facebook.*');
extracted = FOREACH filtered GENERATE 'facebook' AS pattern, id, user_id,
    created_time, created_date, text;
final = GROUP extracted BY pattern;
dump final;

OUTPUT

(facebook,{(facebook,252545104890449921,291041644,23:06:59 ,Sun Sep 30 2012,RT 
@Joey7Barton: ..give a facebook about whether the americans wins a Ryder cup. I 
mean surely he has slightly more important matters. #fami ...)})

This is the output I currently get without loading the words.txt file at all, 
i.e. by filtering the tweets directly on a hard-coded word.

The output I need is

(facebook)(complete tweet that contains the word facebook)

i.e. the script should read words.txt and, for each word in it, return all the 
matching tweets from the tweets.json file.
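
A sketch of the intended behaviour, for reference only and untested: pair every 
word with every tweet using CROSS, then keep the pairs where the tweet text 
contains the word. The relation names (words, trimmed, clean, pairs, matched, 
by_word), the file name 'tweets.json', and the use of the built-in TRIM and 
INDEXOF functions are assumptions that are not part of the script above. CROSS 
is expensive, so this only seems reasonable while words.txt stays small.

```pig
-- Assumption: words.txt holds one word per line (possibly with trailing spaces).
words   = LOAD 'words.txt' AS (word:chararray);
trimmed = FOREACH words GENERATE TRIM(word) AS word;
-- Drop empty lines, which would otherwise match every tweet.
clean   = FILTER trimmed BY word != '';

-- Same schema as in the script above; the file name here is an assumption.
tweets  = LOAD 'tweets.json' USING
    JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');

-- Pair each word with each tweet, then keep pairs where text contains the word.
pairs   = CROSS clean, tweets;
matched = FILTER pairs BY INDEXOF(text, word, 0) >= 0;

-- Same projection as before, but the pattern now comes from words.txt.
extracted = FOREACH matched GENERATE word AS pattern, id, user_id,
    created_time, created_date, text;
by_word = GROUP extracted BY pattern;
DUMP by_word;
```

With this shape, each output group would be keyed by a word from words.txt 
rather than by a literal written into the script.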

Any help would be appreciated.

Mohan.V


> Mohan.V
> -------
>
>                 Key: PIG-5018
>                 URL: https://issues.apache.org/jira/browse/PIG-5018
>             Project: Pig
>          Issue Type: Bug
>            Reporter: mohan



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
