[ https://issues.apache.org/jira/browse/PIG-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
mohan updated PIG-5018:
-----------------------
Patch Info: Patch Available
Description:
I am trying to write a Hadoop Pig script that takes two files and filters one of them based on the strings listed in the other, i.e.:
words.txt:
google
facebook
twitter
linkedin

tweets.json:
{"created_time": "18:47:31 ", "text": "RT @Joey7Barton: ..give a facebook about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...", "user_id": 450990391, "id": 252479809098223616, "created_date": "Sun Sep 30 2012"}

SCRIPT
twitter = LOAD 'Twitter.json' USING JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');
filtered = FILTER twitter BY (text MATCHES '.*facebook.*');
extracted = FOREACH filtered GENERATE 'facebook' AS pattern, id, user_id, created_time, created_date, text;
final = GROUP extracted BY pattern;
dump final;

OUTPUT
(facebook,{(facebook,252545104890449921,291041644,23:06:59 ,Sun Sep 30 2012,RT @Joey7Barton: ..give a facebook about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...)})

I get this output without loading words.txt at all, i.e. by filtering the tweets directly on the hard-coded word 'facebook'.
I need the output in the form
(facebook)(complete text of each tweet containing that word)
That is, the script should read words.txt and, for each word in it, fetch all tweets from tweets.json that contain that word.
Any help is appreciated.
Mohan.V
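One possible approach, given here as an untested sketch rather than a confirmed fix: load words.txt as a one-column relation, pair every keyword with every tweet using CROSS, and keep only the pairs whose tweet text contains the keyword. It assumes words.txt holds one keyword per line, that the tweets file is tweets.json with one JSON object per line (the original script loads 'Twitter.json'; use whichever path actually exists), and that a case-sensitive substring match, like the original '.*facebook.*' filter, is what is wanted. The aliases words_raw, words, tweets, pairs, matched, result and by_word are hypothetical names chosen for illustration. INDEXOF is used instead of MATCHES so that no regex has to be built per row.

```pig
-- Sketch only: assumes words.txt has one keyword per line and tweets.json
-- has one JSON object per line, as in the samples in the description.
words_raw = LOAD 'words.txt' USING PigStorage() AS (word:chararray);

-- Drop blank lines: an empty keyword would be found in every tweet.
words = FILTER words_raw BY word IS NOT NULL AND word != '';

tweets = LOAD 'tweets.json' USING JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');

-- Pair every keyword with every tweet (Cartesian product).
-- After CROSS, fields are referenced as alias::field.
pairs = CROSS words, tweets;

-- Keep pairs whose tweet text contains the keyword as a case-sensitive
-- substring; INDEXOF returns -1 when the keyword is not found.
matched = FILTER pairs BY INDEXOF(tweets::text, words::word, 0) >= 0;

-- One row per (keyword, tweet text), e.g. (facebook,RT @Joey7Barton: ...)
result = FOREACH matched GENERATE words::word AS word, tweets::text AS text;
DUMP result;

-- Optional: one row per keyword with a bag of all matching tweets.
by_word = GROUP result BY word;
DUMP by_word;
```

CROSS materializes every keyword/tweet pair, so this is only reasonable while words.txt stays small; with a large keyword list a different strategy would probably be needed.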
> Mohan.V
> -------
>
> Key: PIG-5018
> URL: https://issues.apache.org/jira/browse/PIG-5018
> Project: Pig
> Issue Type: Bug
> Reporter: mohan
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)