[
https://issues.apache.org/jira/browse/PIG-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohini Palaniswamy resolved PIG-5018.
-------------------------------------
Resolution: Invalid
> Mohan.V
> -------
>
> Key: PIG-5018
> URL: https://issues.apache.org/jira/browse/PIG-5018
> Project: Pig
> Issue Type: Bug
> Reporter: mohan
>
> I am trying to write a Hadoop Pig script that takes two files and filters
> the tweets in one based on the strings listed in the other, i.e.:
> words.txt
> google
> facebook
> twitter
> linkedin
> tweets.json
> {"created_time": "18:47:31 ", "text": "RT @Joey7Barton: ..give a facebook
> about whether the americans wins a Ryder cup. I mean surely he has slightly
> more important matters. #fami ...", "user_id": 450990391, "id":
> 252479809098223616, "created_date": "Sun Sep 30 2012"}
> SCRIPT
> twitter = LOAD 'Twitter.json' USING JsonLoader('created_time:chararray,
> text:chararray, user_id:chararray, id:chararray, created_date:chararray');
> filtered = FILTER twitter BY (text MATCHES '.*facebook.*');
> extracted = FOREACH filtered GENERATE 'facebook' AS pattern,id, user_id,
> created_time, created_date, text;
> final = GROUP extracted BY pattern;
> dump final;
> OUTPUT
> (facebook,{(facebook,252545104890449921,291041644,23:06:59 ,Sun Sep 30
> 2012,RT @Joey7Barton: ..give a facebook about whether the americans wins a
> Ryder cup. I mean surely he has slightly more important matters. #fami ...)})
> This is the output I am getting without loading the words.txt file, i.e. by
> filtering the tweets directly on a hard-coded word.
> I need the output in the form
> (facebook)(complete tweets containing the word facebook)
> i.e. the script should read words.txt and, for each word in it, return all
> the tweets from the tweets.json file that contain that word.
> Any help would be appreciated.
> Mohan.V
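
One possible way to do what the question above asks is to load words.txt as
its own relation, pair each word with each tweet, and keep only the pairs
where the tweet text contains the word. The following is a minimal, untested
sketch, not a confirmed fix. The aliases (words, tweets, pairs, matched,
grouped) are made up for illustration, and the file name 'tweets.json' is
taken from the description (the reporter's script loads 'Twitter.json').
CROSS compares every word with every tweet, so it is assumed that words.txt
is small.

```pig
-- Sketch only: aliases and file names are assumptions, not from a tested job.
words  = LOAD 'words.txt' AS (word:chararray);
tweets = LOAD 'tweets.json' USING JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');

-- Pair every word with every tweet; reasonable only while words.txt stays small.
pairs = CROSS words, tweets;

-- Keep pairs whose tweet text contains the word (INDEXOF returns -1 when absent).
matched = FILTER pairs BY INDEXOF(tweets::text, words::word, 0) >= 0;

-- Same projection as the reporter's script, with the word taken from words.txt.
extracted = FOREACH matched GENERATE words::word AS pattern, tweets::id AS id, tweets::user_id AS user_id, tweets::created_time AS created_time, tweets::created_date AS created_date, tweets::text AS text;

grouped = GROUP extracted BY pattern;
DUMP grouped;
```

The match is case-sensitive, and a tweet that contains several of the listed
words appears once under each of them.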