[ https://issues.apache.org/jira/browse/PIG-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456449#comment-15456449 ]
Rohini Palaniswamy commented on PIG-5018:
-----------------------------------------
Bulk deleted all issues from PIG-4977 to PIG-5017.
[~Mohan.V],
Can you tell us how you ended up splitting the text and creating multiple
jiras instead of one? We will see if we can block that and prevent this from
happening again in the future.
> Mohan.V
> -------
>
> Key: PIG-5018
> URL: https://issues.apache.org/jira/browse/PIG-5018
> Project: Pig
> Issue Type: Bug
> Reporter: mohan
>
> I am trying to write a Hadoop Pig script that takes 2 files and filters
> based on a string, i.e.
> words.txt
> google
> facebook
> twitter
> linkedin
> tweets.json
> {"created_time": "18:47:31 ", "text": "RT @Joey7Barton: ..give a facebook
> about whether the americans wins a Ryder cup. I mean surely he has slightly
> more important matters. #fami ...", "user_id": 450990391, "id":
> 252479809098223616, "created_date": "Sun Sep 30 2012"}
> SCRIPT
> twitter = LOAD 'Twitter.json' USING JsonLoader('created_time:chararray,
> text:chararray, user_id:chararray, id:chararray, created_date:chararray');
> filtered = FILTER twitter BY (text MATCHES '.*facebook.*');
> extracted = FOREACH filtered GENERATE 'facebook' AS pattern,id, user_id,
> created_time, created_date, text;
> final = GROUP extracted BY pattern;
> dump final;
> OUTPUT
> (facebook,{(facebook,252545104890449921,291041644,23:06:59 ,Sun Sep 30
> 2012,RT @Joey7Barton: ..give a facebook about whether the americans wins a
> Ryder cup. I mean surely he has slightly more important matters. #fami ...)})
> This is the output I am getting without loading the words.txt file, i.e. by
> filtering the tweets directly.
> I need to get the output as
> (facebook)(complete tweet that contains the word facebook)
> i.e. it should read words.txt and, for each word it reads, get all the
> matching tweets from the tweets.json file.
> Any help would be appreciated.
> Mohan.V
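
One possible way to drive the filter from words.txt, as described above, is to
cross the word list with the tweets and keep only the pairs where the tweet
text contains the word. This is only a sketch under assumptions that are not in
the report: words.txt holds one lowercase word per line, the input file is
named tweets.json (the script above loads 'Twitter.json'), and the aliases
words, tweets, pairs, matched and result are illustrative names.

```pig
-- Assumption: one lowercase word per line in words.txt.
words = LOAD 'words.txt' AS (word:chararray);

-- Same schema as in the reported script; the file name is an assumption.
tweets = LOAD 'tweets.json' USING JsonLoader('created_time:chararray,
    text:chararray, user_id:chararray, id:chararray, created_date:chararray');

-- Pair every word with every tweet. CROSS is expensive, so this is only
-- reasonable while the word list stays small.
pairs = CROSS words, tweets;

-- Keep the pairs whose (lowercased) tweet text contains the word.
matched = FILTER pairs BY INDEXOF(LOWER(tweets::text), words::word) >= 0;

extracted = FOREACH matched GENERATE words::word AS pattern, tweets::id,
    tweets::user_id, tweets::created_time, tweets::created_date, tweets::text;

-- One group per word, each holding the tweets that mention it.
result = GROUP extracted BY pattern;
DUMP result;
```

A tweet that mentions several of the words will appear once under each matching
word. INDEXOF does a plain substring search, so 'facebook' would also match
inside a longer token such as 'facebooking'.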
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)