[ https://issues.apache.org/jira/browse/PIG-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248136#comment-13248136 ]
Prashant Kommireddi commented on PIG-2610:
------------------------------------------
How is this case different from the following example on the Pig Latin Basics page?
{code}
A = LOAD 'data' AS (url:chararray,outlink:chararray);
DUMP A;
(www.ccc.com,www.hjk.com)
(www.ddd.com,www.xyz.org)
(www.aaa.com,www.cvn.org)
(www.www.com,www.kpt.net)
(www.www.com,www.xyz.org)
(www.ddd.com,www.xyz.org)
B = GROUP A BY url;
DUMP B;
(www.aaa.com,{(www.aaa.com,www.cvn.org)})
(www.ccc.com,{(www.ccc.com,www.hjk.com)})
(www.ddd.com,{(www.ddd.com,www.xyz.org),(www.ddd.com,www.xyz.org)})
(www.www.com,{(www.www.com,www.kpt.net),(www.www.com,www.xyz.org)})
X = FOREACH B {
    FA = FILTER A BY outlink == 'www.xyz.org';
    PA = FA.outlink;
    DA = DISTINCT PA;
    GENERATE group, COUNT(DA);
}
DUMP X;
(www.aaa.com,0)
(www.ccc.com,0)
(www.ddd.com,1)
(www.www.com,1)
{code}
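To make the comparison concrete, here is a rough Python model of what the nested FOREACH in the example above computes for each group. It is only an illustrative sketch of the script's semantics, not a description of how Pig executes it; the function name `nested_foreach` and its `target` parameter are made up for this illustration.

```python
from collections import defaultdict

# Rows of relation A from the example above: (url, outlink).
A = [
    ("www.ccc.com", "www.hjk.com"),
    ("www.ddd.com", "www.xyz.org"),
    ("www.aaa.com", "www.cvn.org"),
    ("www.www.com", "www.kpt.net"),
    ("www.www.com", "www.xyz.org"),
    ("www.ddd.com", "www.xyz.org"),
]


def nested_foreach(rows, target="www.xyz.org"):
    """Hypothetical stand-in for the GROUP plus nested FOREACH block."""
    # B = GROUP A BY url;  each group holds a bag of full (url, outlink) tuples.
    groups = defaultdict(list)
    for url, outlink in rows:
        groups[url].append((url, outlink))

    result = []
    for url in sorted(groups):
        bag = groups[url]
        fa = [t for t in bag if t[1] == target]  # FA = FILTER A BY outlink == ...
        pa = [t[1] for t in fa]                  # PA = FA.outlink
        da = set(pa)                             # DA = DISTINCT PA
        result.append((url, len(da)))            # GENERATE group, COUNT(DA)
    return result


X = nested_foreach(A)
for row in X:
    print(row)
```

In this model every group's bag is held in a list before it is filtered, which is a simplification made for readability and says nothing about Pig's actual memory behavior.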
> GC errors on using FILTER within nested FOREACH
> -----------------------------------------------
>
> Key: PIG-2610
> URL: https://issues.apache.org/jira/browse/PIG-2610
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.1
> Reporter: Prashant Kommireddi
>
> A user has reported GC overhead errors when using FILTER within a nested
> FOREACH and aggregating the filtered field. Here is the sample Pig Latin
> script, provided by the user, that triggered the issue.
> {code}
> raw = LOAD 'input' using MyCustomLoader();
> searches = FOREACH raw GENERATE
>     day, searchType,
>     FLATTEN(impBag) AS (adType, clickCount);
> groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
> counts = FOREACH groupedSearches {
>     type1 = FILTER searches BY adType == 'type1';
>     type2 = FILTER searches BY adType == 'type2';
>     GENERATE
>         FLATTEN(group) AS (day, searchType),
>         COUNT(searches) AS numSearches,
>         SUM(searches.clickCount) AS clickCountPerSearchType,
>         SUM(type1.clickCount) AS type1ClickCount,
>         SUM(type2.clickCount) AS type2ClickCount;
> };
> {code}
> Pig should be able to handle this case.