GC errors on using FILTER within nested FOREACH
-----------------------------------------------

                 Key: PIG-2610
                 URL: https://issues.apache.org/jira/browse/PIG-2610
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.9.1
            Reporter: Prashant Kommireddi


A user reported GC overhead errors when using FILTER inside a nested FOREACH 
and then aggregating the filtered field. The user provided the following sample 
Pig Latin script, which triggers the issue.

{code}
raw = LOAD 'input' USING MyCustomLoader();

searches = FOREACH raw GENERATE
               day, searchType,
               FLATTEN(impBag) AS (adType, clickCount)
           ;

groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
counts = FOREACH groupedSearches{
               type1 = FILTER searches BY adType == 'type1';
               type2 = FILTER searches BY adType == 'type2';
               GENERATE
                   FLATTEN(group) AS (day, searchType),
                   COUNT(searches) AS numSearches,
                   SUM(searches.clickCount) AS clickCountPerSearchType,
                   SUM(type1.clickCount) AS type1ClickCount,
                   SUM(type2.clickCount) AS type2ClickCount;
       };
{code}

Pig should be able to run this script without hitting GC overhead errors.
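
A possible workaround is sketched below. It is untested and is not part of the 
user's report. It assumes that clickCount is numeric and that the nested FILTER 
is what keeps Pig from using the combiner for this GROUP. The per-type click 
counts are computed with a conditional expression before grouping, so the final 
FOREACH contains only algebraic aggregates. The aliases tagged, type1Clicks and 
type2Clicks are made up for illustration.

{code}
raw = LOAD 'input' USING MyCustomLoader();

searches = FOREACH raw GENERATE
               day, searchType,
               FLATTEN(impBag) AS (adType, clickCount);

-- Tag each row with its per-type click count so no nested FILTER is needed.
-- Assumption: clickCount is numeric.
tagged = FOREACH searches GENERATE
             day, searchType, clickCount,
             ((adType == 'type1') ? clickCount : 0) AS type1Clicks,
             ((adType == 'type2') ? clickCount : 0) AS type2Clicks;

groupedSearches = GROUP tagged BY (day, searchType) PARALLEL 50;

-- Only COUNT and SUM over simple projections remain in this FOREACH.
counts = FOREACH groupedSearches GENERATE
             FLATTEN(group) AS (day, searchType),
             COUNT(tagged) AS numSearches,
             SUM(tagged.clickCount) AS clickCountPerSearchType,
             SUM(tagged.type1Clicks) AS type1ClickCount,
             SUM(tagged.type2Clicks) AS type2ClickCount;
{code}

One difference in behavior: when a group has no rows of a given type, this 
version yields 0 for that type's sum, whereas SUM over an empty filtered bag 
yields null.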
