[ 
https://issues.apache.org/jira/browse/PIG-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217855#comment-13217855
 ] 

Thejas M Nair commented on PIG-2534:
------------------------------------

+1 
I think we should add a javadoc comment to the setOutputUids function, saying 
that it checks for null schema and throws an exception to stop column pruning 
from happening. We should also consider moving such checks (such as null 
schema) that need to happen on all logical operators to a separate visitor 
(subclass of a AllLogicalExpressionVisitor?), which gets called from the 
Transformer.check().



                
> Pig generating infinite map outputs
> -----------------------------------
>
>                 Key: PIG-2534
>                 URL: https://issues.apache.org/jira/browse/PIG-2534
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.1
>            Reporter: Vivek Padmanabhan
>         Attachments: PIG-2534-1.patch
>
>
> I am getting a strange behavior by Pig in the below script for Pig 0.9.
> {code}
> event_serve = LOAD 'input1'   AS (s, m, l);
> cm_data_raw = LOAD 'input2'  AS (s, m, l);
> SPLIT cm_data_raw INTO
>     cm_serve_raw IF (( (chararray) (s#'key1') == '0') AND ( (chararray) 
> (s#'key2') == '5')),
>     cm_click_raw IF (( (chararray) (s#'key1') == '1') AND ( (chararray) 
> (s#'key2') == '5'));
> cm_serve = FOREACH cm_serve_raw GENERATE  s#'key3' AS f1,  s#'key4' AS f2, 
> s#'key5' AS f3 ;
> cm_serve_lowercase = stream cm_serve through `echo val3`;
> cm_serve_final = FOREACH cm_serve_lowercase GENERATE  $0 AS cm_event_guid, $1 
> AS cm_receive_time, $2 AS cm_ctx_url;
> event_serve_filtered = FILTER event_serve BY  (chararray) (s#'key1') neq 
> 'xxx' AND (chararray) (s#'key2') neq 'yyy' ;
> event_serve_project = FOREACH event_serve_filtered GENERATE  s#'key3' AS 
> event_guid, s#'key4' AS receive_time;
> event_serve_join = join cm_serve_final by (cm_event_guid),
>     event_serve_project by (event_guid);
> store event_serve_join into 'somewhere';
> {code}
> Input (both input1 and input2 is same)
> ---
> [key1#0,key2#5,key3#val3,key4#val4,key5#val5]
> If i run this pig script with ColumnMapKeyPrune disabled, the job goes 
> through fine and 1 output is created.
> But if I run this script by default, then it keeps on generating map output 
> records infinitely. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to