[ 
https://issues.apache.org/jira/browse/PIG-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510094#comment-14510094
 ] 

Rohini Palaniswamy commented on PIG-4515:
-----------------------------------------

If it is a SingleTupleBag, why not use it as is instead of extracting the 
tuple and wrapping it again? Distinct already seems to do a lot of wrapping of 
Tuples into bags and unwrapping, which seems to be overkill. Can you see if 
that can also be simplified?

On a different note, you can do one of the following to get your current logic 
working without this fix:
    - do a DISTINCT of A before the GROUP BY in your script. It will be more 
      efficient
    - or do C = FOREACH B GENERATE Distinct(A.(a,b));
    - or do C = FOREACH (GROUP A BY a) {
          B = DISTINCT A;
          GENERATE B;
      };
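
As a minimal sketch of the first option, assuming the same input and schema as 
in the issue description below (A_dedup is a hypothetical alias, not one from 
the original script), the script could look roughly like this:
{code}
-- Load the same two-column input used in the issue description.
A = LOAD 'A' AS (a:int, b:int);
-- Remove duplicate (a, b) tuples once, before grouping.
A_dedup = DISTINCT A;
-- Group the already-distinct tuples by a.
B = GROUP A_dedup BY a;
-- Each group's bag now holds only distinct tuples, so no Distinct UDF call is needed.
C = FOREACH B GENERATE A_dedup;
DUMP C;
{code}
Because the duplicates are removed before the GROUP BY, the builtin Distinct 
function is never called, so the ClassCastException path is avoided.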

> org.apache.pig.builtin.Distinct throws ClassCastException
> ---------------------------------------------------------
>
>                 Key: PIG-4515
>                 URL: https://issues.apache.org/jira/browse/PIG-4515
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>         Environment: 2015-04-23 08:37:49,117 [main] INFO  org.apache.pig.Main 
> - Apache Pig version 0.14.0 (r1640057) compiled Nov 16 2014, 18:02:05
>            Reporter: Mikko Kupsu
>         Attachments: fix_singletuplebag_classcast_exception.patch
>
>
> Running the script below causes a *ClassCastException*.
> {code}
> A = LOAD 'A' AS (a:int, b:int);
> B = GROUP A BY a;
> C = FOREACH B GENERATE Distinct(A);
> DUMP C;
> {code}
> Content of A:
> {code}
> 1     1
> 2     1
> 3     1
> 4     1
> 5     2
> 6     2
> 7     2
> 8     2
> 9     2
> {code}
> {code}
> Caused by: java.lang.ClassCastException: org.apache.pig.data.SingleTupleBag 
> cannot be cast to org.apache.pig.data.Tuple
>       at org.apache.pig.builtin.Distinct$Initial.exec(Distinct.java:86)
>       at org.apache.pig.builtin.Distinct$Initial.exec(Distinct.java:78)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:323)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:362)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:361)
> {code}



