[jira] [Commented] (PIG-4883) MapKeyType of splitter was set wrongly in specific multiquery case

liyunzhang_intel (JIRA) Wed, 11 May 2016 18:43:37 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-4883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281096#comment-15281096
 ]


liyunzhang_intel commented on PIG-4883:
---------------------------------------

[~rohini]: The problem is that, in the multiquery case, we set the mapKeyType of a splitter to DataType.TUPLE ([detailed code|https://github.com/apache/pig/blob/27b153dbd688d8328e00d2d4bead84f3c879b2ae/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MultiQueryOptimizer.java#L1035]) if the mapKeyTypes of its splittees are not the same.

So I don't think your code can resolve the current problem. Here is an example to explain it:
{noformat}
    A(int)     B(long)            C(int)     D(long)
         \     /                       \     /
        Splitter(E)                  Splitter(F)
                  \                  /
                      Splitter(G)

the mapKeyTypes of A and B are not the same, so the mapKeyType of E will be TUPLE
the mapKeyTypes of C and D are not the same, so the mapKeyType of F will be TUPLE
the mapKeyTypes of E and F are both DataType.TUPLE and are the same, so function "hasSameMapKeyType" returns true
{noformat}
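To make the information loss concrete, here is a minimal Java sketch of the rule described above, under stated assumptions: `KeyType` and `CollapsedKeyTypeModel` are hypothetical stand-ins (not Pig classes), and `mergedKeyType` only models the "differing splittee key types become TUPLE" behavior rather than reproducing MultiQueryOptimizer. The splitter H (int and chararray splittees) is also hypothetical and not part of the example above; it is there to show that once the types collapse to TUPLE, the comparison can no longer tell them apart.

```java
import java.util.List;

// Minimal model of the current rule: a splitter keeps only a single map key type.
final class CollapsedKeyTypeModel {

    // Hypothetical stand-in for a few key types; not Pig's DataType class.
    enum KeyType { INTEGER, LONG, CHARARRAY, TUPLE }

    // If all splittees share one key type, keep it; otherwise fall back to TUPLE.
    // Assumes at least one splittee.
    static KeyType mergedKeyType(List<KeyType> splitteeTypes) {
        KeyType first = splitteeTypes.get(0);
        for (KeyType t : splitteeTypes) {
            if (t != first) {
                return KeyType.TUPLE;
            }
        }
        return first;
    }

    // Comparing only the merged type cannot see which types were collapsed into TUPLE.
    static boolean hasSameMapKeyType(KeyType a, KeyType b) {
        return a == b;
    }

    public static void main(String[] args) {
        KeyType e = mergedKeyType(List.of(KeyType.INTEGER, KeyType.LONG));      // A, B
        KeyType f = mergedKeyType(List.of(KeyType.INTEGER, KeyType.LONG));      // C, D
        KeyType h = mergedKeyType(List.of(KeyType.INTEGER, KeyType.CHARARRAY)); // hypothetical splitter H
        System.out.println(e + " " + f + " " + hasSameMapKeyType(e, f)); // TUPLE TUPLE true
        System.out.println(hasSameMapKeyType(e, h)); // true, even though the splittee types differ
    }
}
```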

In the above case, we cannot tell whether the mapKeyTypes of E and F are the same unless we save the mapKeyTypes of the splittees in the splitter. In PIG-4883.patch, we create an attribute "mapKeyTypeForSplitter" in MapReduceOper to save the mapKeyTypes of all splittees:
{noformat}
    A(int)     B(long)            C(int)     D(long)
         \     /                       \     /
        Splitter(E)                  Splitter(F)
                  \                  /
                      Splitter(G)

the mapKeyTypes of A and B are not the same, so mapKeyTypeForSplitter of E will be [int,long]
the mapKeyTypes of C and D are not the same, so mapKeyTypeForSplitter of F will be [int,long]
if all the elements in E.mapKeyTypeForSplitter and F.mapKeyTypeForSplitter are the same, function "hasSameMapKeyType" returns true
{noformat}
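A minimal Java sketch of the per-splittee approach, assuming an order-sensitive, element-by-element comparison (the patch itself may compare differently). `PerSplitteeKeyTypeModel`, `SplitterNode` and `KeyType` are hypothetical; only the names `mapKeyTypeForSplitter` and `hasSameMapKeyType` come from the comment above, and the splitter H is again a hypothetical case added for contrast.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal model: each splitter records the key type of every splittee
// instead of a single collapsed type.
final class PerSplitteeKeyTypeModel {

    // Hypothetical stand-in for a few key types; not Pig's DataType class.
    enum KeyType { INTEGER, LONG, CHARARRAY, TUPLE }

    // Hypothetical splitter node; the real field lives in MapReduceOper.
    static final class SplitterNode {
        final List<KeyType> mapKeyTypeForSplitter;

        SplitterNode(KeyType... splitteeTypes) {
            this.mapKeyTypeForSplitter = new ArrayList<>(Arrays.asList(splitteeTypes));
        }
    }

    // Assumption: two splitters match only if their lists are equal element by element, in order.
    static boolean hasSameMapKeyType(SplitterNode a, SplitterNode b) {
        return a.mapKeyTypeForSplitter.equals(b.mapKeyTypeForSplitter);
    }

    public static void main(String[] args) {
        SplitterNode e = new SplitterNode(KeyType.INTEGER, KeyType.LONG);      // A, B
        SplitterNode f = new SplitterNode(KeyType.INTEGER, KeyType.LONG);      // C, D
        SplitterNode h = new SplitterNode(KeyType.INTEGER, KeyType.CHARARRAY); // hypothetical splitter H
        System.out.println(hasSameMapKeyType(e, f)); // true
        System.out.println(hasSameMapKeyType(e, h)); // false
    }
}
```

Unlike the collapsed model, the hypothetical H is now distinguished from E because the original splittee types are still available.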



> MapKeyType of splitter was set wrongly in specific multiquery case
> ------------------------------------------------------------------
>
>                 Key: PIG-4883
>                 URL: https://issues.apache.org/jira/browse/PIG-4883
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.15.0
>            Reporter: swlai
>            Assignee: liyunzhang_intel
>         Attachments: PIG-4883.patch
>
>
> The following code and data will raise an exception.
> However, if I remove any of the four "store"s, the code will be fine.
> {code:title=test.pig|borderStyle=solid}
> r = load 'test.txt' as (id: chararray, val: long);
> t1 = filter r by val >= 12 and val < 20;
> grpd = group t1 by val;
> t1_cnt = foreach grpd generate group as name, COUNT(t1) as value;
> t1_cnt = foreach t1_cnt generate (chararray)name, value;
> grpd = group t1 all;
> t1_cnt_total = foreach grpd generate 't1' as name, COUNT(t1) as value; 
> t2 = filter r by val >= 20 and val < 30;
> grpd = group t2 by val;
> t2_cnt = foreach grpd generate group as name, COUNT(t2) as value;
> --t2_cnt = foreach t2_cnt generate (chararray)name, value;
> grpd = group t2 all;
> t2_cnt_total = foreach grpd generate 't2' as name, COUNT(t2) as value;
> store t1_cnt  into 'outx/3';
> store t2_cnt  into 'outx/4';
> store t1_cnt_total into 'outx/5';
> store t2_cnt_total into 'outx/6';
> {code}
> and
> {code:title=test.txt|borderStyle=solid}
> c     12
> {code}
> will cause the following error:
> Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.Long
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNextString(POCast.java:1167)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:343)
>         ... 14 more
> I don't know why the code needs to cast a BinSedesTuple to Long.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
