[
https://issues.apache.org/jira/browse/PIG-4883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281096#comment-15281096
]
liyunzhang_intel commented on PIG-4883:
---------------------------------------
[~rohini]: The problem is that in the multiquery case we set the
mapKeyType of the splitter to DataType.Tuple ([detailed
code|https://github.com/apache/pig/blob/27b153dbd688d8328e00d2d4bead84f3c879b2ae/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MultiQueryOptimizer.java#L1035])
if the mapKeyTypes of its splittees are not the same.
So I think your code cannot resolve the current problem. Here is an example
to explain it:
{noformat}
A(int)  B(long)    C(int)  D(long)
    \    /             \    /
  Splitter(E)        Splitter(F)
         \             /
          Splitter(G)
the mapKeyTypes of A and B are not the same, so the mapKeyType of E will be Tuple
the mapKeyTypes of C and D are not the same, so the mapKeyType of F will be Tuple
the mapKeyTypes of E and F are both DataType.Tuple and are the same. Function "hasSameMapKeyType" returns true.
{noformat}
In the above case, we cannot judge whether the mapKeyTypes of E and F are really
the same unless we save the mapKeyTypes of the splittees for each splitter. In
PIG-4883.patch, we create an attribute "mapKeyTypeForSplitter" in MapReduceOper
to save the mapKeyTypes of all splittees:
{noformat}
A(int)  B(long)    C(int)  D(long)
    \    /             \    /
  Splitter(E)        Splitter(F)
         \             /
          Splitter(G)
the mapKeyTypes of A and B are not the same, so mapKeyTypeForSplitter of E will be [int,long]
the mapKeyTypes of C and D are not the same, so mapKeyTypeForSplitter of F will be [int,long]
if all the elements in E.mapKeyTypeForSplitter and F.mapKeyTypeForSplitter are the same, function "hasSameMapKeyType" returns true.
{noformat}
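To make the difference concrete, here is a minimal standalone sketch of the two comparisons. It is not the code in PIG-4883.patch: the class, the enum and the method names are hypothetical, and it assumes that "all the elements are the same" means every splittee key type across E and F is one single type.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SplitterKeyTypeSketch {
    // Hypothetical stand-in for the key type constants; not Pig's DataType class.
    enum KeyType { INT, LONG, TUPLE }

    // Behaviour described above: a splitter whose splittees disagree
    // on their key type is given TUPLE as its own key type.
    static KeyType collapsedKeyType(List<KeyType> splitteeKeyTypes) {
        Set<KeyType> distinct = new HashSet<>(splitteeKeyTypes);
        return distinct.size() == 1 ? splitteeKeyTypes.get(0) : KeyType.TUPLE;
    }

    // Comparing only the collapsed types: E and F both look like TUPLE.
    static boolean sameByCollapsedType(List<List<KeyType>> splitters) {
        Set<KeyType> collapsed = new HashSet<>();
        for (List<KeyType> splitter : splitters) {
            collapsed.add(collapsedKeyType(splitter));
        }
        return collapsed.size() == 1;
    }

    // Comparing the saved splittee key types (the mapKeyTypeForSplitter idea):
    // true only when every splittee of every splitter has the same key type.
    static boolean sameBySplitteeTypes(List<List<KeyType>> splitters) {
        Set<KeyType> all = new HashSet<>();
        for (List<KeyType> splitter : splitters) {
            all.addAll(splitter);
        }
        return all.size() == 1;
    }

    public static void main(String[] args) {
        List<KeyType> e = Arrays.asList(KeyType.INT, KeyType.LONG); // splittees A, B
        List<KeyType> f = Arrays.asList(KeyType.INT, KeyType.LONG); // splittees C, D
        List<List<KeyType>> splitters = Arrays.asList(e, f);

        System.out.println(sameByCollapsedType(splitters)); // true
        System.out.println(sameBySplitteeTypes(splitters)); // false
    }
}
```

With the collapsed types, E and F are indistinguishable from two splitters that really have Tuple keys; keeping the per-splittee list lets the check see the int/long mismatch underneath.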
> MapKeyType of splitter was set wrongly in specific multiquery case
> ------------------------------------------------------------------
>
> Key: PIG-4883
> URL: https://issues.apache.org/jira/browse/PIG-4883
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.15.0
> Reporter: swlai
> Assignee: liyunzhang_intel
> Attachments: PIG-4883.patch
>
>
> The following code and data will raise an exception.
> However, if I remove any of the four "store"s, the code will be fine.
> {code:title=test.pig|borderStyle=solid}
> r = load 'test.txt' as (id: chararray, val: long);
> t1 = filter r by val >= 12 and val < 20;
> grpd = group t1 by val;
> t1_cnt = foreach grpd generate group as name, COUNT(t1) as value;
> t1_cnt = foreach t1_cnt generate (chararray)name, value;
> grpd = group t1 all;
> t1_cnt_total = foreach grpd generate 't1' as name, COUNT(t1) as value;
> t2 = filter r by val >= 20 and val < 30;
> grpd = group t2 by val;
> t2_cnt = foreach grpd generate group as name, COUNT(t2) as value;
> --t2_cnt = foreach t2_cnt generate (chararray)name, value;
> grpd = group t2 all;
> t2_cnt_total = foreach grpd generate 't2' as name, COUNT(t2) as value;
> store t1_cnt into 'outx/3';
> store t2_cnt into 'outx/4';
> store t1_cnt_total into 'outx/5';
> store t2_cnt_total into 'outx/6';
> {code}
> and
> {code:title=test.txt|borderStyle=solid}
> c 12
> {code}
> will cause error:
> Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.Long
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNextString(POCast.java:1167)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:343)
> ... 14 more
> I don't know why the code needs to cast a BinSedesTuple to Long.