[
https://issues.apache.org/jira/browse/PIG-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liyunzhang_intel updated PIG-4284:
----------------------------------
Attachment: PIG-4284.patch
In PIG-4284.patch, the following changes are made:
1. Add an IndexKey class in GlobalRearrangeConverter.java
The following unit test failures are fixed by this patch:
org.apache.pig.test.TestJoin.testJoinWithMissingFieldsInTuples
org.apache.pig.test.TestJoin.testJoinNullTupleFieldKey
org.apache.pig.test.TestJoin.testDefaultJoin
org.apache.pig.test.TestJoin.testFullOuterJoin
org.apache.pig.test.TestJoin.testJoinTupleFieldKey
org.apache.pig.test.TestJoin.testLeftOuterJoin
org.apache.pig.test.TestJoin.testJoinSchema
org.apache.pig.test.TestJoin.testRightOuterJoin
org.apache.pig.test.TestProjectRange.testRangeCoGroupMixWSchema
org.apache.pig.test.TestProjectRange.testRangeJoinMixWSchema
Let's use an example to explain why these unit tests failed with the previous code:
leftJoin.pig
{code}
a = load './a.txt' as (n:chararray, a:int);
b = load './b.txt' as (n:chararray, m:chararray);
c = join a by $0 left outer, b by $0;
d = order c by $1;
store d into './leftJoin.out';
explain d;
{code}
a.txt:
{code}
hello 1
bye 2
3
{code}
b.txt:
{code}
hello world
good morning
evening
{code}
Result in Spark mode:
{code}
hello 1 hello world
bye 2
3 evening
{code}
Result in MR mode:
{code}
hello 1 hello world
bye 2
3
{code}
The MR and Spark results differ because previously (,3) from table a and
(,evening) from table b were considered to have the same key (NULL), so Spark
mode joined them. Under SQL semantics a NULL key never matches another NULL
key, so these two tuples do not have the same key. PIG-4284.patch handles
this case.
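
To illustrate the NULL-key rule described above, here is a minimal sketch of a key wrapper whose equality check never matches two NULL keys. It is not the IndexKey class from the patch: the names NullKeyDemo, JoinKey and keysMatch are hypothetical and exist only for this example. It also assumes that a plain equals() comparison is what groups join keys together.

```java
import java.util.Objects;

public class NullKeyDemo {

    // Hypothetical wrapper around a join key (not the patch's IndexKey).
    static final class JoinKey {
        private final Object key;

        JoinKey(Object key) {
            this.key = key;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof JoinKey)) {
                return false;
            }
            JoinKey other = (JoinKey) o;
            // SQL-style rule: a NULL key never matches anything, not even another NULL.
            // Note: this deliberately breaks reflexivity of equals() for NULL keys.
            if (key == null || other.key == null) {
                return false;
            }
            return key.equals(other.key);
        }

        @Override
        public int hashCode() {
            // Objects.hashCode returns 0 for null, so NULL keys still hash consistently.
            return Objects.hashCode(key);
        }
    }

    // Returns true only when both keys are non-NULL and equal.
    static boolean keysMatch(Object left, Object right) {
        return new JoinKey(left).equals(new JoinKey(right));
    }

    public static void main(String[] args) {
        System.out.println(keysMatch("hello", "hello")); // true
        System.out.println(keysMatch("bye", "hello"));   // false
        System.out.println(keysMatch(null, null));       // false
    }
}
```

With a rule like this, (,3) from a and (,evening) from b would not be paired, which matches the MR result shown above. An equals() that returns false for an object compared with itself violates the usual Java contract, so a real implementation may need to handle NULL keys differently.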
> Enable unit test "TestJoin" for spark
> -------------------------------------
>
> Key: PIG-4284
> URL: https://issues.apache.org/jira/browse/PIG-4284
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4284.patch, TEST-org.apache.pig.test.TestJoin.txt
>
>
> error is attached
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)