[ 
https://issues.apache.org/jira/browse/PIG-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4284:
----------------------------------
    Attachment: PIG-4284.patch

PIG-4284.patch makes the following change:
1. Add the IndexKey class to GlobalRearrangeConverter.java.

This patch fixes the following unit test failures:
org.apache.pig.test.TestJoin.testJoinWithMissingFieldsInTuples
org.apache.pig.test.TestJoin.testJoinNullTupleFieldKey
org.apache.pig.test.TestJoin.testDefaultJoin
org.apache.pig.test.TestJoin.testFullOuterJoin
org.apache.pig.test.TestJoin.testJoinTupleFieldKey
org.apache.pig.test.TestJoin.testLeftOuterJoin
org.apache.pig.test.TestJoin.testJoinSchema
org.apache.pig.test.TestJoin.testRightOuterJoin
org.apache.pig.test.TestProjectRange.testRangeCoGroupMixWSchema
org.apache.pig.test.TestProjectRange.testRangeJoinMixWSchema


The following example shows why these unit tests failed with the previous code:
leftJoin.pig:
{code}
a = load './a.txt' as (n:chararray, a:int);
b = load './b.txt' as (n:chararray, m:chararray);
c = join a by $0 left outer, b by $0;
d = order c by $1;
store d into './leftJoin.out';
explain d;
{code}

a.txt:
{code}
hello       1
bye         2
            3
{code}

b.txt:
{code}
hello   world
good    morning
        evening
{code}

Result in spark mode (before this patch):
{code}
hello         1    hello        world
bye           2         
              3                 evening
{code}

Result in mr mode:
{code}
hello            1        hello        world
bye              2
                 3
{code}  
        
The results differ between mr and spark mode because the previous code treated
(,3) from table a and (,evening) from table b as having the same key (NULL).
Under SQL semantics, NULL keys never match each other, so these two tuples
must not be joined. PIG-4284.patch handles this case.
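
To illustrate the NULL-key rule described above, here is a minimal Java sketch of a left outer join in which a NULL key never matches anything, including another NULL. All names in it (NullKeyJoinSketch, JoinKey, matches, leftOuterJoin) are hypothetical and are not taken from PIG-4284.patch; the actual IndexKey class may well be implemented differently. The nested-loop join is used only for readability.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the code from PIG-4284.patch.
final class NullKeyJoinSketch {

    /** Hypothetical key wrapper with SQL-style matching. */
    static final class JoinKey {
        private final Object value;

        JoinKey(Object value) {
            this.value = value;
        }

        /** Two keys match only if both are non-null and equal. */
        boolean matches(JoinKey other) {
            return value != null && other.value != null && value.equals(other.value);
        }
    }

    /** Left outer join of two-column rows on column 0 (nested loops for clarity). */
    static List<String> leftOuterJoin(List<Object[]> left, List<Object[]> right) {
        List<String> out = new ArrayList<>();
        for (Object[] l : left) {
            boolean matched = false;
            for (Object[] r : right) {
                if (new JoinKey(l[0]).matches(new JoinKey(r[0]))) {
                    out.add(format(l) + "," + format(r));
                    matched = true;
                }
            }
            if (!matched) {
                // No match on the right side: pad with two empty fields.
                out.add(format(l) + ",,");
            }
        }
        return out;
    }

    /** Renders a two-column row as comma-separated text; null becomes empty. */
    static String format(Object[] row) {
        return text(row[0]) + "," + text(row[1]);
    }

    static String text(Object o) {
        return o == null ? "" : o.toString();
    }

    public static void main(String[] args) {
        // Same data as a.txt and b.txt in the example above.
        List<Object[]> a = new ArrayList<>();
        a.add(new Object[] {"hello", 1});
        a.add(new Object[] {"bye", 2});
        a.add(new Object[] {null, 3});

        List<Object[]> b = new ArrayList<>();
        b.add(new Object[] {"hello", "world"});
        b.add(new Object[] {"good", "morning"});
        b.add(new Object[] {null, "evening"});

        for (String row : leftOuterJoin(a, b)) {
            System.out.println(row);
        }
    }
}
```

With this rule, the row (,3) is not joined with (,evening), which agrees with the mr mode result shown above.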




> Enable unit test "TestJoin" for spark
> -------------------------------------
>
>                 Key: PIG-4284
>                 URL: https://issues.apache.org/jira/browse/PIG-4284
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4284.patch, TEST-org.apache.pig.test.TestJoin.txt
>
>
> error is attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
