[ 
https://issues.apache.org/jira/browse/TAJO-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265776#comment-15265776
 ] 

ASF GitHub Bot commented on TAJO-2135:
--------------------------------------

GitHub user jihoonson opened a pull request:

    https://github.com/apache/tajo/pull/1009

    TAJO-2135: Invalid join result when join key columns contain nulls

    First of all, I apologize for the large patch.
    I added null values to every TPC-H table in our tests, which required a 
lot of changes to the tests. Fortunately, doing so exposed several critical 
bugs related to null values.
    Here are the changes in this patch.
    
    * Fixed invalid join result problem with null values.
     * When executing hash join, null values of the right table are filtered 
out while constructing a hash table.
    * Fixed invalid cast exception in outer joins with null values.
    * Fixed the problem that the sort-merge algorithm for full outer join was 
not executed in tests.
    * Fixed wrong result of full outer hash join with null values.
    * Fixed wrong result of RightOuterMergeJoinExec.
    * Fixed wrong query result on partitioned tables when partition keys 
contain nulls.
    * Changed the default null character of text files to '\N'.
    * Added null values to every TPC-H table.
    
    I also found two additional bugs in anti join and window function 
executions with nulls. I think we need to fix these problems in separate JIRA 
issues.
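
The null-filtering step described for the hash join fix can be sketched roughly as follows. This is only an illustration in Python, not the code from the patch: the function `hash_join`, the dict-based rows, and the variable names are all hypothetical, and Tajo's actual executors are written in Java.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows, left_key, right_key):
    """Inner equi-join sketch; a None key stands in for SQL NULL and never matches."""
    # Build phase: drop right-side rows whose join key is null,
    # because NULL = NULL does not evaluate to true in SQL.
    table = defaultdict(list)
    for row in right_rows:
        key = row[right_key]
        if key is None:
            continue
        table[key].append(row)

    # Probe phase: a null key on the left side cannot match anything either.
    joined = []
    for row in left_rows:
        key = row[left_key]
        if key is None:
            continue
        for match in table.get(key, []):
            joined.append((row, match))
    return joined


# Hypothetical sample data: one matching key and several null keys.
left = [{"id": 1}, {"id": None}, {"id": 2}]
right = [{"id": 1}, {"id": None}, {"id": None}]
result = hash_join(left, right, "id", "id")
print(len(result))
```

Only the pair with `id` 1 is joined here. Without the null check in the build phase, the null keys would hash to the same bucket and be matched against each other.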

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jihoonson/tajo-2 TAJO-2135

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/1009.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1009
    
----
commit 6702654fef63a4862ae3894c30cf392609391cc1
Author: Jihoon Son <[email protected]>
Date:   2016-04-23T03:16:38Z

    Fix the bug and add an optimization rule

commit 9fe133c7e1b75c3bb3165f56efc52051460c22bd
Author: Jihoon Son <[email protected]>
Date:   2016-04-26T12:53:31Z

    Improve optimization rule to consider alias

commit b90ed663b0e261e5796893bf7ac057305a2daff6
Author: Jihoon Son <[email protected]>
Date:   2016-04-29T00:15:46Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into 
TAJO-2135

commit 01a9762aecd07fbd49120fd8d645ce9fc5bc2195
Author: Jihoon Son <[email protected]>
Date:   2016-04-29T08:06:47Z

    - Added null values to all tpch tables.
    - Fixed join, sort tests.

commit 200113196cb31ff7e26ce92d60c3ed1149af012d
Author: Jihoon Son <[email protected]>
Date:   2016-04-29T11:07:21Z

    Fix group by test

commit 5e699f271d53b0e8673c2768494444c84c43e324
Author: Jihoon Son <[email protected]>
Date:   2016-04-30T05:39:53Z

    Fix all test failures

commit 0f4335deee5961ce4794cbdf27225dbca0d3251b
Author: Jihoon Son <[email protected]>
Date:   2016-04-30T05:55:48Z

    Code cleanup

commit 45432f25616093e368f45342272386b3f20eeb38
Author: Jihoon Son <[email protected]>
Date:   2016-05-01T05:33:48Z

    Fix invalid result in full outer join

commit 450bee298dba9c67a024dafb520ab9d7bde109ce
Author: Jihoon Son <[email protected]>
Date:   2016-05-01T10:48:30Z

    Fix full outer join bug

commit 44039abb4dea8e45d9098916f6e40b3a75f76771
Author: Jihoon Son <[email protected]>
Date:   2016-05-01T14:01:05Z

    Fixed all tests except window function

----


> Invalid join result when join key columns contain nulls
> -------------------------------------------------------
>
>                 Key: TAJO-2135
>                 URL: https://issues.apache.org/jira/browse/TAJO-2135
>             Project: Tajo
>          Issue Type: Bug
>    Affects Versions: 0.11.0
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>            Priority: Critical
>             Fix For: 0.12.0
>
>
> You can reproduce this bug as follows. The correct answer to the query 
> below is 20965674.
> {noformat}
> tpcds100> select count(*) from store_sales, store_returns 
> where 
> ss_customer_sk = sr_customer_sk 
> and ss_item_sk = sr_item_sk 
> ;
> [=========================================>] 100%  33.315 sec
> ?count
> -------------------------------
> 101145653
> (1 rows, 33.315 sec, 16 B selected)
> {noformat}
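
As a rough illustration of how null join keys can inflate a count like the one above, the sketch below compares standard SQL semantics (using SQLite through Python's `sqlite3` module) with a naive comparison that treats two nulls as equal. The tables `l` and `r` and their data are hypothetical, and the issue does not state that this is exactly how the wrong count arose; it is just one plausible mechanism consistent with the issue title.

```python
import sqlite3

# Hypothetical tiny tables: one matching key and two null keys on each side.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE l (k INTEGER)")
conn.execute("CREATE TABLE r (k INTEGER)")
conn.executemany("INSERT INTO l VALUES (?)", [(1,), (None,), (None,)])
conn.executemany("INSERT INTO r VALUES (?)", [(1,), (None,), (None,)])

# Standard SQL: NULL = NULL is not true, so only the pair (1, 1) is counted.
sql_count = conn.execute(
    "SELECT count(*) FROM l, r WHERE l.k = r.k"
).fetchone()[0]

# Naive comparison that treats null keys as equal: every null on the left
# also pairs with every null on the right.
left_keys = [1, None, None]
right_keys = [1, None, None]
naive_count = sum(1 for a in left_keys for b in right_keys if a == b)

print(sql_count, naive_count)
conn.close()
```

With this data the SQL count is 1, while the naive count is 5, because the two nulls on each side contribute four extra pairs.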



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
