[ 
https://issues.apache.org/jira/browse/ARROW-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniël Heres updated ARROW-10971:
---------------------------------
    Description: 
Currently the left join generates a null-padded row for every left row that 
has no match in the current right batch.

However, this is wrong: a left row should get nulls only if it has no match 
in _any_ of the right batches.

The current implementation generates an extra (left, none) tuple for every 
right batch in which a left row has no match. 

To fix it, we need to mark the keys or indexes on the left side as visited and 
traverse the unvisited items once at the end of the hash join.

  was:
Currently the left join generates a null-padded row for every left row that 
has no match in the current right batch.

However, this is wrong: a left row should get nulls only if it has no match 
in _any_ of the right batches.

The current implementation generates an extra (left, none) tuple for every 
right batch in which a left row has no match. 

To fix it, we need to mark the keys or indexes on the left side as visited and 
traverse them once at the end of the hash join.


> [Rust][DataFusion] Left Join implementation is wrong for multiple batches on 
> right side
> ---------------------------------------------------------------------------------------
>
>                 Key: ARROW-10971
>                 URL: https://issues.apache.org/jira/browse/ARROW-10971
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Daniël Heres
>            Priority: Major
>
> Currently the left join generates a null-padded row for every left row that 
> has no match in the current right batch.
> However, this is wrong: a left row should get nulls only if it has no match 
> in _any_ of the right batches.
> The current implementation generates an extra (left, none) tuple for every 
> right batch in which a left row has no match. 
> To fix it, we need to mark the keys or indexes on the left side as visited 
> and traverse the unvisited items once at the end of the hash join.
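
The proposed fix can be illustrated with a minimal sketch. This is not the
DataFusion code: it assumes rows are plain (key, value) pairs instead of Arrow
record batches, and the function name left_join_batches and the visited vector
are hypothetical. The left side is hashed once, each right batch is probed
against it while matched left indexes are marked, and the (left, none) tuples
are emitted only once, after the last right batch.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified left hash join over several right-side batches.
/// Rows are (key, value) pairs rather than Arrow record batches.
fn left_join_batches<'a>(
    left: &[(i32, &'a str)],
    right_batches: &[Vec<(i32, &'a str)>],
) -> Vec<(&'a str, Option<&'a str>)> {
    // Build phase: map each join key to the indexes of the left rows holding it.
    let mut build: HashMap<i32, Vec<usize>> = HashMap::new();
    for (i, (key, _)) in left.iter().enumerate() {
        build.entry(*key).or_default().push(i);
    }

    // One flag per left row, shared across all right batches.
    let mut visited = vec![false; left.len()];
    let mut out = Vec::new();

    // Probe phase: emit matches only; no (left, none) tuples per batch.
    for batch in right_batches {
        for (key, right_value) in batch {
            if let Some(indexes) = build.get(key) {
                for &i in indexes {
                    visited[i] = true;
                    out.push((left[i].1, Some(*right_value)));
                }
            }
        }
    }

    // Final pass: a left row gets a null only if no batch matched it.
    for (i, (_, left_value)) in left.iter().enumerate() {
        if !visited[i] {
            out.push((*left_value, None));
        }
    }
    out
}
```

With left rows (1, "a"), (2, "b"), (3, "c") and two right batches holding
(1, "x") and (2, "y"), only "c" is paired with none, and only once. Emitting
unmatched rows per batch would instead also pair "a" and "b" with none and
would emit "c" twice.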



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
