[ 
https://issues.apache.org/jira/browse/FLINK-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345353#comment-14345353
 ] 

Aljoscha Krettek commented on FLINK-1628:
-----------------------------------------

This is the fix, line 128 in AbstractJoinDescriptor:

{code}
@Override
        public boolean areCompatible(RequestedGlobalProperties requested1, 
RequestedGlobalProperties requested2,
                        GlobalProperties produced1, GlobalProperties produced2)
        {
                if (requested1.getPartitioning().isPartitionedOnKey() && 
requested2.getPartitioning().isPartitionedOnKey()) {
                        return produced1.getPartitioning() == 
produced2.getPartitioning() &&
                                        
produced1.getPartitioningFields().equals(produced2.getPartitioningFields()) &&
                                        (produced1.getCustomPartitioner() == 
null ? 
                                                
produced2.getCustomPartitioner() == null :
                                                
produced1.getCustomPartitioner().equals(produced2.getCustomPartitioner()));
                } else {
                        return true;
                }

        }
{code}

> Strange behavior of "where" function during a join
> --------------------------------------------------
>
>                 Key: FLINK-1628
>                 URL: https://issues.apache.org/jira/browse/FLINK-1628
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 0.9
>            Reporter: Daniel Bali
>              Labels: batch
>
> Hello!
> If I use the `where` function with a field list during a join, it exhibits 
> strange behavior.
> Here is the sample code that triggers the error: 
> https://gist.github.com/balidani/d9789b713e559d867d5c
> This example joins a DataSet with itself, then counts the number of rows. If 
> I use `.where(0, 1)` the result is (22), which is not correct. If I use 
> `EdgeKeySelector`, I get the correct result (101).
> When I pass a field list to the `equalTo` function (but not `where`), 
> everything works again.
> If I don't include the `groupBy` and `reduceGroup` parts, everything works.
> Also, when working with large DataSets, passing a field list to `where` makes 
> it incredibly slow, even though I don't see any exceptions in the log (in 
> DEBUG mode).
> Does anybody know what might cause this problem?
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to