[ 
https://issues.apache.org/jira/browse/BEAM-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307827#comment-15307827
 ] 

Aljoscha Krettek commented on BEAM-315:
---------------------------------------

I attached a version that uses a {{String}} as the key. With this, the results
are still wrong, but "less wrong" than with the {{Key}} class. I think the
problem with using {{Key}} as the key is that
{{AvroCoder.consistentWithEquals()}} returns {{false}} while the Flink runner
compares keys by their serialized bytes. I am not sure how the Dataflow runner
deals with this, though. Also, once the data is large enough for the bug to
appear, the pipeline cannot be executed on the {{DirectPipelineRunner}} or the
{{InProcessPipelineRunner}} because both fail with an OOM exception.
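
To illustrate the suspected mechanism, here is a minimal sketch in plain Java.
It does not use Beam, Avro, or Flink: {{encode}} and {{keyOf}} are hypothetical
helpers, and the encoding is an assumption chosen only to show how two keys
that are equal by {{equals()}} can serialize to different bytes, so that a
byte-wise comparison would place them in different groups.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class ByteComparisonSketch {

    // Hypothetical encoder (NOT AvroCoder): writes entries in iteration order,
    // so its output is not consistent with equals() for maps.
    public static byte[] encode(Map<String, String> key) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : key.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: builds a two-field key that remembers insertion order.
    public static Map<String, String> keyOf(String k1, String v1, String k2, String v2) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put(k1, v1);
        m.put(k2, v2);
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> a = keyOf("user", "42", "country", "PL");
        Map<String, String> b = keyOf("country", "PL", "user", "42");

        // Map equality ignores entry order, so the keys are equal by value.
        System.out.println("equals():   " + a.equals(b));

        // The encoded forms differ, so grouping on raw bytes treats them as distinct.
        System.out.println("same bytes: " + Arrays.equals(encode(a), encode(b)));
    }
}
```

A runner that groups on encoded bytes would need an encoding that is
deterministic for equal values; whether that is the actual cause here is still
only a guess.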

> GroupByKey/CoGroupByKey doesn't group correctly with FlinkPipelineRunner
> ------------------------------------------------------------------------
>
>                 Key: BEAM-315
>                 URL: https://issues.apache.org/jira/browse/BEAM-315
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 0.1.0-incubating
>            Reporter: Pawel Szczur
>         Attachments: CoGroupPipelineStringKey.java
>
>
> Same keys are processed multiple times.
> A repo to reproduce the bug:
> https://github.com/orian/cogroup-wrong-grouping
> Discussion:
> http://mail-archives.apache.org/mod_mbox/incubator-beam-user/201605.mbox/%3CCAB2uKkG2xHsWpLFUkYnt8eEzdxU%3DB_nu6crTwVi-ZuUpugxkPQ%40mail.gmail.com%3E
> Notice: I haven't tested other runners (didn't manage to configure Spark).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
