[jira] [Updated] (FLINK-7002) Partitioning broken if enum is used in compound key specified using field expression

Sebastian Klemke (JIRA) Sun, 25 Jun 2017 13:39:21 -0700

[ https://issues.apache.org/jira/browse/FLINK-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sebastian Klemke updated FLINK-7002:
------------------------------------
    Attachment: TestJob.java
                WorkingTestJob.java
                testdata.avro

> Partitioning broken if enum is used in compound key specified using field 
> expression
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-7002
>                 URL: https://issues.apache.org/jira/browse/FLINK-7002
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.0, 1.3.1
>            Reporter: Sebastian Klemke
>         Attachments: testdata.avro, TestJob.java, WorkingTestJob.java
>
>
> When groupBy() or keyBy() is used with multiple field expressions, at least 
> one of them being an enum type serialized using EnumTypeInfo, partitioning 
> seems random, resulting in incorrectly grouped/keyed output 
> datasets/datastreams.
> The attached Flink DataSet API jobs and the test dataset demonstrate the
> issue. Both jobs count (id, type) occurrences: TestJob groups using field
> expressions, whereas WorkingTestJob groups using a KeySelector function.
> The expected output for both is 6 records, each with a frequency value of
> 100_000. When run in a LocalEnvironment, both jobs do produce equivalent
> results. But when run on a cluster with 5 TaskManagers, only the KeySelector
> function with a String key produces correct results, whereas the field
> expressions produce random, non-repeatable, wrong results.
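
The report does not state a root cause. One plausible mechanism that fits the symptoms (correct in a single JVM, wrong across several TaskManagers) is that the hash used for partitioning includes the enum's hashCode(). In Java, Enum.hashCode() is final and inherits Object's identity hash, so it can differ from one JVM to the next, while String.hashCode() and Enum.ordinal() are the same in every JVM running the same code. The minimal sketch below illustrates that difference with plain Java. It is not Flink's partitioning code: the class PartitionSketch, the enum EventType, the three partition methods, and the 31 * a + b hash mixing are all hypothetical.

```java
/**
 * Hypothetical sketch (not Flink code): how a hash partitioner might map a
 * compound (id, type) key to one of numPartitions partitions.
 */
public class PartitionSketch {

    // Hypothetical enum standing in for the job's "type" field.
    enum EventType { CLICK, VIEW, PURCHASE }

    // Uses Enum.hashCode(), which is Object's identity hash. Equal keys can
    // therefore hash differently in different JVMs (e.g. on different
    // TaskManagers) and be routed to different partitions.
    static int unstablePartition(String id, EventType type, int numPartitions) {
        int hash = 31 * id.hashCode() + type.hashCode();
        return Math.floorMod(hash, numPartitions);
    }

    // Uses only values whose hashes do not depend on the JVM instance:
    // String.hashCode() is specified by the language, and ordinal() is the
    // enum constant's declaration position.
    static int stablePartition(String id, EventType type, int numPartitions) {
        int hash = 31 * id.hashCode() + type.ordinal();
        return Math.floorMod(hash, numPartitions);
    }

    // Mirrors the KeySelector workaround from the report: build one String
    // key from both fields and hash only that string.
    static int stringKeyPartition(String id, EventType type, int numPartitions) {
        String key = id + "|" + type.name();
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int n = 5; // e.g. one partition per TaskManager
        System.out.println(stablePartition("42", EventType.VIEW, n));    // same in every JVM
        System.out.println(stringKeyPartition("42", EventType.VIEW, n)); // same in every JVM
        System.out.println(unstablePartition("42", EventType.VIEW, n));  // may change between JVM runs
    }
}
```

Running main() several times typically shows the first two values staying fixed while the third can vary between runs. Within a single JVM, as in a LocalEnvironment, even the identity-based hash is consistent for every task, which would explain why local runs look correct.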



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
