Sebastian Klemke created FLINK-7002:
---------------------------------------
Summary: Partitioning broken if enum is used in compound key specified using field expression
Key: FLINK-7002
URL: https://issues.apache.org/jira/browse/FLINK-7002
Project: Flink
Issue Type: Bug
Components: Core
Affects Versions: 1.3.1, 1.2.0
Reporter: Sebastian Klemke
When groupBy() or keyBy() is used with multiple field expressions, at least one
of which refers to an enum type serialized using EnumTypeInfo, partitioning
appears random. The result is incorrectly grouped/keyed output DataSets/DataStreams.
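The report does not identify a root cause. One plausible explanation, offered here only as an assumption: `java.lang.Enum.hashCode()` is final and delegates to `Object.hashCode()`, which is identity-based and need not be the same in different JVMs. If a hash partitioner relied on it, equal keys could be routed to different partitions across TaskManagers, while a single-JVM LocalEnvironment would still look correct. The plain-Java sketch below does not use Flink. The `EventType` enum and the `stableHash`, `stableCompoundHash` and `partitionFor` helpers are hypothetical. It contrasts the identity hash with one derived from `name()`, whose `String.hashCode()` value is specified by the Java platform and therefore identical everywhere.

```java
public class EnumHashDemo {

    // Hypothetical enum standing in for the "type" field of the test records.
    enum EventType { CLICK, VIEW }

    // Identity-based: Enum.hashCode() is final and delegates to Object.hashCode(),
    // so its value may differ from one JVM (TaskManager) to another.
    static int identityHash(EventType t) {
        return t.hashCode();
    }

    // Stable alternative (an assumption, not Flink's code): String.hashCode()
    // has a specified formula, so name().hashCode() is the same in every JVM.
    static int stableHash(EventType t) {
        return t.name().hashCode();
    }

    // Stable hash for a compound (id, type) key that combines both parts.
    static int stableCompoundHash(String id, EventType t) {
        return 31 * id.hashCode() + stableHash(t);
    }

    // Maps a hash value to a partition index in [0, numPartitions).
    static int partitionFor(int hash, int numPartitions) {
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        for (EventType t : EventType.values()) {
            System.out.printf("%s identity=%d stable=%d partition(of 5)=%d%n",
                    t, identityHash(t), stableHash(t),
                    partitionFor(stableCompoundHash("id-1", t), 5));
        }
    }
}
```

Running `main` twice, in separate JVMs, may print different `identity` values, but the `stable` values and partition indices stay the same.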
The attached Flink DataSet API jobs and test dataset demonstrate the issue. Both
jobs count (id, type) occurrences: TestJob groups using field expressions, while
WorkingTestJob uses a KeySelector function.
The expected output for both jobs is 6 records, each with a frequency of 100_000.
When run in a LocalEnvironment, both jobs do produce equivalent results. However,
on a cluster with 5 TaskManagers, only the KeySelector function with a String key
produces correct results; the field-expression job produces random,
non-repeatable, wrong results.
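The sketch below illustrates why a String key, as built by a KeySelector, groups correctly. It is plain Java without Flink. It hash-partitions a synthetic (id, type) dataset into 5 partitions, one per TaskManager, and counts occurrences locally in each partition. The `EventType` enum, the `Event` class, the `stringKey` format and the dataset itself (3 ids x 2 types, each pair repeated 100_000 times) are all hypothetical, chosen only to mirror the shape of the expected output. Within one JVM every key trivially lands in a single partition. The relevant point is that routing depends only on `String.hashCode()`, whose value is specified and therefore the same on every TaskManager.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionedCountDemo {

    // Hypothetical enum and record type standing in for the (id, type) test data.
    enum EventType { CLICK, VIEW }

    static final class Event {
        final String id;
        final EventType type;

        Event(String id, EventType type) {
            this.id = id;
            this.type = type;
        }
    }

    // String key as a KeySelector might build it (hypothetical format);
    // it relies only on String.hashCode(), which is identical in every JVM.
    static String stringKey(Event e) {
        return e.id + "|" + e.type.name();
    }

    // Simulates hash partitioning across numPartitions workers, then a local
    // count per partition. Returns one key -> count map per partition.
    static List<Map<String, Long>> countPartitioned(List<Event> events, int numPartitions) {
        List<Map<String, Long>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
        for (Event e : events) {
            String key = stringKey(e);
            int target = Math.floorMod(key.hashCode(), numPartitions);
            partitions.get(target).merge(key, 1L, Long::sum);
        }
        return partitions;
    }

    // Synthetic data (assumption): 3 ids x 2 types, each pair repeated `copies` times.
    static List<Event> syntheticEvents(int copies) {
        List<Event> events = new ArrayList<>();
        for (String id : List.of("a", "b", "c")) {
            for (EventType t : EventType.values()) {
                for (int i = 0; i < copies; i++) {
                    events.add(new Event(id, t));
                }
            }
        }
        return events;
    }

    public static void main(String[] args) {
        List<Map<String, Long>> partitions = countPartitioned(syntheticEvents(100_000), 5);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Swapping `key.hashCode()` for a hash built from `Enum.hashCode()` would still pass in a single JVM. This matches the report's observation that LocalEnvironment results look correct, and shows why a single-JVM test alone cannot rule out the cluster behavior.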
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)