[
https://issues.apache.org/jira/browse/FLINK-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590039#comment-14590039
]
Aljoscha Krettek commented on FLINK-2236:
-----------------------------------------
This additional test case triggers the bug:
{code}
@Test
def testGroupedAggregateWithLongKeys(): Unit = {
  // This uses very long keys to force serialized comparison. With short keys,
  // the normalized key is sufficient.
  val env = ExecutionEnvironment.getExecutionEnvironment
  val ds = env.fromElements(
    ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhaa", 1, 2),
    ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhaa", 1, 2),
    ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhaa", 1, 2),
    ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhaa", 1, 2),
    ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhaa", 1, 2),
    ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhab", 1, 2),
    ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhab", 1, 2),
    ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhab", 1, 2),
    ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhab", 1, 2)
  ).rebalance().setParallelism(2).as('a, 'b, 'c)
    // 'b is added to the grouping keys so that it may appear in the select.
    .groupBy('a, 'b)
    .select('b, 'c.sum)

  ds.writeAsText(resultPath, WriteMode.OVERWRITE)
  env.execute()
  // resultPath and expected are members of the enclosing test class.
  // Five "...aa" rows and four "...ab" rows, each with c = 2: sums 10 and 8.
  expected = "1,10\n" + "1,8\n"
}
{code}
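The comment in the test refers to normalized keys. As a rough illustration, assuming a simplified model in which the sorter first compares a fixed-length byte prefix of each key and only falls back to a full comparison when the prefixes tie, this Scala sketch shows why keys shaped like the ones in the test always reach the fallback path. The object name, `NormalizedKeyLen` and its 16-byte value are hypothetical; Flink's comparators choose the real normalized-key length per type, and their fallback compares serialized records rather than calling `String.compareTo`.

{code}
import java.nio.charset.StandardCharsets
import java.util.Arrays

// Hypothetical, simplified model of normalized-key comparison.
object NormalizedKeySketch {
  // Hypothetical prefix length; not Flink's actual value.
  val NormalizedKeyLen = 16

  // Zero-padded (or truncated) prefix of the key's UTF-8 bytes.
  def normalizedKey(s: String): Array[Byte] =
    Arrays.copyOf(s.getBytes(StandardCharsets.UTF_8), NormalizedKeyLen)

  // Unsigned lexicographic comparison of two equal-length prefixes.
  def comparePrefixes(a: Array[Byte], b: Array[Byte]): Int = {
    var i = 0
    while (i < a.length) {
      val c = (a(i) & 0xff) - (b(i) & 0xff)
      if (c != 0) return c
      i += 1
    }
    0
  }

  // Returns the comparison result and whether the full-key fallback ran.
  def compare(x: String, y: String): (Int, Boolean) = {
    val c = comparePrefixes(normalizedKey(x), normalizedKey(y))
    if (c != 0) (c, false)
    // Prefixes tie: compare the whole key. This stands in for the
    // serialized comparison, which is where the layout must match.
    else (x.compareTo(y), true)
  }

  def main(args: Array[String]): Unit = {
    val k1 = "h" * 61 + "aa"
    val k2 = "h" * 61 + "ab"
    val (result, usedFallback) = compare(k1, k2)
    println(s"result=$result usedFallback=$usedFallback")
  }
}
{code}

Short keys that differ within the prefix are decided by the prefix alone, so a bug confined to the fallback path stays hidden unless keys share a prefix longer than the normalized key.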
> RowSerializer and CaseClassComparator are not in sync regarding Null-Values
> ---------------------------------------------------------------------------
>
> Key: FLINK-2236
> URL: https://issues.apache.org/jira/browse/FLINK-2236
> Project: Flink
> Issue Type: Bug
> Reporter: Aljoscha Krettek
>
> The RowSerializer was recently updated to allow it to handle null values.
> This changes the binary layout of the serialised data. CaseClassComparator,
> which is used for comparison, is not aware of this new layout and therefore
> compares records incorrectly. The problem only occurs when a key is long
> enough to exceed the normalised-key length, because only then does the
> comparator fall back to comparing the serialised records; that is why the
> existing tests did not catch the bug.
> I think the solution is to modify all Tuple-like serializers/comparators
> (TupleComparatorBase, CaseClassComparator, TupleSerializer,
> CaseClassSerializer, RowSerializer) to handle null values, which brings the
> binary format back in sync.
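To make the layout mismatch concrete, here is a toy model in Scala. It assumes a null-aware serializer that writes a one-byte null flag before a string field, and a comparator written for the old layout that does not skip that flag. This is not Flink's actual binary format: `NullLayoutSketch` and its methods are hypothetical, and sorting nulls first is an assumption of the sketch.

{code}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical toy model of a serializer/comparator layout mismatch.
object NullLayoutSketch {
  // Null-aware layout: [isNull: 1 byte][writeUTF string, only if not null]
  def serializeNullAware(s: String): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeBoolean(s == null)
    if (s != null) out.writeUTF(s)
    out.flush()
    bytes.toByteArray
  }

  private def reader(b: Array[Byte]): DataInputStream =
    new DataInputStream(new ByteArrayInputStream(b))

  // Comparator that still assumes the old layout: the string starts at byte 0.
  def compareLegacy(a: Array[Byte], b: Array[Byte]): Int =
    reader(a).readUTF().compareTo(reader(b).readUTF())

  // Comparator that reads the same layout the serializer writes.
  // Nulls sort before non-null values (an assumption of this sketch).
  def compareNullAware(a: Array[Byte], b: Array[Byte]): Int = {
    val ra = reader(a)
    val rb = reader(b)
    val aNull = ra.readBoolean()
    val bNull = rb.readBoolean()
    if (aNull || bNull) java.lang.Boolean.compare(bNull, aNull)
    else ra.readUTF().compareTo(rb.readUTF())
  }

  def main(args: Array[String]): Unit = {
    val k1 = serializeNullAware("h" * 61 + "aa")
    val k2 = serializeNullAware("h" * 61 + "ab")
    println(s"legacy=${compareLegacy(k1, k2)} nullAware=${compareNullAware(k1, k2)}")
  }
}
{code}

In this model the legacy comparator reads the null flag (0) and the high byte of the string length (0) as a zero length, so both keys decode as empty strings and compare as equal, while the null-aware comparator orders them correctly. Keeping the serializers and comparators on one shared layout removes this class of mismatch.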