[ 
https://issues.apache.org/jira/browse/FLINK-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590039#comment-14590039
 ] 

Aljoscha Krettek commented on FLINK-2236:
-----------------------------------------

This additional test case reproduces the bug:

{code}
  @Test
  def testGroupedAggregateWithLongKeys(): Unit = {

    // This uses very long keys to force serialized comparison. With short keys,
    // the normalized key is sufficient.

    val env = ExecutionEnvironment.getExecutionEnvironment
    val ds = env.fromElements(
      ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhaa", 1, 2),
      ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhaa", 1, 2),
      ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhaa", 1, 2),
      ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhaa", 1, 2),
      ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhaa", 1, 2),
      ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhab", 1, 2),
      ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhab", 1, 2),
      ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhab", 1, 2),
      ("hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhab", 1, 2)
    ).rebalance().setParallelism(2).as('a, 'b, 'c)
      .groupBy('a)
      .select('b, 'c.sum)

    ds.writeAsText(resultPath, WriteMode.OVERWRITE)
    env.execute()
    // The input has two distinct keys (5 rows and 4 rows), each row with b = 1 and c = 2.
    expected = "1,10\n" + "1,8\n"
  }
{code}
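
To show why only long keys trigger this, here is a minimal, self-contained sketch of the normalized-key idea. It is not Flink's implementation: the object name, the helper names, and the 8-byte prefix width are assumptions made for illustration only. A fixed-width prefix of the key decides most comparisons, and only keys that tie on the prefix and extend beyond it need the full (serialized) comparison.

```scala
// Hypothetical sketch, not Flink's code: names and prefix width are assumptions.
object NormalizedKeySketch {
  // Assumed prefix width, chosen only for illustration.
  val PrefixLen = 8

  // Fixed-width prefix of the key, padded with zero bytes if the key is shorter.
  def normalizedKey(s: String): Array[Byte] =
    s.getBytes("UTF-8").take(PrefixLen).padTo(PrefixLen, 0.toByte)

  // True when the prefixes tie and at least one key extends beyond the prefix,
  // so the prefix alone cannot decide the comparison.
  def needsFullComparison(a: String, b: String): Boolean =
    java.util.Arrays.equals(normalizedKey(a), normalizedKey(b)) &&
      (a.getBytes("UTF-8").length > PrefixLen || b.getBytes("UTF-8").length > PrefixLen)
}
```

Under these assumptions, two long keys that differ only in their last character (shaped like the ones in the test above) need the full comparison, whereas short keys such as "aa" and "ab" are already decided by the prefix.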

> RowSerializer and CaseClassComparator are not in sync regarding Null-Values
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-2236
>                 URL: https://issues.apache.org/jira/browse/FLINK-2236
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Aljoscha Krettek
>
> The RowSerializer was recently updated to allow it to handle null values. 
> This changes the binary layout of the serialised data. CaseClassComparator, 
> which is used for comparison, is not aware of this new layout and therefore 
> fails. The problem only occurs when a key is long enough to exceed the 
> normalised-key length, which is why the existing tests fail to notice the bug.
> I think the solution is to modify all Tuple-like serializers/comparators 
> (TupleComparatorBase, CaseClassComparator, TupleSerializer, 
> CaseClassSerializer, RowSerializer) to handle null-values, thus bringing the 
> binary format in sync again.
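
The kind of mismatch described above can be sketched in a few lines. This is an illustration only, not the actual RowSerializer or CaseClassComparator code: it assumes that null handling adds a one-byte marker in front of each field, which the issue text does not specify, and all names below are hypothetical. A reader that still expects the old layout then interprets the marker byte as part of the field.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical sketch: the per-field null marker is an assumed layout, not Flink's.
object LayoutMismatchSketch {
  // Writer using the assumed new layout: a boolean null marker, then the field.
  def writeWithNullMarker(value: String): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeBoolean(value == null)
    if (value != null) out.writeUTF(value)
    out.flush()
    bytes.toByteArray
  }

  // Reader that is aware of the marker and recovers the original value.
  def readWithNullMarker(data: Array[Byte]): String = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    if (in.readBoolean()) null else in.readUTF()
  }

  // Reader that still assumes the old layout (no marker) and so starts
  // decoding the field one byte too early.
  def readOldLayout(data: Array[Byte]): String = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    in.readUTF()
  }
}
```

In this sketch the marker-aware reader returns the written key, while the old-layout reader silently returns a different string, so any comparison built on it would group or order records incorrectly.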



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
