Eason Ye created FLINK-36371:
--------------------------------

             Summary: Field BinaryStringData as keySelector would lead the same 
records hash shuffle into different tumble windows
                 Key: FLINK-36371
                 URL: https://issues.apache.org/jira/browse/FLINK-36371
             Project: Flink
          Issue Type: Improvement
          Components: API / Type Serialization System
    Affects Versions: 2.0.0
            Reporter: Eason Ye


The pipeline is: Source (converts Kafka messages to the RowData type) -> FlatMap 
--(keyBy)--> TumbleProcessWindow -> Sink.

The Source emits two fields: UserName (String) and Amount (Integer).

FlatMap collects the BinaryRowData type record and emits it to the 
TumbleProcessWindow operator.

The KeySelector uses the UserName field, which is a BinaryStringData. The keyBy code is:
{code:java}
.keyBy(rowData -> rowData.getString(0)){code}
The window result is unexpected: records with the same UserName arrived at 
TumbleProcessWindow at the same time, but they were computed in 
different windows.

When I use the keyBy below instead, the window result is correct.
{code:java}
.keyBy(rowData -> rowData.getString(0).toString()){code}
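
The difference between the two keyBy variants can be illustrated with a small, Flink-free sketch. It rests on an assumption that is not confirmed in this report: that the key object is a view over bytes which can change after the key is extracted. BufferBackedKey below is a hypothetical stand-in, not Flink's BinaryStringData, and the sketch only shows why a key whose hashCode follows mutable bytes is unsafe for hash partitioning, while an immutable String copy is not.

```java
import java.nio.charset.StandardCharsets;

class KeyStabilitySketch {

    /** Hypothetical key that is only a view over shared bytes (NOT Flink's BinaryStringData). */
    static final class BufferBackedKey {
        private final byte[] buffer; // shared storage that a producer may overwrite
        private final int length;

        BufferBackedKey(byte[] buffer, int length) {
            this.buffer = buffer;
            this.length = length;
        }

        @Override
        public int hashCode() {
            // The hash is computed from the current buffer content on every call.
            int h = 1;
            for (int i = 0; i < length; i++) {
                h = 31 * h + buffer[i];
            }
            return h;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof BufferBackedKey)) return false;
            BufferBackedKey other = (BufferBackedKey) o;
            if (length != other.length) return false;
            for (int i = 0; i < length; i++) {
                if (buffer[i] != other.buffer[i]) return false;
            }
            return true;
        }

        @Override
        public String toString() {
            // Copies the bytes into an immutable String.
            return new String(buffer, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Writes the UTF-8 bytes of s at the start of shared and returns the byte count. */
    static int overwrite(byte[] shared, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        System.arraycopy(bytes, 0, shared, 0, bytes.length);
        return bytes.length;
    }

    /** The view key's hash changes once the shared bytes are reused for another record. */
    static boolean viewHashChangesAfterReuse() {
        byte[] shared = new byte[16];
        BufferBackedKey key = new BufferBackedKey(shared, overwrite(shared, "alice"));
        int before = key.hashCode();
        overwrite(shared, "bobby"); // simulate reuse of the same storage
        return before != key.hashCode();
    }

    /** A String copied out of the view keeps its value and hash after the reuse. */
    static boolean copyHashStaysStable() {
        byte[] shared = new byte[16];
        BufferBackedKey key = new BufferBackedKey(shared, overwrite(shared, "alice"));
        String copy = key.toString();
        int before = copy.hashCode();
        overwrite(shared, "bobby");
        return before == copy.hashCode() && copy.equals("alice");
    }

    public static void main(String[] args) {
        System.out.println("view hash changed: " + viewHashChangesAfterReuse());
        System.out.println("copy hash stable:  " + copyHashStaysStable());
    }
}
```

If the assumption holds, it would explain why calling toString() in the KeySelector gives consistent windows, but the actual cause still needs to be verified against the BinaryStringData implementation.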



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
