[ 
https://issues.apache.org/jira/browse/FLINK-18629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-18629:
-------------------------------------
    Description: 
Following test fails:
{code}
        @Test
        public void testKeyedConnectedStreamsType() {
                StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();

                DataStreamSource<Integer> stream1 = env.fromElements(1, 2);
                DataStreamSource<Integer> stream2 = env.fromElements(1, 2);

                ConnectedStreams<Integer, Integer> connectedStreams = 
stream1.connect(stream2)
                        .keyBy(v -> v, v -> v);

                KeyedStream<?, ?> firstKeyedInput = (KeyedStream<?, ?>) 
connectedStreams.getFirstInput();
                KeyedStream<?, ?> secondKeyedInput = (KeyedStream<?, ?>) 
connectedStreams.getSecondInput();
                assertThat(firstKeyedInput.getKeyType(), equalTo(Types.INT));
                assertThat(secondKeyedInput.getKeyType(), equalTo(Types.INT));
        }
{code}

The problem is that the wildcard type is evaluated as {{Object}} for lambdas, 
which in turn produces {{GenericTypeInfo<Object>}} for any KeySelector provided 
as lambda.

I suggest changing the method signature to:
{code}
        public <K1, K2> ConnectedStreams<IN1, IN2> keyBy(
                        KeySelector<IN1, K1> keySelector1,
                        KeySelector<IN2, K2> keySelector2)
{code}

This would be a code compatible change. Might break the compatibility of state 
backend (would change derived key type info). 

Still there would be a workaround to use the second method for old programs:
{code}
        public <KEY> ConnectedStreams<IN1, IN2> keyBy(
                        KeySelector<IN1, KEY> keySelector1,
                        KeySelector<IN2, KEY> keySelector2,
                        TypeInformation<KEY> keyType)
{code}

  was:
Following test fails:
{code}
        @Test
        public void testKeyedConnectedStreamsType() {
                StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();

                DataStreamSource<Integer> stream1 = env.fromElements(1, 2);
                DataStreamSource<Integer> stream2 = env.fromElements(1, 2);

                ConnectedStreams<Integer, Integer> connectedStreams = 
stream1.connect(stream2)
                        .keyBy(v -> v, v -> v);

                KeyedStream<?, ?> firstKeyedInput = (KeyedStream<?, ?>) 
connectedStreams.getFirstInput();
                KeyedStream<?, ?> secondKeyedInput = (KeyedStream<?, ?>) 
connectedStreams.getSecondInput();
                assertThat(firstKeyedInput.getKeyType(), equalTo(Types.INT));
                assertThat(secondKeyedInput.getKeyType(), equalTo(Types.INT));
        }
{code}

The problem is that the wildcard type is evaluated as {{Object}} for lambdas, 
which in turn produces {{GenericTypeInfo<Object>}} for any KeySelector provided 
as lambda.

I suggest changing the method signature to:
{code}
        public <K1, K2> ConnectedStreams<IN1, IN2> keyBy(
                        KeySelector<IN1, K1> keySelector1,
                        KeySelector<IN2, K2> keySelector2)
{code}

This would be a code compatible change. Might break the compatibility of state 
backend (would change derived key type info). Nevertheless there is a 
workaround to use:

{code}
        public <KEY> ConnectedStreams<IN1, IN2> keyBy(
                        KeySelector<IN1, KEY> keySelector1,
                        KeySelector<IN2, KEY> keySelector2,
                        TypeInformation<KEY> keyType)
{code}


> ConnectedStreams#keyBy can not derive key TypeInformation for lambda 
> KeySelectors
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-18629
>                 URL: https://issues.apache.org/jira/browse/FLINK-18629
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.10.0, 1.11.0, 1.12.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Dawid Wysakowicz
>            Priority: Critical
>             Fix For: 1.10.2, 1.12.0, 1.11.2
>
>
> Following test fails:
> {code}
>       @Test
>       public void testKeyedConnectedStreamsType() {
>               StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
>               DataStreamSource<Integer> stream1 = env.fromElements(1, 2);
>               DataStreamSource<Integer> stream2 = env.fromElements(1, 2);
>               ConnectedStreams<Integer, Integer> connectedStreams = 
> stream1.connect(stream2)
>                       .keyBy(v -> v, v -> v);
>               KeyedStream<?, ?> firstKeyedInput = (KeyedStream<?, ?>) 
> connectedStreams.getFirstInput();
>               KeyedStream<?, ?> secondKeyedInput = (KeyedStream<?, ?>) 
> connectedStreams.getSecondInput();
>               assertThat(firstKeyedInput.getKeyType(), equalTo(Types.INT));
>               assertThat(secondKeyedInput.getKeyType(), equalTo(Types.INT));
>       }
> {code}
> The problem is that the wildcard type is evaluated as {{Object}} for lambdas, 
> which in turn produces {{GenericTypeInfo<Object>}} for any KeySelector 
> provided as lambda.
> I suggest changing the method signature to:
> {code}
>       public <K1, K2> ConnectedStreams<IN1, IN2> keyBy(
>                       KeySelector<IN1, K1> keySelector1,
>                       KeySelector<IN2, K2> keySelector2)
> {code}
> This would be a code compatible change. Might break the compatibility of 
> state backend (would change derived key type info). 
> Still there would be a workaround to use the second method for old programs:
> {code}
>       public <KEY> ConnectedStreams<IN1, IN2> keyBy(
>                       KeySelector<IN1, KEY> keySelector1,
>                       KeySelector<IN2, KEY> keySelector2,
>                       TypeInformation<KEY> keyType)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to