[ https://issues.apache.org/jira/browse/FLINK-18770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168050#comment-17168050 ]

Leonid Ilyevsky commented on FLINK-18770:
-----------------------------------------

Thanks [~aljoscha].

I actually did a few debugging sessions and learned something useful.

First, I now understand that it is a legitimate error, because some protobuf 
objects are flowing through there. I was confused by the exception stack, which 
showed the error originating in the 'collect' call that pushes just a byte[]. I 
did not realize that the operators are chained, so the call goes all the way 
through to other steps where the protobuf objects are unmarshalled.

At the same time, I noticed that the serializer type is decided in 
[https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/OperatorChain.java]
 at line 486, and realized that with object reuse enabled the code would not go 
down that path.

So I enabled object reuse and now my program works as expected.
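
For reference, this is all it took (a minimal sketch of the environment setup; 
enableObjectReuse() is the only relevant call, the rest is boilerplate):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// With object reuse enabled, chained operators hand records to each other by
// reference instead of copying them through a serializer, so the KryoSerializer
// path between my chained steps is skipped.
env.getConfig().enableObjectReuse();
```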

I see that 
[https://ci.apache.org/projects/flink/flink-docs-stable/dev/execution_configuration.html]
 warns about potential bugs, so I need to make sure my functions account for 
this. I guess it just means that my functions should never cache the input 
objects or pass them through to the results. Is this correct?
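
To make sure I understand the rule, I put together a small standalone test 
(plain Java, no Flink involved; the 'reused' buffer just simulates the runtime 
overwriting the same record instance on the next element):

```java
import java.util.ArrayList;
import java.util.List;

// Simulates why a function must not cache input objects when object reuse is
// enabled: the runtime may overwrite the same instance for the next record.
public class ObjectReuseDemo {
    public static void main(String[] args) {
        byte[] reused = {1, 2, 3};                 // buffer the runtime will reuse

        List<byte[]> unsafeCache = new ArrayList<>();
        List<byte[]> safeCache = new ArrayList<>();

        unsafeCache.add(reused);                   // wrong: keeps the live reference
        safeCache.add(reused.clone());             // right: defensive copy

        reused[0] = 99;                            // "next record" overwrites the buffer

        System.out.println(unsafeCache.get(0)[0]); // cached value was clobbered
        System.out.println(safeCache.get(0)[0]);   // copy is unaffected
    }
}
```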

 

Now, you suggest that I provide TypeInformation if I want to keep object reuse 
disabled. I may work on this later.
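
In case it helps, here is roughly what I understand that to mean for my source 
(a sketch only; I am assuming the attached SolaceSource keeps its current shape 
and just additionally implements ResultTypeQueryable, which should let Flink 
pick BytePrimitiveArraySerializer instead of falling back to Kryo):

```java
import org.apache.flink.api.common.typeinfo.PrimitiveArrayTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class SolaceSource implements SourceFunction<byte[]>, ResultTypeQueryable<byte[]> {

    @Override
    public TypeInformation<byte[]> getProducedType() {
        // Tells Flink explicitly that this source emits byte[], so it can use
        // BytePrimitiveArraySerializer instead of the generic KryoSerializer.
        return PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO;
    }

    @Override
    public void run(SourceContext<byte[]> ctx) throws Exception {
        // ... the Solace message loop from the attached connector goes here ...
    }

    @Override
    public void cancel() {
        // ... stop the message loop ...
    }
}
```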

A question for you: how critical is it to have object reuse disabled? Are there 
significant benefits, besides allowing my functions to be more relaxed about 
holding on to their inputs? Or does disabling it carry significant overhead?

> Emitting element fails in KryoSerializer
> ----------------------------------------
>
>                 Key: FLINK-18770
>                 URL: https://issues.apache.org/jira/browse/FLINK-18770
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Type Serialization System
>    Affects Versions: 1.11.1
>         Environment: Flink 1.11.1, Linux
>            Reporter: Leonid Ilyevsky
>            Priority: Major
>         Attachments: AppMain.java, FlinkTest.scala, KryoException.txt, 
> SolaceSource.java, run_command.txt
>
>
> I wrote a simple Flink connector for Solace, see attached java file. It works 
> fine under local execution environment. However, when I deployed it in the 
> real Flink cluster, it failed with the Kryo exception, see attached.
> After a few hours of searching and debugging, I can now see what is going on.
> The data I want to emit from this source is a simple byte array. In the 
> exception stack you can see that when I call 'collect' on the context, it 
> goes into OperatorChain.java:715, and then to KryoSerializer, where it 
> ultimately fails. I didn't have a chance to learn what KryoSerializer is and 
> why it would not know what to do with byte[], but that is not the point now.
> Then I used a debugger in my local test to figure out how it manages to work. 
> I saw that after OperatorChain.java:715 it goes into 
> BytePrimitiveArraySerializer, and then everything works as expected. 
> Obviously, BytePrimitiveArraySerializer makes sense for byte[] data.
> The question is, how can I configure the cluster execution environment so that 
> it serializes the same way as the local one? I looked at 
> [https://ci.apache.org/projects/flink/flink-docs-stable/dev/execution_configuration.html]
>  and was thinking of setting disableForceKryo, but it says that is disabled by 
> default anyway.
>  
> Another question is, why does the cluster execution environment have different 
> default settings compared to the local one? This makes it difficult to rely on 
> local tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
