[
https://issues.apache.org/jira/browse/FLINK-12113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812967#comment-16812967
]
yankai zhang edited comment on FLINK-12113 at 4/9/19 3:20 AM:
--------------------------------------------------------------
Yes, _fromCollection(Iterator, Class)_ works as expected without an anonymous
class.
The problem here is that an anonymous class object created in an instance
method implicitly references the outer _this_ (even though it is never used),
and the outer _this_ is not serializable. Removing such references is exactly
what _StreamExecutionEnvironment#clean_ is supposed to do.
In fact, the iterator passed by the user is wrapped in a
_FromIteratorFunction_, and _StreamExecutionEnvironment#clean_ is then called
on that wrapper instance, not on the iterator itself. However, the current
implementation of _StreamExecutionEnvironment#clean_ is not recursive, so it
can't find and clean the _this_ reference nested deeper in the closure.
Here is my fully reproducible code:
{code:java}
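import java.io.Serializable;
import java.util.Iterator;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.junit.Test; // JUnit 4 assumed

public class MainTest {

    interface IS<E> extends Iterator<E>, Serializable {
    }

    @Test
    public void cleanTest() {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        // The anonymous IS is created in an instance method, so it implicitly
        // references the enclosing (non-serializable) MainTest instance.
        env.fromCollection(new IS<Object>() {
            @Override
            public boolean hasNext() {
                return false;
            }

            @Override
            public Object next() {
                return null;
            }
        }, Object.class);
    }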
}{code}
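To illustrate the point about non-recursive cleaning without pulling in Flink, here is a minimal plain-Java sketch. All names in it (_Wrapper_, _Outer_, _ShallowCleanDemo_, _shallowClean_) are hypothetical stand-ins, not Flink classes: _Wrapper_ plays the role of _FromIteratorFunction_, and _shallowClean_ only imitates the idea of nulling the synthetic outer reference on the object it is given. It is not the actual _ClosureCleaner_ implementation, and it assumes the compiler names the outer reference _this$N_.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.util.Iterator;

// Hypothetical stand-ins for illustration only; none of these are Flink classes.
interface IS<E> extends Iterator<E>, Serializable { }

// Plays the role of FromIteratorFunction: a serializable wrapper around the iterator.
class Wrapper implements Serializable {
    final Iterator<Object> iterator;

    Wrapper(Iterator<Object> iterator) {
        this.iterator = iterator;
    }
}

// Not Serializable, like the test class in the reproducer.
class Outer {
    // The anonymous class is created in an instance method, so it carries a
    // hidden reference to the enclosing Outer even though it never uses it.
    IS<Object> newIterator() {
        return new IS<Object>() {
            @Override
            public boolean hasNext() {
                return false;
            }

            @Override
            public Object next() {
                return null;
            }
        };
    }
}

class ShallowCleanDemo {
    // Nulls the synthetic outer reference (assumed to be named this$N) on obj
    // only. It does not descend into the objects that obj's fields point to.
    static void shallowClean(Object obj) throws ReflectiveOperationException {
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (f.getName().startsWith("this$")) {
                f.setAccessible(true);
                f.set(obj, null);
            }
        }
    }

    // Tries Java serialization and reports whether it succeeded.
    static boolean isSerializable(Object obj) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Cleaning only the wrapper leaves the iterator's outer reference in place.
        Wrapper wrapper = new Wrapper(new Outer().newIterator());
        shallowClean(wrapper);
        System.out.println("wrapper cleaned:  " + isSerializable(wrapper));

        // Cleaning the iterator itself before wrapping removes the reference.
        IS<Object> iterator = new Outer().newIterator();
        shallowClean(iterator);
        System.out.println("iterator cleaned: " + isSerializable(new Wrapper(iterator)));
    }
}
```

In this sketch the first check should report false and the second true, which mirrors why cleaning the iterator before handing it to _fromCollection_ avoids the exception while cleaning only the wrapper does not.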
> User code passing to fromCollection(Iterator, Class) not cleaned
> ----------------------------------------------------------------
>
> Key: FLINK-12113
> URL: https://issues.apache.org/jira/browse/FLINK-12113
> Project: Flink
> Issue Type: Bug
> Components: API / DataStream
> Affects Versions: 1.7.2
> Reporter: yankai zhang
> Priority: Major
> Attachments: image-2019-04-07-21-52-37-264.png,
> image-2019-04-08-23-19-27-359.png
>
>
>
> {code:java}
> interface IS<E> extends Iterator<E>, Serializable { }
>
> StreamExecutionEnvironment env =
>     StreamExecutionEnvironment.getExecutionEnvironment();
> env.fromCollection(new IS<Object>() {
>     @Override
>     public boolean hasNext() {
>         return false;
>     }
>
>     @Override
>     public Object next() {
>         return null;
>     }
> }, Object.class);
> {code}
> The code above throws the following exception:
> {code:java}
> org.apache.flink.api.common.InvalidProgramException: The implementation of
> the SourceFunction is not serializable. The object probably contains or
> references non serializable fields.
> at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:99)
> ....{code}
> My workaround is to call clean on the iterator instance before passing it to
> fromCollection, like this:
>
> {code:java}
> StreamExecutionEnvironment env =
>     StreamExecutionEnvironment.getExecutionEnvironment();
> env.fromCollection(env.clean(new IS<Object>() {
>     @Override
>     public boolean hasNext() {
>         return false;
>     }
>
>     @Override
>     public Object next() {
>         return null;
>     }
> }), Object.class);
> {code}
>
>
>