Fabian Hueske created FLINK-4640:
------------------------------------
Summary: Serialization of the initialValue of a Fold on
WindowedStream fails
Key: FLINK-4640
URL: https://issues.apache.org/jira/browse/FLINK-4640
Project: Flink
Issue Type: Bug
Components: Streaming
Affects Versions: 1.1.2, 1.2.0
Reporter: Fabian Hueske
Priority: Critical
Fix For: 1.2.0, 1.1.3
The following program
{code}
DataStream<Tuple2<String, Long>> src = env.fromElements(new Tuple2<String,
Long>("a", 1L));
src
.keyBy(1)
.timeWindow(Time.minutes(5))
.fold(TreeMultimap.<Long, String>create(), new FoldFunction<Tuple2<String,
Long>, TreeMultimap<Long, String>>() {
@Override
public TreeMultimap<Long, String> fold(
TreeMultimap<Long, String> topKSoFar,
Tuple2<String, Long> itemCount) throws Exception
{
String item = itemCount.f0;
Long count = itemCount.f1;
topKSoFar.put(count, item);
if (topKSoFar.keySet().size() > 10) {
topKSoFar.removeAll(topKSoFar.keySet().first());
}
return topKSoFar;
}
});
{code}
throws this exception
{quote}
Caused by: java.lang.RuntimeException: Could not add value to folding state.
at
org.apache.flink.runtime.state.memory.MemFoldingState.add(MemFoldingState.java:91)
at
org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:382)
at
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:176)
at
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:66)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at
com.google.common.collect.AbstractMapBasedMultimap.put(AbstractMapBasedMultimap.java:192)
at
com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:121)
at com.google.common.collect.TreeMultimap.put(TreeMultimap.java:78)
at
org.apache.flink.streaming.examples.wordcount.WordCount$1.fold(WordCount.java:115)
at
org.apache.flink.streaming.examples.wordcount.WordCount$1.fold(WordCount.java:109)
at
org.apache.flink.runtime.state.memory.MemFoldingState.add(MemFoldingState.java:85)
... 6 more
{quote}
Using the same {{FoldFunction}} on a {{KeyedStream}} (without a window) works
fine.
I tracked the problem down to the serialization of the {{StateDescriptor}},
i.e., the {{writeObject()}} and {{readObject()}} methods. The methods use
Flink's TypeSerializers to serialize the default value. In case of the
{{TreeMultiMap}} this is the {{KryoSerializer}} which fails to read the
serialized data for some reason.
A quick workaround to solve this issue would be to check if the default value
implements {{Serializable}} and use Java Serialization in this case. However,
it would be good to track the root cause of this problem.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)