[ 
https://issues.apache.org/jira/browse/FLINK-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428053#comment-15428053
 ] 

Ufuk Celebi commented on FLINK-4425:
------------------------------------

Thanks for sharing. I just looked into the code and saw that we fixed something 
since 1.1.1: there was a call to {{is.read}} instead of {{is.readFully}}. It 
could be that only part of the stream was read into the serialized data, 
resulting in a wrong length value. I'm pretty sure that this is the problem, 
and it will only be fixed with 1.1.2.
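The distinction matters because a single {{InputStream.read(byte[])}} call may return after filling only part of the buffer, while {{DataInputStream.readFully}} loops until the whole buffer is filled or throws {{EOFException}}. A minimal sketch of the difference (the {{TrickleInputStream}} is a hypothetical helper for illustration, not Flink code):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadVsReadFully {

    // Hypothetical stream that hands out at most one byte per read() call,
    // simulating a chunked network or file stream.
    static class TrickleInputStream extends FilterInputStream {
        TrickleInputStream(InputStream in) { super(in); }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    // A single read() call: may fill only part of the buffer.
    static int singleRead(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(
                new TrickleInputStream(new ByteArrayInputStream(data)));
        byte[] buf = new byte[data.length];
        return in.read(buf); // number of bytes actually read, possibly < buf.length
    }

    // readFully() loops internally until the whole buffer is filled
    // (or throws EOFException if the stream ends first).
    static byte[] fullRead(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(
                new TrickleInputStream(new ByteArrayInputStream(data)));
        byte[] buf = new byte[data.length];
        in.readFully(buf);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println("read() returned " + singleRead(data) + " byte(s)");
        System.out.println("readFully() filled " + fullRead(data).length + " bytes");
    }
}
```

With this trickling stream, {{read()}} returns after one byte while {{readFully()}} still fills all eight, which is why deserializing length-prefixed data with a bare {{read()}} can silently produce a truncated buffer.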

It would help very much if you have some spare time to check out the 
{{release-1.1}} branch, build it from source, and then try to restore your 
original savepoint with it:

{code}
git clone https://github.com/apache/flink.git
cd flink
git checkout -b release-1.1 origin/release-1.1
# Build without running the test suite
mvn clean package -DskipTests
cd build-target
# Reuse your existing configuration
cp <your config>/flink-conf.yaml conf/
bin/start-cluster.sh
# Resume the job from the original savepoint
bin/flink run -s <original savepoint> ...
{code}


> "Out Of Memory" during savepoint deserialization
> ------------------------------------------------
>
>                 Key: FLINK-4425
>                 URL: https://issues.apache.org/jira/browse/FLINK-4425
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.1.1
>            Reporter: Sergii Koshel
>         Attachments: savepoint-c25e4b360a7d.zip
>
>
> I've created a savepoint and am trying to start a job from it (via the -s 
> param), and I get an exception like the one below:
> {code:title=Exception|borderStyle=solid}
> java.lang.OutOfMemoryError: Java heap space
>         at 
> org.apache.flink.runtime.checkpoint.savepoint.SavepointV1Serializer.deserialize(SavepointV1Serializer.java:167)
>         at 
> org.apache.flink.runtime.checkpoint.savepoint.SavepointV1Serializer.deserialize(SavepointV1Serializer.java:42)
>         at 
> org.apache.flink.runtime.checkpoint.savepoint.FsSavepointStore.loadSavepoint(FsSavepointStore.java:133)
>         at 
> org.apache.flink.runtime.checkpoint.savepoint.SavepointCoordinator.restoreSavepoint(SavepointCoordinator.java:201)
>         at 
> org.apache.flink.runtime.executiongraph.ExecutionGraph.restoreSavepoint(ExecutionGraph.java:983)
>         at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1302)
>         at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1291)
>         at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1291)
>         at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>         at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>         at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>         at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>         at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> jobmanager.heap.mb: 1280
> taskmanager.heap.mb: 1024
> java 1.8
> savepoint + checkpoint size < 1 Mb in total



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
