[ https://issues.apache.org/jira/browse/FLINK-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351180#comment-15351180 ]

ASF GitHub Bot commented on FLINK-3713:
---------------------------------------

Github user uce commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2083#discussion_r68590084
  
    --- Diff: flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java ---
    @@ -790,39 +767,95 @@ else if (result instanceof TriggerSavepointFailure) {
        }
     
        /**
    -    * Sends a {@link org.apache.flink.runtime.messages.JobManagerMessages.DisposeSavepoint}
    -    * message to the job manager.
    +    * Asks the JobManager to dispose a savepoint.
    +    *
    +    * <p>There are two options for this:
    +    * <ul>
    +    * <li>Either the job the savepoint belongs to is still running, in which
    +    * case the user code class loader of the job is used.</li>
    +    * <li>Or the job terminated, in which case the user JARs have to be
    +    * uploaded before disposing the savepoint.</li>
    +    * </ul>
         */
    -   private int disposeSavepoint(SavepointOptions options, String savepointPath) {
    -           try {
    -                   ActorGateway jobManager = getJobManagerGateway(options);
    -                   logAndSysout("Disposing savepoint '" + savepointPath + "'.");
    -                   Future<Object> response = jobManager.ask(new DisposeSavepoint(savepointPath), clientTimeout);
    +   private int disposeSavepoint(SavepointOptions options) throws Throwable {
    +           String savepointPath = Preconditions.checkNotNull(options.getSavepointPath(), "Savepoint path");
     
    -                   Object result;
    -                   try {
    -                           logAndSysout("Waiting for response...");
    -                           result = Await.result(response, clientTimeout);
    +           JobID jobId = options.getJobId();
    +           String jarFile = options.getJarFilePath();
    +
    +           if (jobId != null && jarFile != null) {
    +                   throw new IllegalArgumentException("Cannot dispose savepoint without Job ID or JAR.");
    +           }
    +
    +           ActorGateway jobManager = getJobManagerGateway(options);
    +
    +           final Future<Object> response;
    +           if (jobId != null) {
    +                   // Dispose with class loader of running job
    +                   logAndSysout("Disposing savepoint at '" + savepointPath + "' of job " + jobId + " .");
    +
    +                   response = jobManager.ask(
    +                                   new JobManagerMessages.DisposeSavepoint(savepointPath, jobId),
    +                                   clientTimeout);
    +           } else if (jarFile != null) {
    +                   logAndSysout("Disposing savepoint at '" + savepointPath + "'.");
    +
    +                   // Dispose with uploaded user code loader
    +                   Optimizer compiler = new Optimizer(new DataStatistics(), new DefaultCostEstimator(), config);
    +                   PackagedProgram program = new PackagedProgram(
    +                                   new File(jarFile),
    +                                   options.getClasspaths(),
    +                                   options.getEntryPointClass(),
    +                                   options.getProgramArgs());
    +                   FlinkPlan flinkPlan = Client.getOptimizedPlan(compiler, program, 1);
    +
    +                   JobGraph jobGraph;
    +                   if (flinkPlan instanceof StreamingPlan) {
    +                           jobGraph = ((StreamingPlan) flinkPlan).getJobGraph();
    +                   } else {
    +                           JobGraphGenerator gen = new JobGraphGenerator(this.config);
    +                           jobGraph = gen.compileJobGraph((OptimizedPlan) flinkPlan);
                        }
    -                   catch (Exception e) {
    -                           throw new Exception("Disposing the savepoint with path" + savepointPath + " failed.", e);
    +
    +                   for (URL jar : program.getAllLibraries()) {
    +                           try {
    +                                   jobGraph.addJar(new Path(jar.toURI()));
    +                           } catch (URISyntaxException e) {
    +                                   throw new RuntimeException("URL is invalid. This should not happen.", e);
    +                           }
                        }
     
    +                   jobGraph.setClasspaths(program.getClasspaths());
    +
    +                   logAndSysout("Uploading JAR files for savepoint disposal.");
    +                   JobClient.uploadJarFiles(jobGraph, jobManager, clientTimeout);
    +
    +                   response = jobManager.ask(
    +                                   new JobManagerMessages.DisposeSavepointWithClassLoader(
    +                                                   savepointPath,
    +                                                   jobGraph.getUserJarBlobKeys(),
    +                                                   jobGraph.getClasspaths()),
    +                                   clientTimeout);
    --- End diff --
    
    Let me check if it is sufficient, but the `PackagedProgram.extractContainedLibraries` call looks good.
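
    For context, the second path in the diff ships the user JARs to the JobManager so that `DisposeSavepointWithClassLoader` can discard the state with a class loader that sees the user code rather than the system class loader. A minimal, hypothetical Java sketch of that idea (not the actual Flink implementation; class and parameter names are made up):

{code}
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch only: build a class loader over the job's user JARs so
// that user classes referenced by the savepoint state can be resolved while
// the savepoint is being discarded.
class SavepointDisposalClassLoaderSketch {

    static ClassLoader userCodeClassLoader(URL[] downloadedUserJars, ClassLoader parent) {
        // User classes resolve from the JARs, framework classes from the parent.
        return new URLClassLoader(downloadedUserJars, parent);
    }
}
{code}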


> DisposeSavepoint message uses system classloader to discard state
> -----------------------------------------------------------------
>
>                 Key: FLINK-3713
>                 URL: https://issues.apache.org/jira/browse/FLINK-3713
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>            Reporter: Robert Metzger
>            Assignee: Ufuk Celebi
>
> The {{DisposeSavepoint}} message in the JobManager is using the system classloader to discard the state:
> {code}
> val savepoint = savepointStore.getState(savepointPath)
> log.debug(s"$savepoint")
> // Discard the associated checkpoint
> savepoint.discard(getClass.getClassLoader)
> // Dispose the savepoint
> savepointStore.disposeState(savepointPath)
> {code}
> Which leads to issues when the state contains user classes:
> {code}
> 2016-04-07 03:02:12,225 INFO  org.apache.flink.yarn.YarnJobManager                          - Disposing savepoint at 'hdfs:/bigdata/flink/savepoints/savepoint-7610540d089f'.
> 2016-04-07 03:02:12,233 WARN  org.apache.flink.runtime.checkpoint.StateForTask              - Failed to discard checkpoint state: StateForTask eed5cc5a12dc2e0672848ba81bd8fa6d-0 : SerializedValue
> java.lang.ClassNotFoundException: <some_package>.MetricsProcessor$CombinedKeysFoldFunction
>       at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>       at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>       at java.lang.Class.forName0(Native Method)
>       at java.lang.Class.forName(Class.java:270)
>       at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:64)
>       at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>       at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>       at java.util.HashMap.readObject(HashMap.java:1184)
>       at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>       at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
>       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>       at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:290)
>       at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58)
>       at org.apache.flink.runtime.checkpoint.StateForTask.discard(StateForTask.java:109)
>       at org.apache.flink.runtime.checkpoint.CompletedCheckpoint.discard(CompletedCheckpoint.java:85)
>       at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:635)
>       at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:627)
>       at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:627)
>       at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>       at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>       at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>       at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> The issue was reported by [~knaufk] and analyzed by [~aljoscha].
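
The root cause above is that the savepoint state is deserialized with the default class loader, which cannot see user classes such as `MetricsProcessor$CombinedKeysFoldFunction`. A minimal, standalone Java sketch of the underlying pattern, resolving classes against a user-code class loader during deserialization (similar in spirit to the `InstantiationUtil$ClassLoaderObjectInputStream` frame in the trace; names and the JAR location are placeholders, this is not Flink code):

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch only.
class UserCodeDeserializer {

    static Object deserialize(InputStream in, URL userJar) throws IOException, ClassNotFoundException {
        // Class loader that sees the user's JAR in addition to the framework classes.
        ClassLoader userCodeLoader = new URLClassLoader(new URL[] { userJar },
                UserCodeDeserializer.class.getClassLoader());

        try (ObjectInputStream ois = new ObjectInputStream(in) {
            @Override
            protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
                // Resolve against the user-code class loader instead of the default one.
                return Class.forName(desc.getName(), false, userCodeLoader);
            }
        }) {
            return ois.readObject();
        }
    }
}
{code}

This is what the pull request above enables for terminated jobs by uploading the user JARs before sending the dispose request, so the JobManager no longer has to fall back to `getClass.getClassLoader`.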


