[ 
https://issues.apache.org/jira/browse/FLINK-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988543#comment-16988543
 ] 

Piotr Nowojski commented on FLINK-14973:
----------------------------------------

I think this issue can be worked around now by loading the Azure and OSS file 
systems (or at least one of them) [as 
plugins|https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/#pluggable-file-systems].
 Because of that, unless there is some particular issue that prevents those 
file systems from working as plugins, I would be inclined to close this as 
won't fix. Note also that as we gain confidence that a particular file system 
works as a plugin, we will be removing the shading from it anyway. 

If someone tries to solve this by using plugins, please note that plugin 
handling on YARN clusters [is/will be fixed in the 1.9.2 or 1.10.0 
release.|https://issues.apache.org/jira/browse/FLINK-14382]
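
For reference, the plugin mechanism amounts to giving each filesystem jar its 
own subdirectory under plugins/, so each one is loaded in an isolated 
classloader and the shading conflicts described below cannot occur. A minimal 
sketch of that layout (the scratch directory and jar names here are 
illustrative, standing in for a real Flink home):

{noformat}
# Demonstrate the plugins/ layout using a scratch directory standing in
# for a real Flink installation (all paths are illustrative).
FLINK_HOME=$(mktemp -d)
mkdir -p "$FLINK_HOME/opt"
touch "$FLINK_HOME/opt/flink-oss-fs-hadoop-1.9.1.jar" \
      "$FLINK_HOME/opt/flink-azure-fs-hadoop-1.9.1.jar"

# Each plugin gets its own subdirectory under plugins/, which gives it
# an isolated classloader instead of sharing the lib/ classpath.
mkdir -p "$FLINK_HOME/plugins/oss-fs-hadoop" "$FLINK_HOME/plugins/azure-fs-hadoop"
cp "$FLINK_HOME/opt/flink-oss-fs-hadoop-1.9.1.jar"   "$FLINK_HOME/plugins/oss-fs-hadoop/"
cp "$FLINK_HOME/opt/flink-azure-fs-hadoop-1.9.1.jar" "$FLINK_HOME/plugins/azure-fs-hadoop/"
ls "$FLINK_HOME/plugins"
{noformat}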

> OSS filesystem does not relocate many dependencies
> --------------------------------------------------
>
>                 Key: FLINK-14973
>                 URL: https://issues.apache.org/jira/browse/FLINK-14973
>             Project: Flink
>          Issue Type: Bug
>          Components: FileSystems
>    Affects Versions: 1.9.1
>            Reporter: Patrick Lucas
>            Priority: Major
>
> Whereas the Azure and S3 Hadoop filesystem jars relocate all of their 
> dependencies:
> {noformat}
> $ jar tf opt/flink-azure-fs-hadoop-1.9.1.jar | grep -v 
> '^org/apache/fs/shaded/' | grep -v '^META-INF' | grep '/' | cut -f -2 -d / | 
> sort | uniq
> org/
> org/apache
> $ jar tf opt/flink-s3-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' 
> | grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq
> org/
> org/apache
> {noformat}
> The OSS Hadoop filesystem leaves many things un-relocated:
> {noformat}
> $ jar tf opt/flink-oss-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' 
> | grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq
> assets/
> assets/org
> avro/
> avro/shaded
> com/
> com/ctc
> com/fasterxml
> com/google
> com/jcraft
> com/nimbusds
> com/sun
> com/thoughtworks
> javax/
> javax/activation
> javax/el
> javax/servlet
> javax/ws
> javax/xml
> jersey/
> jersey/repackaged
> licenses/
> licenses/LICENSE.asm
> licenses/LICENSE.cddlv1.0
> licenses/LICENSE.cddlv1.1
> licenses/LICENSE.jdom
> licenses/LICENSE.jzlib
> licenses/LICENSE.paranamer
> licenses/LICENSE.protobuf
> licenses/LICENSE.re2j
> licenses/LICENSE.stax2api
> net/
> net/jcip
> net/minidev
> org/
> org/apache
> org/codehaus
> org/eclipse
> org/jdom
> org/objectweb
> org/tukaani
> org/xerial
> {noformat}
> The first symptom of this I ran into was that Flink is unable to restore from 
> a savepoint if both the OSS and Azure Hadoop filesystems are on the 
> classpath, but I assume this has the potential to cause further problems, at 
> least until more progress is made on the module/classloading front.
> h3. Steps to reproduce
> # Copy both the Azure and OSS Hadoop filesystem JARs from opt/ into lib/
> # Run a job that restores from a savepoint (the savepoint might need to be 
> stored on OSS)
> # See a crash and traceback like:
> {noformat}
> 2019-11-26 15:59:25,318 ERROR 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error 
> occurred in the cluster entrypoint.
> org.apache.flink.runtime.dispatcher.DispatcherException: Failed to take 
> leadership with session id 00000000-0000-0000-0000-000000000000.
>       at 
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$30(Dispatcher.java:915)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>  ~[?:1.8.0_232]
>       at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>  ~[?:1.8.0_232]
>       at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>  ~[?:1.8.0_232]
>       at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
>  ~[?:1.8.0_232]
>       at 
> org.apache.flink.runtime.concurrent.FutureUtils$WaitingConjunctFuture.handleCompletedFuture(FutureUtils.java:691)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
>  ~[?:1.8.0_232]
>       at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>  ~[?:1.8.0_232]
>       at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>  ~[?:1.8.0_232]
>       at 
> java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575) 
> ~[?:1.8.0_232]
>       at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:753)
>  ~[?:1.8.0_232]
>       at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
>  ~[?:1.8.0_232]
>       at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at akka.actor.Actor.aroundReceive(Actor.scala:517) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at akka.actor.Actor.aroundReceive$(Actor.scala:515) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at akka.actor.ActorCell.invoke(ActorCell.scala:561) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at akka.dispatch.Mailbox.run(Mailbox.scala:225) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at akka.dispatch.Mailbox.exec(Mailbox.scala:235) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
> [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>  [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobExecutionException: Could not set up 
> JobManager
>       at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>  ~[?:1.8.0_232]
>       at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) 
> ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       ... 4 more
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not 
> set up JobManager
>       at 
> org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:152)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:83)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:375)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>  ~[?:1.8.0_232]
>       at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) 
> ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       ... 4 more
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
>       at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.open(AliyunOSSFileSystem.java:570)
>  ~[flink-oss-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
>  ~[flink-azure-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:120)
>  ~[flink-oss-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:37)
>  ~[flink-oss-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.checkpoint.Checkpoints.loadAndValidateCheckpoint(Checkpoints.java:141)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1132)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.scheduler.LegacyScheduler.tryRestoreExecutionGraphFromSavepoint(LegacyScheduler.java:237)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.scheduler.LegacyScheduler.createAndRestoreExecutionGraph(LegacyScheduler.java:196)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.scheduler.LegacyScheduler.<init>(LegacyScheduler.java:176)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.scheduler.LegacySchedulerFactory.createInstance(LegacySchedulerFactory.java:70)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:275)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:265) 
> ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:146)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:83)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:375)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>  ~[?:1.8.0_232]
>       at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) 
> ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
>  ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
>       ... 4 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
