Dear community,

We are currently having problems with multiple users running paragraphs that
launch pyspark jobs.

The problem is that when one user aborts/cancels their pyspark paragraph
(job), the active pyspark jobs of all the other users are cancelled too.

Going into detail, I've seen that when you cancel a user's job, this method
is invoked (which is fine):

sc.cancelJobGroup("zeppelin-[notebook-id]-[paragraph-id]")
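
For context, this is how I understand group-scoped cancellation is supposed
to work in pyspark; a minimal sketch with illustrative ids, not Zeppelin's
actual code:

    # Tag every job submitted from this thread with a group id, the way
    # Zeppelin tags each paragraph run (the id here is illustrative):
    sc.setJobGroup("zeppelin-note1-para1", "user A's paragraph")

    # Cancelling by group id should only affect jobs tagged with that id;
    # jobs in other groups (other users' paragraphs) should keep running:
    sc.cancelJobGroup("zeppelin-note1-para1")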

But, for some reason I haven't been able to identify, this method is also
being invoked:

sc.cancelAllJobs()

I infer this from the stack trace that appears in the other users' jobs: the
failure reason is "cancelled as part of cancellation of all jobs" and the
frames go through DAGScheduler.doCancelAllJobs:

Py4JJavaError: An error occurred while calling o885.count.
: org.apache.spark.SparkException: Job 461 cancelled as part of cancellation of all jobs
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
    at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1375)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:721)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
    at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:721)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1628)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
    at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
    at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
    at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
    at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2420)
    at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2419)
    at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2801)
    at org.apache.spark.sql.Dataset.count(Dataset.scala:2419)
    at sun.reflect.GeneratedMethodAccessor120.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error occurred while calling o885.count.\n', JavaObject id=o886), <traceback object at 0x7f9e669ae588>)
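
To check where the scoping breaks, my plan is to reproduce this outside
Zeppelin. Below is a minimal sketch (plain pyspark, with illustrative group
ids and app name, not Zeppelin's actual code): run two long jobs under two
different job groups, cancel only one group, and see whether the other job
survives. Note that pyspark tracks job groups per thread through py4j, so I
am treating this strictly as an approximation of what Zeppelin does:

    import threading
    import time
    from pyspark import SparkContext

    sc = SparkContext("local[4]", "cancel-test")

    def run_job(group):
        # Tag this thread's jobs with a group id, as Zeppelin does per paragraph
        sc.setJobGroup(group, "job for " + group)
        try:
            # A deliberately slow job so there is time to cancel it
            sc.parallelize(range(2), 2).map(lambda x: time.sleep(60) or x).count()
            print(group, "finished")
        except Exception as e:
            print(group, "cancelled:", e)

    a = threading.Thread(target=run_job, args=("group-A",))
    b = threading.Thread(target=run_job, args=("group-B",))
    a.start(); b.start()
    time.sleep(10)  # give both jobs time to start

    sc.cancelJobGroup("group-A")  # expected: only group-A's job dies
    # sc.cancelAllJobs()          # this would kill both, matching our symptom
    a.join(); b.join()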

Any idea why this could be happening?

(I am running version 0.8.0, from a September 2017 build.)

Thank you!
