[
https://issues.apache.org/jira/browse/SPARK-22342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429950#comment-16429950
]
Susan X. Huynh commented on SPARK-22342:
----------------------------------------
Good news: I found the root cause of the multiple registration bug, and it is
not a Spark bug. It is caused by a bug in libmesos: "using a failoverTimeout of
0 with Mesos native scheduler client can result in infinite subscribe loop",
https://issues.apache.org/jira/browse/MESOS-8171 . This bug leads to the
multiple SUBSCRIBE calls seen in the driver logs. Upgrading the libmesos bundle
in my Docker image to a version with this patch fixed the issue. cc [~skonto]
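
To check whether a Mesos master log shows the same symptom, something like the following may help. It is only a rough sketch: it assumes the master log uses the glog line prefix seen in the quoted log (severity letter, MMDD, time, thread id, file:line), and the function name `subscribe_bursts` and its `threshold` parameter are made up for illustration. It counts "Received SUBSCRIBE call" lines per second and reports the seconds with repeated calls.

```python
import re
from collections import Counter

# Matches a glog-style master log line such as:
#   I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for
# The prefix layout is an assumption taken from the quoted log excerpt.
SUBSCRIBE_RE = re.compile(
    r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2})\.\d+\s+\d+\s+\S+\]\s+"
    r"Received SUBSCRIBE call for"
)


def subscribe_bursts(log_text, threshold=2):
    """Return {(MMDD, HH:MM:SS): count} for every second that saw at
    least `threshold` SUBSCRIBE calls (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        # Tolerate the "> " quoting used in mail archives.
        match = SUBSCRIBE_RE.match(line.lstrip("> ").strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

A single SUBSCRIBE per framework start is expected; several within the same second, as in the quoted master log, would be consistent with the subscribe loop described in MESOS-8171, though other causes are possible.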
> refactor schedulerDriver registration
> -------------------------------------
>
> Key: SPARK-22342
> URL: https://issues.apache.org/jira/browse/SPARK-22342
> Project: Spark
> Issue Type: Improvement
> Components: Mesos
> Affects Versions: 2.2.0
> Reporter: Stavros Kontopoulos
> Priority: Major
>
> This is an umbrella issue for working on:
> https://github.com/apache/spark/pull/13143
> and for handling the multiple re-registration issue, which invalidates an offer.
> To test:
> dcos spark run --verbose --name=spark-nohive --submit-args="--driver-cores
> 1 --conf spark.cores.max=1 --driver-memory 512M --class
> org.apache.spark.examples.SparkPi http://.../spark-examples_2.11-2.2.0.jar"
> master log:
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3085 hierarchical.cpp:303] Added framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3085 hierarchical.cpp:412] Deactivated framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3090 hierarchical.cpp:380] Activated framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi
> with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed
> over
> I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for
> framework 'Spark Pi' at
> scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for
> framework 'Spark Pi' at
> scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for
> framework 'Spark Pi' at
> scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for
> framework 'Spark Pi' at
> scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi
> with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed
> over
> I1020 13:49:05.000000 3087 master.cpp:7662] Sending 6 offers to framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi
> with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed
> over
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10039
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10038
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10037
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10036
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10035
> I1020 13:49:05.000000 3087 master.cpp:9159] Removing offer
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10034
> I1020 13:49:05.000000 3087 master.cpp:2894] Received SUBSCRIBE call for
> framework 'Spark Pi' at
> scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi
> with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed
> over
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi
> with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed
> over
> I1020 13:49:05.000000 3087 master.cpp:2974] Subscribing framework Spark Pi
> with checkpointing disabled and capabilities [ ]
> I1020 13:49:05.000000 3087 master.cpp:6618] Updating info for framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003
> I1020 13:49:05.000000 3087 master.cpp:3083] Framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697 failed
> over
> I1020 13:49:06.000000 3084 master.cpp:7662] Sending 6 offers to framework
> 9764beab-c90a-4b4f-b0ff-44c187851b34-0004-driver-20171020134857-0003 (Spark
> Pi) at scheduler-73f79027-b262-40d2-b751-05d8a6b60146@10.0.2.97:40697
> I1020 13:49:06.000000 3089 http.cpp:1166] HTTP GET for /master/slaves from
> 10.0.4.84:37398 with User-Agent='Go-http-client/1.1'
> driver log:
> 17/10/20 13:49:07 INFO MesosCoarseGrainedSchedulerBackend: SchedulerBackend
> is ready for scheduling beginning after reached minRegisteredResourcesRatio:
> 0.0
> 17/10/20 13:49:07 DEBUG SparkContext: Adding shutdown hook
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10035 on slave
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S2. Requirements were not met
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10036 on slave
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S3. Requirements were not met
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10037 on slave
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S0. Requirements were not met
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10038 on slave
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S1. Requirements were not met
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10039 on slave
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S6. Requirements were not met
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10034 on slave
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S5. Requirements were not met
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10035 on slave
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S2. Requirements were not met
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10036 on slave
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S3. Requirements were not met
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10037 on slave
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S0. Requirements were not met
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10038 on slave
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S1. Requirements were not met
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Cannot launch a
> task for offer with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-O10039 on slave
> with id: 9764beab-c90a-4b4f-b0ff-44c187851b34-S6. Requirements were not met
> for this offer.
> 17/10/20 13:49:07 DEBUG MesosCoarseGrainedSchedulerBackend: Accepting offer:
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10034 with attributes: Map() allocation
> info: role: "*"
> ...
> 17/10/20 13:49:08 INFO MesosCoarseGrainedSchedulerBackend: Mesos task 0 is
> now TASK_LOST
> 17/10/20 13:49:08 INFO MesosCoarseGrainedSchedulerBackend: taskId has
> executorId:
> 17/10/20 13:49:08 INFO MesosCoarseGrainedSchedulerBackend: taskId has
> message:Task launched with invalid offers: Offer
> 9764beab-c90a-4b4f-b0ff-44c187851b34-O10034 is no longer valid
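
The connection between the two logs above (the master removes offers on re-registration, then the driver accepts one of them and the task is lost) can be checked mechanically. The sketch below is an illustration only: the function name `stale_accepted_offers` is hypothetical, it relies on the "Removing offer" and "Accepting offer:" message texts shown in the logs above, and it ignores timestamps, so it reports any accepted offer that the master removed at any point.

```python
import re

# Message texts are taken from the quoted logs. The optional ">" handles
# lines that were wrapped and quoted by the mail archive.
REMOVED_RE = re.compile(r"Removing offer\s+(?:>\s*)?(\S+)")
ACCEPTED_RE = re.compile(r"Accepting offer:\s+(?:>\s*)?(\S+)")


def stale_accepted_offers(master_log, driver_log):
    """Return offer ids the driver accepted that the master also removed.

    Hypothetical helper: it does not compare timestamps, so it only
    shows that the two events refer to the same offer id.
    """
    removed = set(REMOVED_RE.findall(master_log))
    accepted = ACCEPTED_RE.findall(driver_log)
    return [offer for offer in accepted if offer in removed]
```

Applied to the excerpts above, the only "Accepting offer:" line refers to offer 9764beab-c90a-4b4f-b0ff-44c187851b34-O10034, which is also among the offers the master removed.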