[
https://issues.apache.org/jira/browse/SPARK-19936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907532#comment-15907532
]
Stan Teresen commented on SPARK-19936:
--------------------------------------
The cluster has plenty of resources (3 r3.large AWS instances), and I ran
only one example on the whole cluster. The example usually runs normally
(though I noticed that in most of those successful cases only one executor is
engaged). The problem occurs when 2 executors happen to start. Why would those
timeouts (you can see them in the log files 1.stderr and 2.stderr) be
happening?
> Page rank example takes long time to complete
> ---------------------------------------------
>
> Key: SPARK-19936
> URL: https://issues.apache.org/jira/browse/SPARK-19936
> Project: Spark
> Issue Type: Bug
> Components: Block Manager, Mesos
> Affects Versions: 2.1.0
> Environment: CentOS 7, Mesos 1.1.0
> Reporter: Stan Teresen
> Attachments: 1.stderr, 2.stderr, pr.out
>
>
> Sometimes the Page Rank example takes a very long time to finish on Mesos due
> to exceptions when fetching remote blocks in RetryingBlockFetcher on the
> executor side. As seen in the attached log files, it took ~30 min for the
> example to complete in a 3-node AWS environment (1 Mesos master and 2 agents).