[ 
https://issues.apache.org/jira/browse/SPARK-19936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907532#comment-15907532
 ] 

Stan Teresen commented on SPARK-19936:
--------------------------------------

The cluster has plenty of resources (3 r3.large AWS instances), and I ran 
only one example on the whole cluster. The example usually runs normally 
(though I noticed that in most of those successful cases only one 
executor is engaged). The problem occurs when 2 executors happen to start. Why 
would those timeouts (you can see them in the log files 1.stderr, 2.stderr) be 
happening?

> Page rank example takes long time to complete
> ---------------------------------------------
>
>                 Key: SPARK-19936
>                 URL: https://issues.apache.org/jira/browse/SPARK-19936
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Mesos
>    Affects Versions: 2.1.0
>         Environment: CentOS 7, Mesos 1.1.0
>            Reporter: Stan Teresen
>         Attachments: 1.stderr, 2.stderr, pr.out
>
>
> Sometimes the Page Rank example takes a very long time to finish on Mesos due
> to exceptions when fetching remote blocks in RetryingBlockFetcher on the
> executor side. As seen in the attached log files, it took ~30 min for the
> example to complete in a 3-node AWS environment (1 Mesos master and 2 agents).



