[
https://issues.apache.org/jira/browse/SPARK-22382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
DUC LIEM NGUYEN updated SPARK-22382:
------------------------------------
Description:
I've installed a system as follows:
--Mesos master: private IP 10.x.x.2, public IP 35.x.x.6
--Mesos slave: private IP 192.x.x.10, public IP 111.x.x.2
The master assigned the task to the slave successfully; however, the task
failed. The error message is as follows:
Exception in thread "main" 17/10/11 22:38:01 ERROR RpcOutboxMessage: Ask
timeout before connecting successfully
Caused by: org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply
in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
When I look at the environment, spark.driver.host points to the private IP
address of the master (10.x.x.2) instead of its public IP address (35.x.x.6).
A Wireshark capture confirms this: there were failed TCP packets sent to the
master's private IP address.
Now, if I set spark.driver.bindAddress on the master to its local IP address
and spark.driver.host on the master to its public IP address, I get the
following message:
ERROR TaskSchedulerImpl: Lost executor 1 on myhostname.singnet.com.sg: Unable
to create executor due to Cannot assign requested address.
From my understanding, spark.driver.bindAddress is applied to both the master
and the slave, hence the slave gets the said error. So I'm really wondering:
how do I properly set up Spark to work in this cluster over public IPs?
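For reference, the driver-side settings described above can be sketched as a
spark-submit invocation. This is only a sketch of the reported configuration:
the IPs are the placeholders from this report, while the Mesos master port
(5050), the application class, and the jar name are assumptions for
illustration.

```shell
# Sketch of the attempted configuration (assumptions: port 5050,
# class name, and jar name; IPs are the placeholders from the report).
spark-submit \
  --master mesos://35.x.x.6:5050 \
  --conf spark.driver.bindAddress=10.x.x.2 \
  --conf spark.driver.host=35.x.x.6 \
  --class com.example.MyApp \
  my-app.jar
```

With this, the driver binds its listening sockets to the private address but
advertises the public one to executors; the reported "Cannot assign requested
address" suggests the bindAddress value is also being picked up on the slave,
where 10.x.x.2 is not a local interface.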
> Spark on mesos: doesn't support public IP setup for agent and master.
> ----------------------------------------------------------------------
>
> Key: SPARK-22382
> URL: https://issues.apache.org/jira/browse/SPARK-22382
> Project: Spark
> Issue Type: Question
> Components: Mesos
> Affects Versions: 2.1.1
> Reporter: DUC LIEM NGUYEN
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]