I switched which machine was the master and which was the dedicated worker, and now it works just fine. I discovered machine2 is on my department's DMZ while machine1 is not, so I suspect the departmental firewall was causing the problems; moving the master to machine2 seems to have solved them.
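
In case it helps anyone who hits this later and can't simply move the
master: my understanding is that the root cause is the driver binding
to randomly chosen ports that the firewall blocks, so pinning
spark.driver.host and spark.driver.port to known values (and opening
those in the firewall) should be another way out. A rough, untested
sketch of what that might look like in PySpark (the IP and port below
are placeholders, not values I've verified):

    from pyspark import SparkConf, SparkContext

    # Untested sketch: pin the driver's address and port so a
    # departmental firewall can be opened for them, instead of letting
    # Spark pick a random ephemeral port. IP and port are placeholders.
    conf = (SparkConf()
            .setAppName("myProg")
            .setMaster("spark://192.168.1.101:5060")
            .set("spark.driver.host", "192.168.1.101")  # driver's routable IP
            .set("spark.driver.port", "7078"))          # fixed port opened in firewall

    sc = SparkContext(conf=conf)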

Thank you all very much for your help. I'm sure I'll have other questions soon :)

Regards,
Shannon

On 6/27/14, 3:22 PM, Sujeet Varakhedi wrote:
Looks like your driver is not able to connect to the remote executor on machine2/130.49.226.148:60949. Can you check if the master machine can route to 130.49.226.148?
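
One quick way to test from the master, as a rough sketch (60949 is the
ephemeral executor port from your log, so the check is only meaningful
while that executor is supposed to be running):

    import socket

    # Rough connectivity check: can this machine open a TCP connection
    # to the address/port from the failing log line?
    host, port = "130.49.226.148", 60949
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(5)
    try:
        s.connect((host, port))
        print("reachable")
    except socket.error as e:
        print("not reachable: %s" % e)
    finally:
        s.close()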

Sujeet


On Fri, Jun 27, 2014 at 12:04 PM, Shannon Quinn <squ...@gatech.edu> wrote:

    For some reason, commenting out spark.driver.host and
    spark.driver.port fixed something... and broke something else (or
    at least revealed another problem). For reference, these are the
    only lines in my spark-defaults.conf now:

    spark.app.name          myProg
    spark.master            spark://192.168.1.101:5060
    spark.executor.memory   8g
    spark.files.overwrite   true

    It starts up, but has problems with machine2. For some reason,
    machine2 is having trouble communicating with *itself*. Here are
    the worker logs of one of the failures (there are 10 before it
    quits):


    Spark assembly has been built with Hive, including Datanucleus
    jars on classpath
    14/06/27 14:55:13 INFO ExecutorRunner: Launch command: "java"
    "-cp"
    
"::/home/spark/spark-1.0.0-bin-hadoop2/conf:/home/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/spark/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar"
    "-XX:MaxPermSize=128m" "-Xms8192M" "-Xmx8192M"
    "org.apache.spark.executor.CoarseGrainedExecutorBackend"
    "akka.tcp://spark@machine1:46378/user/CoarseGrainedScheduler" "7"
    "machine2" "8" "akka.tcp://sparkWorker@machine2:48019/user/Worker"
    "app-20140627144512-0001"
    14/06/27 14:56:54 INFO Worker: Executor app-20140627144512-0001/7
    finished with state FAILED message Command exited with code 1
    exitStatus 1
    14/06/27 14:56:54 INFO LocalActorRef: Message
    [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying]
    from Actor[akka://sparkWorker/deadLetters] to
    Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40130.49.226.148%3A53561-38#-1924573003]
    was not delivered. [10] dead letters encountered. This logging can
    be turned off or adjusted with configuration settings
    'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
    14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
    [akka.tcp://sparkWorker@machine2:48019] ->
    [akka.tcp://sparkExecutor@machine2:60949]: Error [Association
    failed with [akka.tcp://sparkExecutor@machine2:60949]] [
    akka.remote.EndpointAssociationException: Association failed with
    [akka.tcp://sparkExecutor@machine2:60949]
    Caused by:
    akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
    Connection refused: machine2/130.49.226.148:60949
    ]
    14/06/27 14:56:54 INFO Worker: Asked to launch executor
    app-20140627144512-0001/8 for Funtown, USA
    14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
    [akka.tcp://sparkWorker@machine2:48019] ->
    [akka.tcp://sparkExecutor@machine2:60949]: Error [Association
    failed with [akka.tcp://sparkExecutor@machine2:60949]] [
    akka.remote.EndpointAssociationException: Association failed with
    [akka.tcp://sparkExecutor@machine2:60949]
    Caused by:
    akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
    Connection refused: machine2/130.49.226.148:60949
    ]
    14/06/27 14:56:54 ERROR EndpointWriter: AssociationError
    [akka.tcp://sparkWorker@machine2:48019] ->
    [akka.tcp://sparkExecutor@machine2:60949]: Error [Association
    failed with [akka.tcp://sparkExecutor@machine2:60949]] [
    akka.remote.EndpointAssociationException: Association failed with
    [akka.tcp://sparkExecutor@machine2:60949]
    Caused by:
    akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
    Connection refused: machine2/130.49.226.148:60949
    ]

    Port 48019 on machine2 is indeed open, connected, and listening.
    Any ideas?

    Thanks!

    Shannon

    On 6/27/14, 1:54 AM, sujeetv wrote:

        Try explicitly setting the "spark.driver.host" property to
        the master's IP.
        Sujeet






