Howdy Andrew,

Here is what I ran before an application context was created (other services have been deleted):

    # netstat -l -t tcp -p --numeric-ports
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address        Foreign Address  State   PID/Program name
    tcp6       0      0 10.90.17.100:8888    :::*             LISTEN  4762/java
    tcp6       0      0 :::8081              :::*             LISTEN  4762/java
And then, while the application context is up:

    # netstat -l -t tcp -p --numeric-ports
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address        Foreign Address  State   PID/Program name
    tcp6       0      0 10.90.17.100:8888    :::*             LISTEN  4762/java
    tcp6       0      0 :::57286             :::*             LISTEN  3404/java
    tcp6       0      0 10.90.17.100:38118   :::*             LISTEN  3404/java
    tcp6       0      0 10.90.17.100:35530   :::*             LISTEN  3404/java
    tcp6       0      0 :::60235             :::*             LISTEN  3404/java
    tcp6       0      0 :::8081              :::*             LISTEN  4762/java

My understanding is that this says four ports are open. Are 57286 and 60235 not being used?

Jacob

Jacob D. Eisinger
IBM Emerging Technologies
jeis...@us.ibm.com - (512) 286-6075

From: Andrew Ash <and...@andrewash.com>
To: user@spark.apache.org
Date: 05/25/2014 06:25 PM
Subject: Re: Comprehensive Port Configuration reference?

Hi Jacob,

The config option spark.history.ui.port is new for 1.0. The problem the history server solves is that in non-Standalone cluster deployment modes (Mesos and YARN) there is no long-lived Spark Master that can store logs and statistics about an application after it finishes. The history server is the UI that renders logged data from applications after they complete. Read more here:

https://issues.apache.org/jira/browse/SPARK-1276
https://github.com/apache/spark/pull/204

As far as the two vs. four dynamic ports: are those all listening ports? I did observe four ports in use, but only two of them were listening. The other two were the random ephemeral ports used for outbound connections, i.e. the source port of the (srcIP, srcPort, dstIP, dstPort) tuple that uniquely identifies a TCP socket.
http://unix.stackexchange.com/questions/75011/how-does-the-server-find-out-what-client-port-to-send-to

Thanks for taking a look through! I also realized that I had a couple of mistakes in the 0.9 to 1.0 transition, so those are now documented as well in the updated PR.

Cheers!
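(A small aside on the listening vs. ephemeral-port distinction Andrew describes: the sketch below, which is my own illustration and not from the thread, opens a listening socket, connects to it, and shows that the connection's client-side source port is an OS-assigned ephemeral port distinct from the listening port. The (srcIP, srcPort, dstIP, dstPort) tuple is what identifies the connection.)

```python
import socket

# Server side: bind to port 0 so the OS picks a free listening port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
listen_port = server.getsockname()[1]

# Client side: connect; the OS assigns an ephemeral source port.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", listen_port))
src_ip, src_port = client.getsockname()
dst_ip, dst_port = client.getpeername()

# The 4-tuple that uniquely identifies this TCP connection.
# Only listen_port would show up as LISTEN in netstat -l;
# src_port is ephemeral and belongs to the outbound side.
print((src_ip, src_port, dst_ip, dst_port))

client.close()
server.close()
```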
Andrew

On Fri, May 23, 2014 at 2:43 PM, Jacob Eisinger <jeis...@us.ibm.com> wrote:

Howdy Andrew,

I noticed you have a configuration item that we were not aware of: spark.history.ui.port. Is that new for 1.0?

Also, we noticed that the Workers and the Drivers were opening up four dynamic ports per application context. It looks like you were seeing two. Everything else looks like it aligns!

Jacob

Jacob D. Eisinger
IBM Emerging Technologies
jeis...@us.ibm.com - (512) 286-6075

From: Andrew Ash <and...@andrewash.com>
To: user@spark.apache.org
Date: 05/23/2014 10:30 AM
Subject: Re: Comprehensive Port Configuration reference?

Hi everyone,

I've also been interested in better understanding what ports are used where and the direction the network connections go. I've observed a running cluster and read through the code, and came up with the documentation addition below:

https://github.com/apache/spark/pull/856

Scott and Jacob -- it sounds like you two have pulled together some of this yourselves for writing firewall rules. Would you mind taking a look at this pull request and confirming that it matches your observations? Wrong documentation is worse than no documentation, so I'd like to make sure this is right.

Cheers,
Andrew

On Wed, May 7, 2014 at 10:19 AM, Mark Baker <dist...@acm.org> wrote:

On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger <jeis...@us.ibm.com> wrote:
> In a nutshell, Spark opens up a couple of well-known ports. And then the workers and the shell open up dynamic ports for each job. These dynamic ports make securing the Spark network difficult.

Indeed. Judging by the frequency with which this topic arises, this is a concern for many (myself included).
I couldn't find anything in JIRA about it, but I'm curious to know whether the Spark team considers this a problem in need of a fix.

Mark.
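(For anyone writing firewall rules against a cluster of this era, the usual approach is to pin the configurable ports to fixed values so only the remaining dynamic ports need a wider rule. Below is a hypothetical spark-defaults.conf sketch; the port values are arbitrary examples, and the option names should be verified against the port table in your Spark version's configuration docs before relying on them.)

```properties
# Hypothetical sketch -- verify option names and defaults against
# your Spark version's configuration documentation.
spark.ui.port            4040    # driver web UI
spark.history.ui.port    18080   # history server UI
spark.driver.port        7078    # example fixed value; random by default
```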