[ https://issues.apache.org/jira/browse/SPARK-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908738#comment-14908738 ]

Ben Duffield commented on SPARK-10635:
--------------------------------------

OK, good flag that there are other places where this would need to be
considered. How open would you be to a PR that addresses this? I.e. it's an
assumption now, sure, but could we move away from it?
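To make the shape concrete, the idea would be to thread the host through
explicitly rather than assuming localhost, e.g. (a sketch only; the JVM side
would likewise need to bind to a routable interface and report its host):
{code}
import socket

def _load_from_socket(host, port, serializer):
    # host becomes an explicit parameter instead of an assumed "localhost"
    sock = socket.socket()
    sock.settimeout(3)
    try:
        sock.connect((host, port))
        rf = sock.makefile("rb", 65536)
        for item in serializer.load_stream(rf):
            yield item
    finally:
        sock.close()
{code}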

> pyspark - running on a different host
> -------------------------------------
>
>                 Key: SPARK-10635
>                 URL: https://issues.apache.org/jira/browse/SPARK-10635
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: Ben Duffield
>
> At various points we assume we only ever talk to a driver on the same host.
> e.g. 
> https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L615
> We use pyspark to connect to an existing driver (i.e. we do not let pyspark 
> launch the driver itself, but instead construct the SparkContext with the 
> gateway and jsc arguments).
> There are a few reasons for this, but essentially it's to allow more 
> flexibility when running in AWS.
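> For illustration, the connection pattern looks roughly like this (the
> address, port, and entry-point method are hypothetical, and it assumes the
> driver JVM is already running a py4j GatewayServer that exposes its
> JavaSparkContext):
> {code}
> from py4j.java_gateway import JavaGateway, GatewayClient
> from pyspark import SparkContext
>
> # Attach to the py4j gateway of the already-running driver JVM.
> gateway = JavaGateway(GatewayClient(address="driver-host", port=25333))
>
> # Retrieving the existing JavaSparkContext is deployment-specific; assume
> # here that the driver registered an entry point that hands it back.
> jsc = gateway.entry_point.getJavaSparkContext()
>
> # Build the Python-side SparkContext around the existing JVM objects.
> sc = SparkContext(gateway=gateway, jsc=jsc)
> {code}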
> Before 1.3.1 we were able to monkeypatch around this:  
> {code}
> import socket
> import pyspark.rdd
>
> # host is the driver's address, captured here from the enclosing scope
> # (the stock implementation hard-codes "localhost" at this point).
> def _load_from_socket(port, serializer):
>     sock = socket.socket()
>     sock.settimeout(3)
>     try:
>         sock.connect((host, port))
>         rf = sock.makefile("rb", 65536)
>         for item in serializer.load_stream(rf):
>             yield item
>     finally:
>         sock.close()
>
> pyspark.rdd._load_from_socket = _load_from_socket
> {code}


