[
https://issues.apache.org/jira/browse/SPARK-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388680#comment-14388680
]
Eric O. LEBIGOT (EOL) commented on SPARK-5004:
----------------------------------------------
Since this bug is unfortunately still present, I am adding some potentially
useful information:
In a Python shell, _starting_ with network settings that use _no proxy_ allows
Spark to start (that is the normal behavior).
The new observation is that switching back to network settings _with a proxy_
afterwards lets Spark run normally.
Thus, the proxy problem (likely Spark trying to reach localhost through the
proxy even though localhost is in the list of exceptions) seems to appear only
at startup, as far as I can tell.
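In case it helps others hitting this: the JVM routes plain java.net.Socket
connections through the SOCKS proxy configured by the socksProxyHost /
socksProxyPort system properties, and Java's SOCKS support has no documented
equivalent of http.nonProxyHosts, which would explain why the localhost
exception list is ignored. A possible (untested) workaround sketch is to clear
the SOCKS settings for the Spark JVMs only, leaving the OS/shell proxy
configuration untouched:

```shell
# Untested workaround sketch: unset the JVM-level SOCKS proxy for the Spark
# driver and executor JVMs, so that the loopback socket to the Python daemon
# is opened directly instead of through the SOCKS proxy.
pyspark \
  --conf "spark.driver.extraJavaOptions=-DsocksProxyHost=" \
  --conf "spark.executor.extraJavaOptions=-DsocksProxyHost="
```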
> PySpark does not handle SOCKS proxy
> -----------------------------------
>
> Key: SPARK-5004
> URL: https://issues.apache.org/jira/browse/SPARK-5004
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.2.0, 1.3.0
> Reporter: Eric O. LEBIGOT (EOL)
>
> PySpark cannot run even the quick start examples when a SOCKS proxy is used.
> Turning off the SOCKS proxy makes PySpark work.
> The Scala-shell version is not affected and works even when a SOCKS proxy is
> used.
> Is there a quick workaround, while waiting for this to be fixed?
> Here is the error message (printed, e.g., when .count() is called):
> {code}
> >>> 14/12/30 17:13:44 WARN PythonWorkerFactory: Failed to open socket to Python daemon:
> java.net.SocketException: Malformed reply from SOCKS server
> at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:129)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:503)
> at java.net.Socket.connect(Socket.java:579)
> at java.net.Socket.connect(Socket.java:528)
> at java.net.Socket.<init>(Socket.java:425)
> at java.net.Socket.<init>(Socket.java:241)
> at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
> at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
> at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
> at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
> at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:102)
> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {code}
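The stack trace above fails inside java.net.SocksSocketImpl, i.e. the plain
new Socket(...) call in PythonWorkerFactory is being routed through the SOCKS
proxy even for a loopback connection. A minimal standalone sketch (not Spark
code; class name and structure are hypothetical) of the distinction: passing
java.net.Proxy.NO_PROXY to the Socket constructor forces a direct connection
regardless of the JVM's socksProxyHost setting.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.Socket;

public class NoProxySocket {
    public static void main(String[] args) throws Exception {
        // Local listener to connect to, on an ephemeral port.
        try (ServerSocket server = new ServerSocket(0)) {
            int port = server.getLocalPort();

            // A plain `new Socket(host, port)` honors -DsocksProxyHost and
            // would attempt the SOCKS proxy even for 127.0.0.1, which is the
            // failure mode seen in the stack trace above.

            // With Proxy.NO_PROXY the JVM always connects directly,
            // ignoring any proxy system properties.
            try (Socket direct = new Socket(Proxy.NO_PROXY)) {
                direct.connect(new InetSocketAddress("127.0.0.1", port));
                System.out.println("connected=" + direct.isConnected());
            }
        }
    }
}
```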
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)