[ https://issues.apache.org/jira/browse/SPARK-33143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215249#comment-17215249 ]
Miklos Szurap commented on SPARK-33143:
---------------------------------------

It has been observed with big RDDs.
{noformat}
20/10/07 18:27:20 INFO scheduler.DAGScheduler: Job 311 finished: toPandas at /data/1/app/bin/apps/report/doreport.py:91, took 0.619208 s
Exception in thread "serve-DataFrame" java.net.SocketTimeoutException: Accept timed out
	at java.net.PlainSocketImpl.socketAccept(Native Method)
	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
	at java.net.ServerSocket.implAccept(ServerSocket.java:545)
	at java.net.ServerSocket.accept(ServerSocket.java:513)
	at org.apache.spark.api.python.PythonServer$$anon$1.run(PythonRDD.scala:881)
Traceback (most recent call last):
  File "/data/1/app/bin/apps/report/doreport.py", line 91, in <module>
    df=dq_final.toPandas()
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 2142, in toPandas
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 534, in collect
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 144, in _load_from_socket
  File "/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/pyspark.zip/pyspark/java_gateway.py", line 178, in local_connect_and_auth
Exception: could not open socket: ["tried to connect to ('127.0.0.1', 33127), but an error occured: "]
20/10/07 18:27:36 INFO spark.SparkContext: Invoking stop() from shutdown hook
{noformat}
After splitting the app into two parts, so that a single run processed only half the data, it could finish successfully.

> Make SocketAuthServer socket timeout configurable
> -------------------------------------------------
>
>                 Key: SPARK-33143
>                 URL: https://issues.apache.org/jira/browse/SPARK-33143
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.7, 3.0.1
>            Reporter: Miklos Szurap
>            Priority: Major
>
> In SPARK-21551 the socket timeout for PySpark applications was increased
> from 3 to 15 seconds. However, it is still hardcoded.
> In certain situations even 15 seconds is not enough, so it should be made
> configurable.
> This is requested after seeing it cause real-life workload failures.
> It has also been suggested and requested in an earlier comment on
> [SPARK-18649|https://issues.apache.org/jira/browse/SPARK-18649?focusedCommentId=16493498&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16493498].
> In Spark 2.4 the timeout is set in
> [PythonRDD.scala|https://github.com/apache/spark/blob/branch-2.4/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L899];
> in Spark 3.x the code has been moved to
> [SocketAuthServer.scala|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/security/SocketAuthServer.scala#L51]:
> {code}
> serverSocket.setSoTimeout(15000)
> {code}
> Please include this in both 2.4 and 3.x branches.
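For illustration only, a minimal Scala sketch of what making the timeout configurable could look like. The config key {{spark.python.authSocketTimeout}} is an assumption here, not a setting that exists in the affected releases; the sketch relies only on the existing SparkConf.getTimeAsSeconds API.
{code}
// A minimal sketch, not the actual patch: read the accept timeout from
// SparkConf instead of hardcoding it.
// ASSUMPTION: the key "spark.python.authSocketTimeout" is illustrative only.
import java.net.{InetAddress, ServerSocket}

import org.apache.spark.SparkConf

class ConfigurableAuthServerSocket(conf: SparkConf) {
  // Bind to localhost on an ephemeral port, as SocketAuthServer does.
  private val serverSocket =
    new ServerSocket(0, 1, InetAddress.getByName("localhost"))

  // Default to the currently hardcoded value of 15 seconds.
  private val timeoutMs: Long =
    conf.getTimeAsSeconds("spark.python.authSocketTimeout", "15s") * 1000L

  serverSocket.setSoTimeout(timeoutMs.toInt)

  def port: Int = serverSocket.getLocalPort
}
{code}
With such a key in place, an affected job could simply raise the limit (e.g. --conf spark.python.authSocketTimeout=60s) instead of splitting the workload in two.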