[
https://issues.apache.org/jira/browse/SPARK-32746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187617#comment-17187617
]
Rahul Bhatia commented on SPARK-32746:
--------------------------------------
[~hyukjin.kwon]
Here are the logs. As mentioned, an additional symptom is that the code runs fine
and completes in 2-4 seconds in standalone mode (on my local machine), while on
the cluster it shows no progress at all.
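As a sanity check, the per-group DBSCAN logic from the issue can be run with plain pandas outside Spark (a minimal local sketch using the same sample data; no `spark` session involved). It completes instantly, which suggests the UDF body itself is fine and the hang is cluster-side:

```python
import pandas as pd
from sklearn.cluster import DBSCAN

# Same sample data as in the issue description
data = [(1, 11.6133, 48.1075), (1, 11.6142, 48.1066), (1, 11.6108, 48.1061),
        (1, 11.6207, 48.1192), (1, 11.6221, 48.1223), (1, 11.5969, 48.1276),
        (2, 11.5995, 48.1258), (2, 11.6127, 48.1066), (2, 11.6430, 48.1275),
        (2, 11.6368, 48.1278), (2, 11.5930, 48.1156)]
pdf = pd.DataFrame(data, columns=["id", "X", "Y"])

# Mirror grouped-map semantics: apply DBSCAN independently per "id" group
parts = []
for _, group in pdf.groupby("id"):
    g = group.copy()
    g["cluster"] = DBSCAN(eps=5, min_samples=3).fit_predict(g[["X", "Y"]])
    parts.append(g)
result = pd.concat(parts, ignore_index=True)
print(result)
```

With eps=5 every point in each group falls within one neighborhood, so each group collapses into a single cluster (label 0) here.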
{noformat}
20/08/29 18:51:25 ERROR netty.Inbox: Ignoring error
org.apache.spark.SparkException: NettyRpcEndpointRef(spark-client://YarnAM) does not implement 'receive'
	at org.apache.spark.rpc.RpcEndpoint$$anonfun$receive$1.applyOrElse(RpcEndpoint.scala:70)
	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
	at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
20/08/30 12:51:25 ERROR netty.Inbox: Ignoring error
org.apache.spark.SparkException: NettyRpcEndpointRef(spark-client://YarnAM) does not implement 'receive'
	at org.apache.spark.rpc.RpcEndpoint$$anonfun$receive$1.applyOrElse(RpcEndpoint.scala:70)
	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
	at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
20/08/31 06:51:25 ERROR netty.Inbox: Ignoring error
org.apache.spark.SparkException: NettyRpcEndpointRef(spark-client://YarnAM) does not implement 'receive'
	at org.apache.spark.rpc.RpcEndpoint$$anonfun$receive$1.applyOrElse(RpcEndpoint.scala:70)
	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
	at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
20/08/31 07:31:04 INFO yarn.YarnAllocator: Completed container container_e66_1589136601716_6876212_01_000031 on host: bhdp4080.prod.bdd.jp.local (state: COMPLETE, exit status: -102)
20/08/31 07:31:04 INFO yarn.YarnAllocator: Container container_e66_1589136601716_6876212_01_000031 on host: bhdp4080.prod.bdd.jp.local was preempted.
20/08/31 07:31:07 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 16 core(s) and 18022 MB memory (including 1638 MB of overhead)
20/08/31 07:31:07 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
20/08/31 07:31:55 INFO yarn.YarnAllocator: Completed container container_e66_1589136601716_6876212_01_000030 on host: bhdp4365.prod.hnd1.bdd.local (state: COMPLETE, exit status: -102)
20/08/31 07:31:55 INFO yarn.YarnAllocator: Container container_e66_1589136601716_6876212_01_000030 on host: bhdp4365.prod.hnd1.bdd.local was preempted.
20/08/31 07:31:55 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 16 core(s) and 18022 MB memory (including 1638 MB of overhead)
20/08/31 07:31:55 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
20/08/31 07:37:04 INFO impl.AMRMClientImpl: Received new token for : bhdp4564.prod.hnd1.bdd.local:45454
20/08/31 07:37:04 INFO yarn.YarnAllocator: Launching container container_e66_1589136601716_6876212_01_000032 on host bhdp4564.prod.hnd1.bdd.local for executor with ID 31
20/08/31 07:37:04 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
20/08/31 07:37:04 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
20/08/31 07:37:04 INFO impl.ContainerManagementProtocolProxy: Opening proxy : bhdp4564.prod.hnd1.bdd.local:45454
20/08/31 07:41:40 INFO impl.AMRMClientImpl: Received new token for : bhdp4569.prod.hnd1.bdd.local:45454
20/08/31 07:41:40 INFO yarn.YarnAllocator: Launching container container_e66_1589136601716_6876212_01_000034 on host bhdp4569.prod.hnd1.bdd.local for executor with ID 32
20/08/31 07:41:40 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
20/08/31 07:41:40 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
20/08/31 07:41:40 INFO impl.ContainerManagementProtocolProxy: Opening proxy : bhdp4569.prod.hnd1.bdd.local:45454{noformat}
> Not able to run Pandas UDF
> ---------------------------
>
> Key: SPARK-32746
> URL: https://issues.apache.org/jira/browse/SPARK-32746
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.0.0
> Environment: Pyspark 3.0.0
> PyArrow - 1.0.1 (also tried with PyArrow 0.15.1, no progress there)
> Pandas - 0.25.3
>
> Reporter: Rahul Bhatia
> Priority: Major
> Attachments: Screenshot 2020-08-31 at 9.04.07 AM.png
>
>
> Hi,
> I am facing issues running a Pandas UDF on a YARN cluster with multiple
> nodes. I am trying to apply a simple DBSCAN algorithm to multiple groups in
> my dataframe. To start with, I am using a simple example to test things
> out -
> {code:python}
> import pandas as pd
> from pyspark.sql.types import StructType, StructField, DoubleType, StringType, IntegerType
> from sklearn.cluster import DBSCAN
> from pyspark.sql.functions import pandas_udf, PandasUDFType
>
> data = [(1, 11.6133, 48.1075),
>         (1, 11.6142, 48.1066),
>         (1, 11.6108, 48.1061),
>         (1, 11.6207, 48.1192),
>         (1, 11.6221, 48.1223),
>         (1, 11.5969, 48.1276),
>         (2, 11.5995, 48.1258),
>         (2, 11.6127, 48.1066),
>         (2, 11.6430, 48.1275),
>         (2, 11.6368, 48.1278),
>         (2, 11.5930, 48.1156)]
> df = spark.createDataFrame(data, ["id", "X", "Y"])
>
> output_schema = StructType(
>     [
>         StructField('id', IntegerType()),
>         StructField('X', DoubleType()),
>         StructField('Y', DoubleType()),
>         StructField('cluster', IntegerType())
>     ]
> )
>
> @pandas_udf(output_schema, PandasUDFType.GROUPED_MAP)
> def dbscan(data):
>     data["cluster"] = DBSCAN(eps=5, min_samples=3).fit_predict(data[["X", "Y"]])
>     result = pd.DataFrame(data, columns=["id", "X", "Y", "cluster"])
>     return result
>
> res = df.groupby("id").apply(dbscan)
> res.show()
> {code}
>
> The code keeps running forever on the YARN cluster; I expect it to finish
> within seconds (it works fine in standalone mode and finishes in 2-4
> seconds). On checking the Spark UI, I can see that the Spark job is stuck
> (99/580 tasks) and never makes any progress.
>
> It also doesn't run in parallel. Am I missing something? !Screenshot
> 2020-08-31 at 9.04.07 AM.png!
>
>
> I am new to Spark, and still trying to understand a lot of things.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)