[ https://issues.apache.org/jira/browse/SPARK-20227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960965#comment-15960965 ]
Sean Owen commented on SPARK-20227:
-----------------------------------

See https://issues.apache.org/jira/browse/SPARK-20226 for something possibly similar. It'd be useful to get a thread dump to see where the time is being spent.

> Job hangs when joining a lot of aggregated columns
> --------------------------------------------------
>
>                 Key: SPARK-20227
>                 URL: https://issues.apache.org/jira/browse/SPARK-20227
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>         Environment: AWS emr-5.4.0, master: m4.xlarge, core: 4 m4.xlarge
>            Reporter: Quentin Auge
>
> I'm trying to replace a lot of different columns in a dataframe with
> aggregates of themselves, and then join the resulting dataframe.
> {code}
> # Imports the snippet relies on (sc is assumed to be the SparkContext
> # provided by the pyspark shell)
> from pyspark.sql import Row, Window
> from pyspark.sql.functions import sum
>
> # Create a dataframe with 1 row and 50 columns
> n = 50
> df = sc.parallelize([Row(*range(n))]).toDF()
> cols = df.columns
> # Replace each column's values with aggregated values
> window = Window.partitionBy(cols[0])
> for col in cols[1:]:
>     df = df.withColumn(col, sum(col).over(window))
> # Join
> other_df = sc.parallelize([Row(0)]).toDF()
> result = other_df.join(df, on = cols[0])
> result.show()
> {code}
> Spark hangs forever when executing the last line. The strange thing is, it
> depends on the number of columns. Spark does not hang for n = 5, 10, or 20
> columns. For n = 50 and beyond, it does.
> {code}
> 17/04/05 14:39:28 INFO ExecutorAllocationManager: Removing executor 1 because it has been idle for 60 seconds (new desired total will be 0)
> 17/04/05 14:39:29 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 1.
> 17/04/05 14:39:29 INFO DAGScheduler: Executor lost: 1 (epoch 0)
> 17/04/05 14:39:29 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
> 17/04/05 14:39:29 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, ip-172-30-0-149.ec2.internal, 35666, None)
> 17/04/05 14:39:29 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
> 17/04/05 14:39:29 INFO YarnScheduler: Executor 1 on ip-172-30-0-149.ec2.internal killed by driver.
> 17/04/05 14:39:29 INFO ExecutorAllocationManager: Existing executor 1 has been removed (new total is 0)
> {code}
> All executors are inactive and thus killed after 60 seconds, the master
> spends some CPU on a process that hangs indefinitely, and the workers are
> idle.
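
The thread dump requested in the comment could be captured from the driver JVM with the JDK's `jstack` tool. The following is only a minimal sketch, not something taken from the issue: the helper names `thread_dump_command` and `capture_thread_dump` are hypothetical, and it assumes `jstack` is on the PATH of the master node and that the driver's process id is already known (for example from `jps -lm`).

```python
import shutil
import subprocess


def thread_dump_command(pid):
    """Build the jstack command line for the JVM with the given process id.

    Hypothetical helper for illustration; not part of Spark.
    """
    # -l asks jstack for the long listing, which adds lock information.
    return ["jstack", "-l", str(pid)]


def capture_thread_dump(pid, timeout=30):
    """Run jstack against `pid` and return the dump as text.

    Returns None when jstack is not available on the PATH.
    """
    if shutil.which("jstack") is None:
        return None
    result = subprocess.run(
        thread_dump_command(pid),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode the output to str
        timeout=timeout,
    )
    return result.stdout
```

Calling `capture_thread_dump(driver_pid)` a few times, several seconds apart, while `result.show()` appears stuck would show whether the driver's main thread stays in the same stack frames, which is the kind of information the comment asks for.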