[jira] [Commented] (SPARK-19255) SQL Listener is causing out of memory, in case of large no of shuffle partition

Ashok Kumar (JIRA) Tue, 17 Jan 2017 19:37:55 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827379#comment-15827379
 ]


Ashok Kumar commented on SPARK-19255:
-------------------------------------

@Takeshi, thanks for looking into the issue. 
You are right, we can handle it by increasing driver memory, but our actual 
scenario uses 10 million shuffle partitions, which is 100 times the 100000 
used in the test steps. 

After analysing the code, we found that when a user views metrics in the 
Spark SQL UI, all accumulator data is merged at that point and then displayed. 
     
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L336

So one suggestion: what if we merge the accumulator metrics after every 
configured number of tasks? In that case the out-of-memory issue should not 
occur. Please advise; I am looking forward to your input.
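
To make the suggestion concrete, here is a minimal sketch in Python. It is a 
simplified model, not the actual SQLListener code: the class names 
(LazyMetrics, BatchedMetrics) and the merge_every parameter are hypothetical, 
and it assumes every metric combines by summation and ignores task retries. 
Metrics that report a per-task min/median/max would need more state than this.

```python
from collections import defaultdict


class LazyMetrics:
    """Keeps every task's updates and merges only when totals are requested."""

    def __init__(self):
        self.task_updates = {}  # task_id -> {metric_id: value}

    def on_task_end(self, task_id, updates):
        self.task_updates[task_id] = dict(updates)

    def totals(self):
        merged = defaultdict(int)
        for updates in self.task_updates.values():
            for metric_id, value in updates.items():
                merged[metric_id] += value
        return dict(merged)

    def retained_entries(self):
        # Grows with the number of finished tasks.
        return len(self.task_updates)


class BatchedMetrics:
    """Folds pending updates into running totals every `merge_every` tasks.

    `merge_every` is a hypothetical setting, not an existing Spark config.
    """

    def __init__(self, merge_every):
        if merge_every < 1:
            raise ValueError("merge_every must be at least 1")
        self.merge_every = merge_every
        self.pending = {}  # task_id -> {metric_id: value}
        self.merged = defaultdict(int)

    def on_task_end(self, task_id, updates):
        self.pending[task_id] = dict(updates)
        if len(self.pending) >= self.merge_every:
            self._flush()

    def _flush(self):
        for updates in self.pending.values():
            for metric_id, value in updates.items():
                self.merged[metric_id] += value
        self.pending.clear()

    def totals(self):
        # Include updates that have not been flushed yet.
        result = defaultdict(int, self.merged)
        for updates in self.pending.values():
            for metric_id, value in updates.items():
                result[metric_id] += value
        return dict(result)

    def retained_entries(self):
        # Never exceeds merge_every - 1 after a call to on_task_end.
        return len(self.pending)


if __name__ == "__main__":
    lazy = LazyMetrics()
    batched = BatchedMetrics(merge_every=1000)
    for task_id in range(100000):  # one task per shuffle partition
        update = {1: 1, 2: task_id}  # metric_id -> value reported by the task
        lazy.on_task_end(task_id, update)
        batched.on_task_end(task_id, update)
    print("lazy retained:", lazy.retained_entries())
    print("batched retained:", batched.retained_entries())
    print("totals equal:", lazy.totals() == batched.totals())
```

In this model both variants report the same totals, but the batched one holds 
at most merge_every - 1 per-task entries between task completions instead of 
one entry per task.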

> SQL Listener is causing out of memory, in case of large no of shuffle 
> partition
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-19255
>                 URL: https://issues.apache.org/jira/browse/SPARK-19255
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>         Environment: Linux
>            Reporter: Ashok Kumar
>            Priority: Minor
>         Attachments: spark_sqllistener_oom.png
>
>
> Test steps.
> 1. CREATE TABLE sample(imei string, age int, task bigint, num double, level 
> decimal(10,3), productdate timestamp, name string, point int) USING 
> com.databricks.spark.csv OPTIONS (path "data.csv", header "false", 
> inferSchema "false");
> 2. set spark.sql.shuffle.partitions=100000;
> 3. select count(*) from (select task,sum(age) from sample group by task) t;
> After running the above query, the number of objects in the map variable 
> _stageIdToStageMetrics increases to a very high number; the increase is 
> proportional to the number of shuffle partitions.
> Please have a look at the attached screenshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
