[jira] [Comment Edited] (SPARK-13614) show() trigger memory leak,why?

chillon_m (JIRA) Wed, 02 Mar 2016 18:18:06 -0800

    [ https://issues.apache.org/jira/browse/SPARK-13614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175516#comment-15175516 ]


chillon_m edited comment on SPARK-13614 at 3/3/16 2:16 AM:
-----------------------------------------------------------

@[~srowen]
With the same dataset size (hot.count()=599147, ghot.size=21844, about 10
bytes/row), collect() does not trigger the memory leak (first image), but
show() does. Why? In general, collect() is the call expected to trigger it
easily ("Keep in mind that your entire dataset must fit in memory on a single
machine to use collect() on it, so collect() shouldn’t be used on large
datasets." in Learning Spark), yet here collect() does not trigger it.
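A quick back-of-the-envelope check of these figures, assuming the reporter's own estimate of about 10 bytes per row and using the leak size from the executor log in the issue. The full input comes to roughly 5.7 MiB and the collected result to roughly 0.2 MiB. The reported leak is exactly 4 MiB (2^22 bytes), which is larger than the whole collected result. That is consistent with a fixed-size allocation rather than something proportional to the rows returned, but the numbers alone do not prove it. The object name SizeCheck is hypothetical.

```scala
object SizeCheck {
  // Figures quoted in the comment and in the issue's executor log.
  val inputRows: Long = 599147L   // hot.count()
  val groupedRows: Long = 21844L  // ghot.size
  val bytesPerRow: Long = 10L     // reporter's estimate ("10Byte/row"), not a measurement
  val leakBytes: Long = 4194304L  // "Managed memory leak detected; size = 4194304 bytes"

  def inputBytes: Long = inputRows * bytesPerRow
  def groupedBytes: Long = groupedRows * bytesPerRow

  // Converts bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
  def mib(bytes: Long): Double = bytes.toDouble / (1024 * 1024)

  def main(args: Array[String]): Unit = {
    println(f"input:   $inputBytes%d bytes (${mib(inputBytes)}%.2f MiB)")
    println(f"grouped: $groupedBytes%d bytes (${mib(groupedBytes)}%.2f MiB)")
    println(f"leak:    $leakBytes%d bytes (${mib(leakBytes)}%.2f MiB)")
    // True when the leak size is exactly 2^22 bytes.
    println(s"leak is exactly 2^22 bytes: ${leakBytes == (1L << 22)}")
  }
}
```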



was (Author: chillon_m):
@[~srowen]
With the same dataset size (hot.count()=599147, ghot.size=21844), collect()
does not trigger the memory leak (first image), but show() does. Why? In
general, collect() is the call expected to trigger it easily ("Keep in mind
that your entire dataset must fit in memory on a single machine to use
collect() on it, so collect() shouldn’t be used on large datasets." in
Learning Spark), yet here collect() does not trigger it.


> show() trigger memory leak,why?
> -------------------------------
>
>                 Key: SPARK-13614
>                 URL: https://issues.apache.org/jira/browse/SPARK-13614
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: chillon_m
>         Attachments: memory leak.png, memory.png
>
>
> hot.count()=599147
> ghot.size=21844
> [bigdata@namenode spark-1.5.2-bin-hadoop2.4]$ bin/spark-shell --driver-class-path /home/bigdata/mysql-connector-java-5.1.38-bin.jar
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>       /_/
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
> Type in expressions to have them evaluated.
> Type :help for more information.
> Spark context available as sc.
> SQL context available as sqlContext.
> scala> val hot=sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:mysql://:/?user=&password=","dbtable" -> "")).load()
> Wed Mar 02 14:22:37 CST 2016 WARN: Establishing SSL connection without 
> server's identity verification is not recommended. According to MySQL 
> 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established 
> by default if explicit option isn't set. For compliance with existing 
> applications not using SSL the verifyServerCertificate property is set to 
> 'false'. You need either to explicitly disable SSL by setting useSSL=false, 
> or set useSSL=true and provide truststore for server certificate verification.
> hot: org.apache.spark.sql.DataFrame = []
> scala> val ghot=hot.groupBy("Num","pNum").count().collect()
> Wed Mar 02 14:22:59 CST 2016 WARN: Establishing SSL connection without 
> server's identity verification is not recommended. According to MySQL 
> 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established 
> by default if explicit option isn't set. For compliance with existing 
> applications not using SSL the verifyServerCertificate property is set to 
> 'false'. You need either to explicitly disable SSL by setting useSSL=false, 
> or set useSSL=true and provide truststore for server certificate verification.
> ghot: Array[org.apache.spark.sql.Row] = Array([[],[],[], [,42310...
> scala> ghot.take(20)
> res0: Array[org.apache.spark.sql.Row] = Array([],[],[],[],[],[],[],[]....)
> scala> hot.groupBy("Num","pNum").count().show()
> Wed Mar 02 14:26:05 CST 2016 WARN: Establishing SSL connection without 
> server's identity verification is not recommended. According to MySQL 
> 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established 
> by default if explicit option isn't set. For compliance with existing 
> applications not using SSL the verifyServerCertificate property is set to 
> 'false'. You need either to explicitly disable SSL by setting useSSL=false, 
> or set useSSL=true and provide truststore for server certificate verification.
> 16/03/02 14:26:33 ERROR Executor: Managed memory leak detected; size = 4194304 bytes, TID = 202
> +----------+---------+-----+
> |     QQNum| TroopNum|count|
> +----------+---------+-----+
> |1XXXXXXXXX|38XXXXXXX|    1|
> |1XXXXXXXXX| 5XXXXXXX|    2|
> |1XXXXXXXXX|26XXXXXXX|    6|
> |1XXXXXXXXX|14XXXXXXX|    3|
> |1XXXXXXXXX|41XXXXXXX|   14|
> |1XXXXXXXXX|48XXXXXXX|   18|
> |1XXXXXXXXX|23XXXXXXX|    2|
> |1XXXXXXXXX|  XXXXXXX|   34|
> |1XXXXXXXXX|52XXXXXXX|    1|
> |1XXXXXXXXX|52XXXXXXX|    2|
> |1XXXXXXXXX|49XXXXXXX|    3|
> |1XXXXXXXXX|42XXXXXXX|    3|
> |1XXXXXXXXX|17XXXXXXX|   11|
> |1XXXXXXXXX|25XXXXXXX|  129|
> |1XXXXXXXXX|13XXXXXXX|    2|
> |1XXXXXXXXX|19XXXXXXX|    1|
> |1XXXXXXXXX|32XXXXXXX|    9|
> |1XXXXXXXXX|38XXXXXXX|    6|
> |1XXXXXXXXX|38XXXXXXX|   13|
> |1XXXXXXXXX|30XXXXXXX|    4|
> +----------+---------+-----+
> only showing top 20 rows
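One plausible reading of the difference is sketched below as a toy model, not Spark's actual code. collect() pulls every row out of every partition, so each operator's iterator runs to its end and can free its working memory itself. show() only needs the first 20 rows, so the consumer can stop while an upstream operator still holds a buffer. If that buffer is only reclaimed by task-end cleanup, the cleanup finds memory that was never released and reports it. In this model, nothing outlives the task. The classes ToyTaskMemory, BufferedAggIterator and ShowVsCollect are hypothetical. The 4194304-byte buffer size is borrowed from the log line only for illustration.

```scala
// Toy per-task memory ledger (hypothetical; not Spark's TaskMemoryManager).
final class ToyTaskMemory {
  private var held = 0L
  def acquire(bytes: Long): Unit = held += bytes
  def release(bytes: Long): Unit = held -= bytes
  // Runs when the task ends: returns whatever is still held, then frees it.
  def cleanUpAtTaskEnd(): Long = {
    val leftover = held
    held = 0L
    leftover
  }
}

// Iterator that grabs a buffer up front and frees it only once it observes
// that its input is exhausted (hasNext returning false).
final class BufferedAggIterator[A](rows: Seq[A], mem: ToyTaskMemory, bufferBytes: Long)
    extends Iterator[A] {
  mem.acquire(bufferBytes)
  private val underlying = rows.iterator
  private var released = false

  def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more && !released) {
      mem.release(bufferBytes)
      released = true
    }
    more
  }

  def next(): A = underlying.next()
}

object ShowVsCollect {
  val BufferBytes: Long = 4194304L // size borrowed from the log line, for illustration only

  // limit = Some(n): stop after n rows (show()-like); None: drain everything (collect()-like).
  // Returns the rows produced and the bytes still held when the toy task ends.
  def runTask[A](rows: Seq[A], limit: Option[Int]): (Vector[A], Long) = {
    val mem = new ToyTaskMemory
    val it = new BufferedAggIterator(rows, mem, BufferBytes)
    val out = Vector.newBuilder[A]
    limit match {
      case Some(n) =>
        var taken = 0
        // Short-circuit: once taken == n, hasNext is never called again,
        // so the iterator never sees its end and never frees the buffer.
        while (taken < n && it.hasNext) {
          out += it.next()
          taken += 1
        }
      case None =>
        while (it.hasNext) out += it.next()
    }
    (out.result(), mem.cleanUpAtTaskEnd())
  }

  def main(args: Array[String]): Unit = {
    val rows = (1 to 21844).toVector
    val (shown, leakedByShow) = runTask(rows, Some(20))
    val (collected, leakedByCollect) = runTask(rows, None)
    println(s"show-like:    ${shown.size} rows, $leakedByShow bytes left at task end")
    println(s"collect-like: ${collected.size} rows, $leakedByCollect bytes left at task end")
  }
}
```

In this model, the show-like task ends still holding the 4194304-byte buffer, and the collect-like task ends holding 0 bytes, even though the collect-like task returns far more rows.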

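Separately from the leak, every JDBC call in the transcript prints the MySQL Connector/J SSL warning. The warning itself says to set useSSL=false explicitly, or to set useSSL=true and provide a truststore. Below is a minimal sketch of a hypothetical helper, JdbcUrlSsl.withUseSslFalse, that appends useSSL=false to a JDBC URL before it goes into the reader options. It leaves any URL that already sets useSSL untouched, and it only checks that exact spelling. The elided URL from the transcript is used as-is.

```scala
object JdbcUrlSsl {
  // Hypothetical helper: appends useSSL=false to a MySQL JDBC URL, as the
  // Connector/J warning suggests. An existing useSSL setting is respected.
  def withUseSslFalse(url: String): String =
    if (url.contains("useSSL=")) url
    else if (url.contains("?")) {
      // Query string already present: add another parameter to it.
      val sep = if (url.endsWith("?") || url.endsWith("&")) "" else "&"
      url + sep + "useSSL=false"
    } else url + "?useSSL=false"

  def main(args: Array[String]): Unit =
    println(withUseSslFalse("jdbc:mysql://:/?user=&password="))
}
```

The result would then be passed as the "url" entry of the options Map given to sqlContext.read.format("jdbc"), in place of the original URL.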


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
