[ 
https://issues.apache.org/jira/browse/SPARK-18857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817170#comment-15817170
 ] 

Dongjoon Hyun commented on SPARK-18857:
---------------------------------------

Hi, [~srowen].
This bug exists in 2.0.2 and 2.1.X.
I'll create a backport for this issue.

> SparkSQL ThriftServer hangs while extracting huge data volumes in incremental 
> collect mode
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18857
>                 URL: https://issues.apache.org/jira/browse/SPARK-18857
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>            Reporter: vishal agrawal
>            Assignee: Dongjoon Hyun
>             Fix For: 2.2.0
>
>         Attachments: GC-spark-1.6.3, GC-spark-2.0.2
>
>
> We are running a SQL query on our Spark cluster and extracting around 
> 200 million records through the SparkSQL ThriftServer interface. This query 
> works fine on Spark 1.6.3; on Spark 2.0.2, however, the Thrift server hangs 
> after fetching data from a few partitions (we are using incremental collect 
> mode with 400 partitions). According to the documentation, the maximum memory 
> used by the Thrift server should be what the largest data partition requires. 
> But we observed that the Thrift server does not release the memory of old 
> partitions when GC occurs, even though it has moved on to fetching data from 
> the next partition. This is not the case with version 1.6.3.
> On further investigation, we found that SparkExecuteStatementOperation.scala 
> was modified for "[SPARK-16563][SQL] fix spark sql thrift server FetchResults 
> bug", and the result set iterator was duplicated to keep a reference to the 
> first set.
> +      val (itra, itrb) = iter.duplicate
> +      iterHeader = itra
> +      iter = itrb
> We suspect that this prevents the memory from being reclaimed on GC. To 
> confirm this, we created an iterator in our test class and fetched the data 
> once without duplicating it and a second time with a duplicate. In the first 
> case it ran fine and fetched the entire data set, while in the second case 
> the driver hung after fetching data from a few partitions.
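
For reference, the buffering behaviour of Scala's Iterator.duplicate that the
description points at can be seen in a small standalone sketch. This is not
the ThriftServer code: DuplicateDemo, run, header and rest are hypothetical
names chosen for illustration. It only shows that when one copy is consumed
while the other is left untouched, the untouched copy can still replay every
element, so those elements must have been kept reachable in the meantime.

```scala
object DuplicateDemo {
  // Returns (elements pulled from the source, elements replayed by the lagging copy).
  def run(n: Int): (Int, Int) = {
    var produced = 0
    // Lazy source: the counter only increases when an element is actually pulled.
    val source = Iterator.tabulate(n) { i => produced += 1; i }

    // Same shape as the quoted change: one copy stays at the start, one is consumed.
    val (header, rest) = source.duplicate

    // Drain only `rest`, as a fetch loop would.
    while (rest.hasNext) rest.next()
    val pulled = produced

    // `header` was never advanced, yet it still yields all n elements.
    val replayed = header.size

    // Replaying did not pull from the source again, so the elements came from
    // a buffer held by the duplicated pair.
    assert(produced == pulled)
    (pulled, replayed)
  }

  def main(args: Array[String]): Unit = {
    val (pulled, replayed) = run(1000)
    println(s"pulled=$pulled replayed=$replayed")
  }
}
```

With a result set of hundreds of millions of rows, a buffer of that kind would
grow with every row fetched for as long as the lagging copy is referenced,
which is consistent with the suspicion above but does not by itself prove the
cause of the hang.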



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

