[ https://issues.apache.org/jira/browse/SPARK-19623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996344#comment-15996344 ]
Jaeboo Jung commented on SPARK-19623:
-------------------------------------
The issue no longer appears after upgrading Spark to 2.1.0.
> Take rows from DataFrame with empty first partition
> ---------------------------------------------------
>
> Key: SPARK-19623
> URL: https://issues.apache.org/jira/browse/SPARK-19623
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.2
> Reporter: Jaeboo Jung
> Priority: Minor
> Fix For: 2.1.0
>
>
> I use Spark 1.6.2 with 1 master and 6 workers. Given a DataFrame whose
> first partition is empty, the DataFrame and its underlying RDD behave
> differently when taking rows. Taking only 1000 rows from the DataFrame
> causes an OOME, but taking the same rows from the RDD is fine.
> In detail:
> DataFrame without an empty first partition => OK
> DataFrame with an empty first partition => OOME
> RDD of DataFrame with an empty first partition => OK
> The code below reproduces the error.
> {code}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
>
> // 1000 partitions, 100 integer columns per row
> val rdd = sc.parallelize(1 to 100000000, 1000).map(i => Row.fromSeq(Array.fill(100)(i)))
> val schema = StructType(for (i <- 1 to 100) yield StructField("COL" + i, IntegerType, true))
>
> // Empty out the first two partitions
> val rdd2 = rdd.mapPartitionsWithIndex((idx, iter) => if (idx == 0 || idx == 1) Iterator[Row]() else iter)
>
> val df1 = sqlContext.createDataFrame(rdd, schema)
> df1.take(1000) // OK
>
> val df2 = sqlContext.createDataFrame(rdd2, schema)
> df2.rdd.take(1000) // OK
> df2.take(1000) // OOME
> {code}
> I tested this on Spark 1.6.2 with 2 GB of driver memory and 5 GB of executor
> memory.
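The overshoot is consistent with a take(n) that scans partitions in exponentially growing batches: when the leading partitions return no rows, the batch size is multiplied and the next pass may fetch several full partitions to the driver at once. Below is a minimal, hypothetical sketch of that scan loop (not Spark's actual executeTake code; the 4x growth factor, the object name, and the method name are assumptions for illustration):

```scala
// Hypothetical sketch of exponential batch growth in a take(n)-style scan.
// Not Spark's real implementation; illustrates the mechanism only.
object TakeScaleUp {
  // rowsPerPartition(i) = number of rows held by partition i.
  // Returns how many partitions the final batch fetched before n rows
  // were collected.
  def partitionsFetchedInFinalBatch(n: Int, numPartitions: Int,
                                    rowsPerPartition: Int => Int): Int = {
    var collected = 0L
    var partsScanned = 0
    var batch = 1          // start by scanning a single partition
    var lastBatch = 0
    while (collected < n && partsScanned < numPartitions) {
      val upTo = math.min(partsScanned + batch, numPartitions)
      lastBatch = upTo - partsScanned
      // Fetch the whole batch of partitions to the driver
      collected += (partsScanned until upTo).map(rowsPerPartition(_).toLong).sum
      partsScanned = upTo
      batch *= 4           // grow aggressively when earlier batches came up short
    }
    lastBatch
  }
}
```

With the repro's layout (partitions 0 and 1 empty, 100,000 rows of 100 integers in every other partition), the empty first batch triggers a 4-partition second batch, so several full partitions' worth of rows can land on the driver for a mere 1000 requested rows; that scale of overshoot would plausibly exhaust a 2 GB driver.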