[ https://issues.apache.org/jira/browse/SPARK-19623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996344#comment-15996344 ]
Jaeboo Jung commented on SPARK-19623:
-------------------------------------
The issue no longer appears after upgrading Spark to 2.1.0.
> Take rows from DataFrame with empty first partition
> ---------------------------------------------------
>
> Key: SPARK-19623
> URL: https://issues.apache.org/jira/browse/SPARK-19623
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.2
> Reporter: Jaeboo Jung
> Priority: Minor
> Fix For: 2.1.0
>
>
> I use Spark 1.6.2 with 1 master and 6 workers. Given a DataFrame whose
> first partition is empty, the DataFrame and its underlying RDD behave
> differently when taking rows. Taking only 1000 rows from the DataFrame
> causes an OOME, but taking the same rows from the RDD is fine.
> In detail:
> DataFrame without an empty first partition => OK
> DataFrame with an empty first partition => OOME
> RDD of DataFrame with an empty first partition => OK
> The code below reproduces the error.
> {code}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
>
> // 1000 partitions, 100 integer columns per row
> val rdd = sc.parallelize(1 to 100000000, 1000).map(i => Row.fromSeq(Array.fill(100)(i)))
> val schema = StructType(for (i <- 1 to 100) yield StructField("COL" + i, IntegerType, true))
>
> // Empty out the first two partitions
> val rdd2 = rdd.mapPartitionsWithIndex((idx, iter) => if (idx == 0 || idx == 1) Iterator[Row]() else iter)
>
> val df1 = sqlContext.createDataFrame(rdd, schema)
> df1.take(1000) // OK
>
> val df2 = sqlContext.createDataFrame(rdd2, schema)
> df2.rdd.take(1000) // OK
> df2.take(1000) // OOME
> {code}
> I tested this on Spark 1.6.2 with 2 GB of driver memory and 5 GB of executor
> memory.
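The overshoot is consistent with a take(n) that scans partitions in exponentially growing batches: when the leading partitions return no rows, the batch size is multiplied and the next pass may fetch several full partitions to the driver at once. Below is a minimal, hypothetical sketch of that scan loop (not Spark's actual executeTake code; the 4x growth factor, the object name, and the method name are assumptions for illustration):

```scala
// Hypothetical sketch of exponential batch growth in a take(n)-style scan.
// Not Spark's real implementation; illustrates the mechanism only.
object TakeScaleUp {
  // rowsPerPartition(i) = number of rows held by partition i.
  // Returns how many partitions the final batch fetched before n rows
  // were collected.
  def partitionsFetchedInFinalBatch(n: Int, numPartitions: Int,
                                    rowsPerPartition: Int => Int): Int = {
    var collected = 0L
    var partsScanned = 0
    var batch = 1          // start by scanning a single partition
    var lastBatch = 0
    while (collected < n && partsScanned < numPartitions) {
      val upTo = math.min(partsScanned + batch, numPartitions)
      lastBatch = upTo - partsScanned
      // Fetch the whole batch of partitions to the driver
      collected += (partsScanned until upTo).map(rowsPerPartition(_).toLong).sum
      partsScanned = upTo
      batch *= 4           // grow aggressively when earlier batches came up short
    }
    lastBatch
  }
}
```

With the repro's layout (partitions 0 and 1 empty, 100,000 rows of 100 integers in every other partition), the empty first batch triggers a 4-partition second batch, so several full partitions' worth of rows can land on the driver for a mere 1000 requested rows; that scale of overshoot would plausibly exhaust a 2 GB driver.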