[ https://issues.apache.org/jira/browse/SPARK-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213361#comment-15213361 ]

koert kuipers commented on SPARK-13531:
---------------------------------------

Before the optimization both MapPartitions operators have output type IntegerType. 
After EliminateSerialization the output type of the first MapPartitions is 
ObjectType(interface org.apache.spark.sql.Row), while the second is still 
IntegerType, as it should be.

What is not clear to me is why CanBroadcast.unapply is apparently applied to 
the first MapPartitions instead of the second; I am not familiar enough with 
that part of the code.
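
As best I can tell from the stack trace in the description, the failure path is 
roughly the following: CanBroadcast.unapply asks the plan for a size estimate, 
UnaryNode.statistics derives that estimate from the default sizes of the output 
attributes, and ObjectType has no default size, so it throws. The snippet below 
is only a simplified, self-contained model of that chain, not the actual Spark 
source.

{noformat}
// Simplified model of the failure path (sketch only, not Spark source code).
object BroadcastSizeSketch {

  sealed trait SimpleType { def defaultSize: Long }

  // IntegerType-like column: has a fixed default size.
  case object IntType extends SimpleType { def defaultSize: Long = 4L }

  // ObjectType-like column: no size estimate exists, so asking for one throws,
  // mirroring ObjectType.defaultSize in the stack trace below.
  case object ObjType extends SimpleType {
    def defaultSize: Long =
      throw new UnsupportedOperationException("No size estimation available for objects.")
  }

  // Mirrors UnaryNode.statistics: the size estimate is built from the
  // per-column default sizes of the plan's output schema.
  final case class Plan(outputTypes: Seq[SimpleType]) {
    def sizeInBytes: Long = outputTypes.map(_.defaultSize).sum
  }

  // Mirrors CanBroadcast.unapply: a join side qualifies for a broadcast join
  // if its estimated size is below the broadcast threshold.
  def canBroadcast(plan: Plan, thresholdBytes: Long): Boolean =
    plan.sizeInBytes <= thresholdBytes

  def main(args: Array[String]): Unit = {
    val threshold = 10L * 1024 * 1024
    // Before EliminateSerialization: output type IntegerType, estimation works.
    println(canBroadcast(Plan(Seq(IntType)), threshold))
    // After EliminateSerialization: output type ObjectType(Row), estimation
    // throws, which is the UnsupportedOperationException reported in this ticket.
    println(canBroadcast(Plan(Seq(ObjType)), threshold))
  }
}
{noformat}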

> Some DataFrame joins stopped working with UnsupportedOperationException: No 
> size estimation available for objects
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13531
>                 URL: https://issues.apache.org/jira/browse/SPARK-13531
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: koert kuipers
>            Priority: Minor
>
> this is using spark 2.0.0-SNAPSHOT
> dataframe df1:
> schema:
> {noformat}StructType(StructField(x,IntegerType,true)){noformat}
> explain:
> {noformat}== Physical Plan ==
> MapPartitions <function1>, obj#135: object, [if (input[0, object].isNullAt) 
> null else input[0, object].get AS x#128]
> +- MapPartitions <function1>, createexternalrow(if (isnull(x#9)) null else 
> x#9), [input[0, object] AS obj#135]
>    +- WholeStageCodegen
>       :  +- Project [_1#8 AS x#9]
>       :     +- Scan ExistingRDD[_1#8]{noformat}
> show:
> {noformat}+---+
> |  x|
> +---+
> |  2|
> |  3|
> +---+{noformat}
> dataframe df2:
> schema:
> {noformat}StructType(StructField(x,IntegerType,true), 
> StructField(y,StringType,true)){noformat}
> explain:
> {noformat}== Physical Plan ==
> MapPartitions <function1>, createexternalrow(x#2, if (isnull(y#3)) null else 
> y#3.toString), [if (input[0, object].isNullAt) null else input[0, object].get 
> AS x#130,if (input[0, object].isNullAt) null else staticinvoke(class 
> org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, 
> object].get, true) AS y#131]
> +- WholeStageCodegen
>    :  +- Project [_1#0 AS x#2,_2#1 AS y#3]
>    :     +- Scan ExistingRDD[_1#0,_2#1]{noformat}
> show:
> {noformat}+---+---+
> |  x|  y|
> +---+---+
> |  1|  1|
> |  2|  2|
> |  3|  3|
> +---+---+{noformat}
> I run:
> df1.join(df2, Seq("x")).show
> and I get:
> {noformat}java.lang.UnsupportedOperationException: No size estimation 
> available for objects.
> at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41)
> at 
> org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
> at 
> org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
> at scala.collection.immutable.List.map(List.scala:285)
> at 
> org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323)
> at 
> org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87){noformat}
> Not sure what changed; this ran about a week ago without issues (in our 
> internal unit tests). It is fully reproducible, however when I tried to 
> minimize the issue I could not reproduce it by just creating data frames in 
> the repl with the same contents, so it probably has something to do with the 
> way these are created (from Row objects and StructTypes).
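
For what it's worth, below is one hypothetical way to build df1 and df2 from Row 
objects and explicit StructTypes so that the plans contain the same MapPartitions 
/ createexternalrow nodes shown above. This is only a sketch written against the 
released 2.0 API (the ticket used a 2.0.0-SNAPSHOT, so entry points may differ), 
not the reporter's actual test code.

{noformat}
// Hypothetical reproduction sketch only; not taken from the ticket.
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types._

object Spark13531Repro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("SPARK-13531").getOrCreate()

    // df1: one nullable int column, mapped row-to-row twice, which would produce
    // the two stacked MapPartitions nodes seen in df1's plan.
    val schema1 = StructType(Seq(StructField("x", IntegerType, nullable = true)))
    val rows1   = spark.sparkContext.parallelize(Seq(Row(2), Row(3)))
    val df1 = spark.createDataFrame(rows1, schema1)
      .map((r: Row) => r)(RowEncoder(schema1))
      .map((r: Row) => r)(RowEncoder(schema1))

    // df2: two columns, mapped row-to-row once, matching the single MapPartitions
    // node in df2's plan.
    val schema2 = StructType(Seq(
      StructField("x", IntegerType, nullable = true),
      StructField("y", StringType, nullable = true)))
    val rows2 = spark.sparkContext.parallelize(Seq(Row(1, "1"), Row(2, "2"), Row(3, "3")))
    val df2 = spark.createDataFrame(rows2, schema2).map((r: Row) => r)(RowEncoder(schema2))

    // On the affected 2.0.0-SNAPSHOT this is where CanBroadcast's size estimation
    // hits ObjectType and throws; on fixed versions it prints the joined rows.
    df1.join(df2, Seq("x")).show()
  }
}
{noformat}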
