[
https://issues.apache.org/jira/browse/SPARK-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-13531:
------------------------------------
Assignee: Apache Spark
> Some DataFrame joins stopped working with UnsupportedOperationException: No
> size estimation available for objects
> -----------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-13531
> URL: https://issues.apache.org/jira/browse/SPARK-13531
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: koert kuipers
> Assignee: Apache Spark
> Priority: Minor
>
> This is using Spark 2.0.0-SNAPSHOT.
> DataFrame df1:
> schema:
> {noformat}StructType(StructField(x,IntegerType,true)){noformat}
> explain:
> {noformat}== Physical Plan ==
> MapPartitions <function1>, obj#135: object, [if (input[0, object].isNullAt)
> null else input[0, object].get AS x#128]
> +- MapPartitions <function1>, createexternalrow(if (isnull(x#9)) null else
> x#9), [input[0, object] AS obj#135]
> +- WholeStageCodegen
> : +- Project [_1#8 AS x#9]
> : +- Scan ExistingRDD[_1#8]{noformat}
> show:
> {noformat}+---+
> | x|
> +---+
> | 2|
> | 3|
> +---+{noformat}
> DataFrame df2:
> schema:
> {noformat}StructType(StructField(x,IntegerType,true),
> StructField(y,StringType,true)){noformat}
> explain:
> {noformat}== Physical Plan ==
> MapPartitions <function1>, createexternalrow(x#2, if (isnull(y#3)) null else
> y#3.toString), [if (input[0, object].isNullAt) null else input[0, object].get
> AS x#130,if (input[0, object].isNullAt) null else staticinvoke(class
> org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0,
> object].get, true) AS y#131]
> +- WholeStageCodegen
> : +- Project [_1#0 AS x#2,_2#1 AS y#3]
> : +- Scan ExistingRDD[_1#0,_2#1]{noformat}
> show:
> {noformat}+---+---+
> | x| y|
> +---+---+
> | 1| 1|
> | 2| 2|
> | 3| 3|
> +---+---+{noformat}
> I run:
> df1.join(df2, Seq("x")).show
> I get:
> {noformat}java.lang.UnsupportedOperationException: No size estimation
> available for objects.
> at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41)
> at
> org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
> at
> org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
> at scala.collection.immutable.List.map(List.scala:285)
> at
> org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323)
> at
> org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87){noformat}
> Not sure what changed; this ran about a week ago without issues (in our
> internal unit tests). It is fully reproducible. However, when I tried to
> minimize the issue I could not reproduce it by just creating DataFrames in
> the REPL with the same contents, so it probably has something to do with the
> way these are created (from Row objects and StructTypes).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)