[
https://issues.apache.org/jira/browse/SPARK-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Armbrust resolved SPARK-4182.
-------------------------------------
Resolution: Fixed
Fix Version/s: 1.2.0
Issue resolved by pull request 3059
[https://github.com/apache/spark/pull/3059]
> Caching tables containing boolean, binary, array, struct and/or map columns
> doesn't work
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-4182
> URL: https://issues.apache.org/jira/browse/SPARK-4182
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.1.1
> Reporter: Cheng Lian
> Assignee: Cheng Lian
> Priority: Blocker
> Fix For: 1.2.0
>
>
> If a table contains a column whose type is binary, array, struct, map, and
> for some reason, boolean, in-memory columnar caching doesn't work because a
> {{NoopColumnStats}} is used to collect column statistics. {{NoopColumnStats}}
> returns an empty statistics row, and thus breaks {{InMemoryRelation}}
> statistics calculation.
> Execute the following snippet to reproduce this bug in {{spark-shell}}:
> {code}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.catalyst.types._
> val sparkContext = sc
> import sparkContext._
> val sqlContext = new SQLContext(sparkContext)
> import sqlContext._
> case class BoolField(flag: Boolean)
> val schemaRDD = parallelize(true :: false ::
> Nil).map(BoolField(_)).toSchemaRDD
> schemaRDD.cache().count()
> schemaRDD.count()
> {code}
> Exception thrown:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 4
> at
> org.apache.spark.sql.catalyst.expressions.GenericRow.apply(Row.scala:142)
> at
> org.apache.spark.sql.catalyst.expressions.BoundReference.eval(BoundAttribute.scala:37)
> at
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$computeSizeInBytes$1.apply(InMemoryColumnarTableScan.scala:66)
> at
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$computeSizeInBytes$1.apply(InMemoryColumnarTableScan.scala:66)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at
> org.apache.spark.sql.columnar.InMemoryRelation.computeSizeInBytes(InMemoryColumnarTableScan.scala:66)
> at
> org.apache.spark.sql.columnar.InMemoryRelation.statistics(InMemoryColumnarTableScan.scala:87)
> at
> org.apache.spark.sql.columnar.InMemoryRelation.statisticsToBePropagated(InMemoryColumnarTableScan.scala:73)
> at
> org.apache.spark.sql.columnar.InMemoryRelation.withOutput(InMemoryColumnarTableScan.scala:147)
> at
> org.apache.spark.sql.CacheManager$$anonfun$useCachedData$1$$anonfun$applyOrElse$1.apply(CacheManager.scala:122)
> at
> org.apache.spark.sql.CacheManager$$anonfun$useCachedData$1$$anonfun$applyOrElse$1.apply(CacheManager.scala:122)
> ...
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]