[
https://issues.apache.org/jira/browse/PHOENIX-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904916#comment-14904916
]
Josh Mahonin commented on PHOENIX-2287:
---------------------------------------
The updated patch adds support for Spark 1.5.0 and remains backwards
compatible down to 1.3.0 (manually tested; Spark version profiles may be worth
looking at in the future).
In 1.5.0, they've gone and explicitly hidden the GenericMutableRow data
structure. Fortunately, we are able to use the external-facing 'Row' data
type, which is backwards compatible and should remain compatible in future
releases as well.
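A minimal sketch of the idea, not the code in the patch: build rows through the public org.apache.spark.sql.Row factory instead of instantiating the catalyst-internal GenericMutableRow. The object name RowSketch and the helper toRow are hypothetical, and the column values are made up for illustration.

```scala
import org.apache.spark.sql.Row

object RowSketch {
  // Hypothetical helper: wrap one record's column values in a public Row.
  // Row.fromSeq is part of the external Spark SQL API, so nothing here
  // touches the catalyst-internal row classes.
  def toRow(values: Seq[Any]): Row = Row.fromSeq(values)

  def main(args: Array[String]): Unit = {
    // Illustrative values only: an ID column and a COL1 column.
    val row = toRow(Seq(1L, "test_row_1"))
    println(row.getLong(0))
    println(row.getString(1))
  }
}
```

Because callers only ever see the Row type, the same code should compile against any Spark release that ships the public Row API.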
As part of the update, Spark SQL deprecated a constructor on their
'DecimalType'. In updating this, I exposed a new issue: we don't carry forward
the precision and scale of the underlying PDecimal type through to Spark. For
now I've set it to use the Spark defaults, but I'll create another issue for
that specifically. I've included an ignored integration test in this patch as
well.
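A rough sketch of what carrying precision and scale through could look like, assuming Spark 1.5 or later. This is not the code in the patch: the object DecimalSketch and the helper toSparkDecimal are hypothetical, and DecimalType.SYSTEM_DEFAULT is only one of the defaults Spark offers, picked here as an assumption.

```scala
import org.apache.spark.sql.types.DecimalType

object DecimalSketch {
  // Hypothetical helper: map an optional precision/scale pair (as a column's
  // metadata might report it) to a Spark DecimalType. When either value is
  // missing, fall back to one of Spark's built-in defaults.
  def toSparkDecimal(precision: Option[Int], scale: Option[Int]): DecimalType =
    (precision, scale) match {
      case (Some(p), Some(s)) => DecimalType(p, s)
      case _                  => DecimalType.SYSTEM_DEFAULT // assumed default
    }

  def main(args: Array[String]): Unit = {
    // A column declared as DECIMAL(10, 2) keeps its precision and scale.
    println(toSparkDecimal(Some(10), Some(2)))
    // A column with no declared precision falls back to the default.
    println(toSparkDecimal(None, None))
  }
}
```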
[~maghamravikiran] Could you take a look?
> Spark Plugin Exception - java.lang.ClassCastException:
> org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to
> org.apache.spark.sql.Row
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-2287
> URL: https://issues.apache.org/jira/browse/PHOENIX-2287
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.5.2
> Environment: - HBase 1.1.1 running in standalone mode on OS X
> - Spark 1.5.0
> - Phoenix 4.5.2
> Reporter: Babar Tareen
> Attachments: PHOENIX-2287.patch
>
>
> Running the DataFrame example on the Spark Plugin page
> (https://phoenix.apache.org/phoenix_spark.html) results in the following
> exception. The same code works as expected with Spark 1.4.1.
> {code:java}
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.SQLContext
> import org.apache.phoenix.spark._
> val sc = new SparkContext("local", "phoenix-test")
> val sqlContext = new SQLContext(sc)
> val df = sqlContext.load(
>   "org.apache.phoenix.spark",
>   Map("table" -> "TABLE1", "zkUrl" -> "127.0.0.1:2181")
> )
> df
>   .filter(df("COL1") === "test_row_1" && df("ID") === 1L)
>   .select(df("ID"))
>   .show
> {code}
> Exception
> {quote}
> java.lang.ClassCastException:
> org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to
> org.apache.spark.sql.Row
> at org.apache.spark.sql.SQLContext$$anonfun$7.apply(SQLContext.scala:439)
> ~[spark-sql_2.11-1.5.0.jar:1.5.0]
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:363)
> ~[scala-library-2.11.4.jar:na]
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:363)
> ~[scala-library-2.11.4.jar:na]
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:363)
> ~[scala-library-2.11.4.jar:na]
> at
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:366)
> ~[spark-sql_2.11-1.5.0.jar:1.5.0]
> at
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
> ~[spark-sql_2.11-1.5.0.jar:1.5.0]
> at
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
> ~[spark-sql_2.11-1.5.0.jar:1.5.0]
> at
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> ~[spark-sql_2.11-1.5.0.jar:1.5.0]
> at
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> ~[spark-sql_2.11-1.5.0.jar:1.5.0]
> at
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_45]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_45]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {quote}