Alvaro Fernandez created PHOENIX-6321:
-----------------------------------------
Summary: Array of Shorts/Smallint returned as Array of Integers
Key: PHOENIX-6321
URL: https://issues.apache.org/jira/browse/PHOENIX-6321
Project: Phoenix
Issue Type: Bug
Components: spark-connector
Affects Versions: 5.0.0
Reporter: Alvaro Fernandez
When using the Spark connector to read a Phoenix table with at least one column
defined as an Array of Shorts (SMALLINT ARRAY), the resulting Dataset infers the
schema as an Array of Integers.
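For context, a minimal reproduction sketch (the table name, column name, and ZooKeeper quorum are hypothetical; this requires a running Phoenix cluster and a Spark session):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical table TEST_TABLE with a column VALS defined as SMALLINT ARRAY
val spark = SparkSession.builder().appName("smallint-array-repro").getOrCreate()

val df = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "TEST_TABLE")
  .option("zkUrl", "localhost:2181")
  .load()

// Expected schema: VALS: array<smallint>
// Actual schema:   VALS: array<int>
df.printSchema()
```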
I believe this is due to the following code:
phoenix/phoenix-spark/src/main/scala/org/apache/phoenix/spark/PhoenixRDD.scala:182
case t if t.isInstanceOf[PSmallintArray] || t.isInstanceOf[PUnsignedSmallintArray] => ArrayType(IntegerType, containsNull = true)
phoenix-connectors/phoenix-spark-base/src/main/scala/org/apache/phoenix/spark/SparkSchemaUtil.scala:82
case t if t.isInstanceOf[PSmallintArray] || t.isInstanceOf[PUnsignedSmallintArray] => ArrayType(IntegerType, containsNull = true)
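A possible fix would be to map SMALLINT arrays to Spark's ShortType instead, mirroring how the scalar PSmallint is presumably handled. This is an untested sketch of the changed case, assuming ShortType round-trips correctly through the connector's read path:

```scala
// Sketch of the corrected mapping in SparkSchemaUtil.scala / PhoenixRDD.scala:
// use ShortType so the Dataset schema matches the Phoenix column definition
case t if t.isInstanceOf[PSmallintArray] ||
          t.isInstanceOf[PUnsignedSmallintArray] => ArrayType(ShortType, containsNull = true)
```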
Subsequent attempts to programmatically cast the column to Shorts fail with a
ClassCastException.
It is also impossible to supply the original schema through a DataFrameReader,
as that fails with: "org.apache.spark.sql.AnalysisException:
org.apache.phoenix.spark does not allow user-specified schemas.;"
As far as I know, this makes it impossible to work with tables containing this
kind of data type.
Is there any reason for this code to interpret SmallInts/Shorts as Integers?
Thanks
--
This message was sent by Atlassian Jira
(v8.3.4#803005)