[ https://issues.apache.org/jira/browse/PHOENIX-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419288#comment-17419288 ]

ASF GitHub Bot commented on PHOENIX-6559:
-----------------------------------------

alferca opened a new pull request #63:
URL: https://github.com/apache/phoenix-connectors/pull/63


   Solves PHOENIX-6559, including two test cases: converting arrays of Short type 
in the Phoenix schema, and saving arrays of Short type back to Phoenix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> spark connector access to SmallintArray / UnsignedSmallintArray columns
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-6559
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6559
>             Project: Phoenix
>          Issue Type: Bug
>          Components: connectors, spark-connector
>    Affects Versions: connectors-6.0.0
>            Reporter: Alvaro Fernandez
>            Priority: Major
>         Attachments: SparkSchemaUtil.patch
>
>
> We have some tables defined with SMALLINT ARRAY[] columns that cannot be 
> read correctly with the Spark connector.
> It seems that the connector incorrectly infers the Spark data type as an 
> array of integers, ArrayType(IntegerType), instead of ArrayType(ShortType).
> An example table:
> {code:java}
> CREATE TABLE IF NOT EXISTS AEIDEV.ARRAY_TABLE (ID BIGINT NOT NULL PRIMARY KEY, COL1 SMALLINT ARRAY[]);
> UPSERT INTO AEIDEV.ARRAY_TABLE VALUES (1, ARRAY[-32678,-9876,-234,-1]);
> UPSERT INTO AEIDEV.ARRAY_TABLE VALUES (2, ARRAY[0,8,9,10]);
> UPSERT INTO AEIDEV.ARRAY_TABLE VALUES (3, ARRAY[123,1234,12345,32767]);{code}
> Accessing the values from Spark returns wrong results:
>
> {code:java}
> scala> val df = spark.sqlContext.read.format("org.apache.phoenix.spark").option("table","AEIDEV.ARRAY_TABLE").option("zkUrl","ithdp1101.cern.ch:2181").load
> df: org.apache.spark.sql.DataFrame = [ID: bigint, COL1: array<int>]
>
> scala> df.show
> +---+--------------------+
> | ID|                COL1|
> +---+--------------------+
> |  1|[-647200678, -234...|
> |  2|[524288, 655369, ...|
> |  3|[80871547, 214743...|
> +---+--------------------+
>
> scala> df.collect
> res3: Array[org.apache.spark.sql.Row] = Array([1,WrappedArray(-647200678, -234, 0, 0)], [2,WrappedArray(524288, 655369, 0, 0)], [3,WrappedArray(80871547, 2147430457, 0, 0)])
> {code}
> We identified the problem in the SparkSchemaUtil class and applied the tiny 
> patch attached to this report. After that, the data type is correctly 
> inferred and the results are correct:
>
> {code:java}
> scala> val df = spark.sqlContext.read.format("org.apache.phoenix.spark").option("table","AEIDEV.ARRAY_TABLE").option("zkUrl","ithdp1101.cern.ch:2181").load
> df: org.apache.spark.sql.DataFrame = [ID: bigint, COL1: array<smallint>]
>
> scala> df.show
> +---+--------------------+
> | ID|                COL1|
> +---+--------------------+
> |  1|[-32678, -9876, -...|
> |  2|       [0, 8, 9, 10]|
> |  3|[123, 1234, 12345...|
> +---+--------------------+
>
> scala> df.collect
> res1: Array[org.apache.spark.sql.Row] = Array([1,WrappedArray(-32678, -9876, -234, -1)], [2,WrappedArray(0, 8, 9, 10)], [3,WrappedArray(123, 1234, 12345, 32767)])
> {code}
>
> We can provide more information and submit a merge request if needed.
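The fix described in the issue amounts to adding the missing array cases to the connector's Phoenix-to-Catalyst type mapping. A hypothetical sketch of that mapping in Python (names and table are illustrative only; the real change lives in the attached SparkSchemaUtil patch):

```python
# Illustrative mapping from Phoenix SQL types to Spark Catalyst type
# strings. The reported bug is that the SMALLINT ARRAY cases were
# effectively missing and fell back to the integer element type.
PHOENIX_TO_CATALYST = {
    "SMALLINT": "smallint",
    "UNSIGNED_SMALLINT": "smallint",
    "INTEGER": "int",
    "INTEGER ARRAY": "array<int>",
    "SMALLINT ARRAY": "array<smallint>",           # was array<int> before the fix
    "UNSIGNED_SMALLINT ARRAY": "array<smallint>",  # was array<int> before the fix
}

def catalyst_type(phoenix_type: str) -> str:
    """Look up the Spark SQL type string for a Phoenix column type."""
    return PHOENIX_TO_CATALYST[phoenix_type]

print(catalyst_type("SMALLINT ARRAY"))  # array<smallint>
```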



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
