[ https://issues.apache.org/jira/browse/PHOENIX-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Istvan Toth updated PHOENIX-6623: --------------------------------- Fix Version/s: 5.1.4 (was: 5.1.3) > Phoenix Spark reading DATE datatype value less than one day from phoenix table > ------------------------------------------------------------------------------ > > Key: PHOENIX-6623 > URL: https://issues.apache.org/jira/browse/PHOENIX-6623 > Project: Phoenix > Issue Type: Bug > Components: spark-connector > Affects Versions: 4.7.0 > Reporter: Anand > Assignee: Istvan Toth > Priority: Blocker > Fix For: 5.2.0, 5.1.4 > > > We are using below versions of Phoenix, HBase and Spark. > Phoenix - 4.7 > HBase - 2.6.5 > Spark - 2.4 > Created a phoenix table by mentioning one of the field datatype as DATE and > TIMESTAMP in Phoenix using Squirrel SQL. DDL is given below. > CREATE TABLE IF NOT EXISTS NS_TEST.CUSTOMER_TBL ( > "CID" INTEGER, > *"CDATE" DATE,* > "CTIMESTAMP" TIMESTAMP, > CONSTRAINT CUSTOMER_TBL_PK PRIMARY KEY ("CID")); > Upserted records using upsert command and below is the data in table. > > |CID|CDATE |CTIMESTAMP | > > |1 |*2021-11-21*|2022-01-18 18:30:33.896| > |2 |*2021-11-18*|2022-01-18 18:45:59.336| > |3 |*2021-11-17*|2022-01-18 19:01:04.265| > > Now, reading data from above created table in pyspark shell. We have set > *spark.sql.session.timeZone=UTC* to spark while launching pyspark shell. > Also, we have set *phoenix.query.dateFormatTimeZone=UTC* in *hbase-site.xml* > file. ** > Below code snippet read data from phoenix via JDBC and it r{*}ead DATE > datatype field as one day less{*}. > {noformat} > >>> val df = spark.read.format("jdbc") > .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver") > .option("url", "jdbc:phoenix:localhost:2181:/hbase-secure") > .option("dbtable", "(SELECT CID, CDATE, CTIMESTAMP FROM > NS_TEST.CUSTOMER_TBL) q") > .load(){noformat} > >>>df.printSchema() > root > |-- CID: integer (nullable = true) > |-- *CDATE: date* (nullable = true) > |-- CTIMESTAMP: timestamp (nullable = true) > >>>{*}df.select('{*}').show(truncate=False)\{*} > |CID|CDATE |CTIMESTAMP | > > |1 |*2021-11-20*|2022-01-18 18:30:33.896| > |2 |*2021-11-17*|2022-01-18 18:45:59.336| > |3 |*2021-11-16*|2022-01-18 19:01:04.265| > > We have also tried using phoenix data source instead of JDBC and below is the > code snippet. It also read DATE datatype field as one day less. > {noformat} > val df2 = spark.read.format("org.apache.phoenix.spark") > .option("table", "NS_TEST.CUSTOMER_TBL") > .option("zkUrl", "jdbc:phoenix:localhost:2181:/hbase-secure") > .load(){noformat} > ** > >>>df.printSchema() > root > |-- CID: integer (nullable = true) > |-- *CDATE: date* (nullable = true) > |-- CTIMESTAMP: timestamp (nullable = true) > >>>{*}df.select('{*}').show(truncate=False)\{*} > |CID|CDATE |CTIMESTAMP | > > |1 |*2021-11-20*|2022-01-18 18:30:33.896| > |2 |*2021-11-17*|2022-01-18 18:45:59.336| > |3 |*2021-11-16*|2022-01-18 19:01:04.265| > > Please help us on this issue why Phoenix Spark reading DATE datatype field > value as {*}one day less{*}. -- This message was sent by Atlassian Jira (v8.20.10#820010)