[ https://issues.apache.org/jira/browse/PHOENIX-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Istvan Toth updated PHOENIX-6623: --------------------------------- Description: We are using below versions of Phoenix, HBase and Spark. Phoenix - 4.7 HBase - 2.6.5 Spark - 2.4 Created a phoenix table by mentioning one of the field datatype as DATE and TIMESTAMP in Phoenix using Squirrel SQL. DDL is given below. CREATE TABLE IF NOT EXISTS NS_TEST.CUSTOMER_TBL ( "CID" INTEGER, *"CDATE" DATE,* "CTIMESTAMP" TIMESTAMP, CONSTRAINT CUSTOMER_TBL_PK PRIMARY KEY ("CID")); Upserted records using upsert command and below is the data in table. |CID|CDATE |CTIMESTAMP | |1 |*2021-11-21*|2022-01-18 18:30:33.896| |2 |*2021-11-18*|2022-01-18 18:45:59.336| |3 |*2021-11-17*|2022-01-18 19:01:04.265| Now, reading data from above created table in pyspark shell. We have set *spark.sql.session.timeZone=UTC* to spark while launching pyspark shell. Also, we have set *phoenix.query.dateFormatTimeZone=UTC* in *hbase-site.xml* file. ** Below code snippet read data from phoenix via JDBC and it r{*}ead DATE datatype field as one day less{*}. {noformat} >>> val df = spark.read.format("jdbc") .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver") .option("url", "jdbc:phoenix:localhost:2181:/hbase-secure") .option("dbtable", "(SELECT CID, CDATE, CTIMESTAMP FROM NS_TEST.CUSTOMER_TBL) q") .load(){noformat} >>>df.printSchema() root |-- CID: integer (nullable = true) |-- *CDATE: date* (nullable = true) |-- CTIMESTAMP: timestamp (nullable = true) >>>{*}df.select('{*}').show(truncate=False)\{*} |CID|CDATE |CTIMESTAMP | |1 |*2021-11-20*|2022-01-18 18:30:33.896| |2 |*2021-11-17*|2022-01-18 18:45:59.336| |3 |*2021-11-16*|2022-01-18 19:01:04.265| We have also tried using phoenix data source instead of JDBC and below is the code snippet. It also read DATE datatype field as one day less. {noformat} val df2 = spark.read.format("org.apache.phoenix.spark") .option("table", "NS_TEST.CUSTOMER_TBL") .option("zkUrl", "jdbc:phoenix:localhost:2181:/hbase-secure") .load(){noformat} ** >>>df.printSchema() root |-- CID: integer (nullable = true) |-- *CDATE: date* (nullable = true) |-- CTIMESTAMP: timestamp (nullable = true) >>>{*}df.select('{*}').show(truncate=False)\{*} |CID|CDATE |CTIMESTAMP | |1 |*2021-11-20*|2022-01-18 18:30:33.896| |2 |*2021-11-17*|2022-01-18 18:45:59.336| |3 |*2021-11-16*|2022-01-18 19:01:04.265| Please help us on this issue why Phoenix Spark reading DATE datatype field value as {*}one day less{*}. was: We are using below versions of Phoenix, HBase and Spark. Phoenix - 4.7 HBase - 2.6.5 Spark - 2.4 Created a phoenix table by mentioning one of the field datatype as DATE and TIMESTAMP in Phoenix using Squirrel SQL. DDL is given below. CREATE TABLE IF NOT EXISTS NS_TEST.CUSTOMER_TBL ( "CID" INTEGER, *"CDATE" DATE,* "CTIMESTAMP" TIMESTAMP, CONSTRAINT CUSTOMER_TBL_PK PRIMARY KEY ("ID")); Upserted records using upsert command and below is the data in table. |CID|CDATE |CTIMESTAMP | |1 |*2021-11-21*|2022-01-18 18:30:33.896| |2 |*2021-11-18*|2022-01-18 18:45:59.336| |3 |*2021-11-17*|2022-01-18 19:01:04.265| Now, reading data from above created table in pyspark shell. We have set *spark.sql.session.timeZone=UTC* to spark while launching pyspark shell. Also, we have set *phoenix.query.dateFormatTimeZone=UTC* in *hbase-site.xml* file. ** Below code snippet read data from phoenix via JDBC and it r{*}ead DATE datatype field as one day less{*}. >>> *df = spark.read.format("jdbc") * *.option("driver", "org.apache.phoenix.jdbc.PhoenixDriver") * *.option("url", "jdbc:phoenix:localhost:2181:/hbase-secure") * *.option("dbtable", "(SELECT CID, CDATE, CTIMESTAMP FROM NS_TEST.CUSTOMER_TBL) q") * *.load()* >>>df.printSchema() root |-- CID: integer (nullable = true) |-- *CDATE: date* (nullable = true) |-- CTIMESTAMP: timestamp (nullable = true) >>>{*}df.select('{*}').show(truncate=False)\{*} |CID|CDATE |CTIMESTAMP | |1 |*2021-11-20*|2022-01-18 18:30:33.896| |2 |*2021-11-17*|2022-01-18 18:45:59.336| |3 |*2021-11-16*|2022-01-18 19:01:04.265| We have also tried using phoenix data source instead of JDBC and below is the code snippet. It also read DATE datatype field as one day less. >>>{*}df = spark.read.format("org.apache.phoenix.spark") {{*}} *.option("table", "NS_TEST.CUSTOMER_TBL") * *.option("zkUrl", "jdbc:phoenix:localhost:2181:/hbase-secure") * *.load()* >>>df.printSchema() root |-- CID: integer (nullable = true) |-- *CDATE: date* (nullable = true) |-- CTIMESTAMP: timestamp (nullable = true) >>>{*}df.select('{*}').show(truncate=False)\{*} |CID|CDATE |CTIMESTAMP | |1 |*2021-11-20*|2022-01-18 18:30:33.896| |2 |*2021-11-17*|2022-01-18 18:45:59.336| |3 |*2021-11-16*|2022-01-18 19:01:04.265| Please help us on this issue why Phoenix Spark reading DATE datatype field value as {*}one day less{*}. > Phoenix Spark reading DATE datatype value less than one day from phoenix table > ------------------------------------------------------------------------------ > > Key: PHOENIX-6623 > URL: https://issues.apache.org/jira/browse/PHOENIX-6623 > Project: Phoenix > Issue Type: Bug > Components: spark-connector > Affects Versions: 4.7.0 > Reporter: Anand > Assignee: Istvan Toth > Priority: Blocker > > We are using below versions of Phoenix, HBase and Spark. > Phoenix - 4.7 > HBase - 2.6.5 > Spark - 2.4 > Created a phoenix table by mentioning one of the field datatype as DATE and > TIMESTAMP in Phoenix using Squirrel SQL. DDL is given below. > CREATE TABLE IF NOT EXISTS NS_TEST.CUSTOMER_TBL ( > "CID" INTEGER, > *"CDATE" DATE,* > "CTIMESTAMP" TIMESTAMP, > CONSTRAINT CUSTOMER_TBL_PK PRIMARY KEY ("CID")); > Upserted records using upsert command and below is the data in table. > > |CID|CDATE |CTIMESTAMP | > > |1 |*2021-11-21*|2022-01-18 18:30:33.896| > |2 |*2021-11-18*|2022-01-18 18:45:59.336| > |3 |*2021-11-17*|2022-01-18 19:01:04.265| > > Now, reading data from above created table in pyspark shell. We have set > *spark.sql.session.timeZone=UTC* to spark while launching pyspark shell. > Also, we have set *phoenix.query.dateFormatTimeZone=UTC* in *hbase-site.xml* > file. ** > Below code snippet read data from phoenix via JDBC and it r{*}ead DATE > datatype field as one day less{*}. > {noformat} > >>> val df = spark.read.format("jdbc") > .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver") > .option("url", "jdbc:phoenix:localhost:2181:/hbase-secure") > .option("dbtable", "(SELECT CID, CDATE, CTIMESTAMP FROM > NS_TEST.CUSTOMER_TBL) q") > .load(){noformat} > >>>df.printSchema() > root > |-- CID: integer (nullable = true) > |-- *CDATE: date* (nullable = true) > |-- CTIMESTAMP: timestamp (nullable = true) > >>>{*}df.select('{*}').show(truncate=False)\{*} > |CID|CDATE |CTIMESTAMP | > > |1 |*2021-11-20*|2022-01-18 18:30:33.896| > |2 |*2021-11-17*|2022-01-18 18:45:59.336| > |3 |*2021-11-16*|2022-01-18 19:01:04.265| > > We have also tried using phoenix data source instead of JDBC and below is the > code snippet. It also read DATE datatype field as one day less. > {noformat} > val df2 = spark.read.format("org.apache.phoenix.spark") > .option("table", "NS_TEST.CUSTOMER_TBL") > .option("zkUrl", "jdbc:phoenix:localhost:2181:/hbase-secure") > .load(){noformat} > ** > >>>df.printSchema() > root > |-- CID: integer (nullable = true) > |-- *CDATE: date* (nullable = true) > |-- CTIMESTAMP: timestamp (nullable = true) > >>>{*}df.select('{*}').show(truncate=False)\{*} > |CID|CDATE |CTIMESTAMP | > > |1 |*2021-11-20*|2022-01-18 18:30:33.896| > |2 |*2021-11-17*|2022-01-18 18:45:59.336| > |3 |*2021-11-16*|2022-01-18 19:01:04.265| > > Please help us on this issue why Phoenix Spark reading DATE datatype field > value as {*}one day less{*}. -- This message was sent by Atlassian Jira (v8.20.10#820010)