[
https://issues.apache.org/jira/browse/IMPALA-12322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748635#comment-17748635
]
daicheng commented on IMPALA-12322:
-----------------------------------
h1. Timestamps read from Parquet also differ between Impala and Spark.
(1) Impala creates a Parquet table.
{code:java}
create external table test.test_timezone3 as select id,etl_update_time from
ods_pvs_middle_data__signal_schema_product limit 3; {code}
* Query with Impala
{code:java}
select * from test.test_timezone3; {code}
!image-2023-07-28-20-31-09-457.png|width=480,height=93!
* Query with Spark
{code:java}
SparkSession.builder().config("spark.sql.session.timeZone","Asia/Shanghai").master("local[4]").getOrCreate().read.parquet("/warehouse/tablespace/external/hive/test.db/test_timezone3").show(23,false)
{code}
!image-2023-07-28-22-19-57-107.png!
*The result shows that Spark returns a time 8 hours later than Impala.*
(2) Spark creates a Parquet table.
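One possible reading of the 8-hour gap, offered as an assumption and not as a confirmed root cause: the writer stores the local wall-clock value without normalizing it to UTC, and the reader interprets the stored value as a UTC instant and then renders it in the session time zone (Asia/Shanghai, UTC+8). The sketch below only reproduces that arithmetic with java.time. The class name, method name, and sample value are hypothetical and are not taken from Impala or Spark.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampShiftDemo {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Interpret the stored wall-clock value as a UTC instant,
    // then render that instant in the reader's session time zone.
    public static String readAsUtcThenRender(String storedWallClock, String sessionZone) {
        LocalDateTime stored = LocalDateTime.parse(storedWallClock, FMT);
        return stored.atZone(ZoneOffset.UTC)
                .withZoneSameInstant(ZoneId.of(sessionZone))
                .format(FMT);
    }

    public static void main(String[] args) {
        // Hypothetical sample value, not taken from the screenshots.
        String stored = "2023-07-28 12:00:00";
        // A reader that applies no zone shift shows the value unchanged.
        System.out.println(readAsUtcThenRender(stored, "UTC"));           // 2023-07-28 12:00:00
        // A reader with session zone Asia/Shanghai shows it 8 hours later.
        System.out.println(readAsUtcThenRender(stored, "Asia/Shanghai")); // 2023-07-28 20:00:00
    }
}
```

If this assumption holds, the same stored value appears 8 hours later in any reader that applies the Asia/Shanghai session zone, which matches the direction of the difference reported above.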
{code:java}
data.write.parquet("/warehouse/tablespace/external/hive/test.db/test_timezone4")
{code}
Query with Impala
{code:java}
-- create an external table that points to the Parquet directory (test_timezone4)
create external table test.test_timezone4(id String,etl_update_time Timestamp)
stored as parquet location
'hdfs://bigdata.dev.slave1.com:8020/warehouse/tablespace/external/hive/test.db/test_timezone4';
-- query the table
select * from test.test_timezone4;{code}
!image-2023-07-28-22-36-37-884.png!
Query with Spark
{code:java}
//query the table with spark
SparkSession.builder().config("spark.sql.session.timeZone","Asia/Shanghai").master("local[4]").getOrCreate().read.parquet("hdfs://192.168.104.142:8020/warehouse/tablespace/external/hive/test.db/test_timezone4").show(23,false)
{code}
!image-2023-07-28-22-29-40-083.png!
*The result shows that the time returned by Spark is consistent with the time
returned by Impala.*
> return wrong timestamp when scan kudu timestamp with timezone
> -------------------------------------------------------------
>
> Key: IMPALA-12322
> URL: https://issues.apache.org/jira/browse/IMPALA-12322
> Project: IMPALA
> Issue Type: Bug
> Environment: impala 4.1.1
> Reporter: daicheng
> Priority: Major
> Attachments: image-2022-04-24-00-01-05-746-1.png,
> image-2022-04-24-00-01-05-746.png, image-2022-04-24-00-01-37-520.png,
> image-2022-04-24-00-03-14-467-1.png, image-2022-04-24-00-03-14-467.png,
> image-2022-04-24-00-04-16-240-1.png, image-2022-04-24-00-04-16-240.png,
> image-2022-04-24-00-04-52-860-1.png, image-2022-04-24-00-04-52-860.png,
> image-2022-04-24-00-05-52-086-1.png, image-2022-04-24-00-05-52-086.png,
> image-2022-04-24-00-07-09-776-1.png, image-2022-04-24-00-07-09-776.png,
> image-2023-07-28-20-31-09-457.png, image-2023-07-28-22-27-38-521.png,
> image-2023-07-28-22-29-40-083.png, image-2023-07-28-22-36-17-460.png,
> image-2023-07-28-22-36-37-884.png
>
>
> Impala version is 3.1.0-cdh6.1.
> I have set the system timezone to Asia/Shanghai:
> !image-2022-04-24-00-01-37-520.png!
> !image-2022-04-24-00-01-05-746.png!
> Here is the bug:
> *step 1*
> I have a Parquet file with two columns, as shown below, and read it with
> impala-shell and Spark (timezone=Shanghai).
> !image-2022-04-24-00-03-14-467.png|width=1016,height=154!
> !image-2022-04-24-00-04-16-240.png|width=944,height=367!
> Both results are exactly right.
> *step 2*
> Create a Kudu table with impala-shell:
> CREATE TABLE default.test_{_}test{_}_test_time2 (id BIGINT,t
> TIMESTAMP,PRIMARY KEY (id) ) STORED AS KUDU;
> Note: the Kudu version is 1.8.
> Then insert 2 rows into the table with Spark:
> !image-2022-04-24-00-04-52-860.png|width=914,height=279!
> *step 3*
> Read it with Spark (timezone=Shanghai). Spark reads the Kudu table through the
> kudu-client API. Here is the result:
> !image-2022-04-24-00-05-52-086.png|width=914,height=301!
> The result is still exactly right.
> But reading it with impala-shell gives:
> !image-2022-04-24-00-07-09-776.png|width=915,height=154!
> The result is 8 hours late.
> *conclusion*
> It seems that the Impala timezone setting has no effect when the Kudu column
> type is TIMESTAMP, but it works fine for Parquet files. I don't know why.