[ 
https://issues.apache.org/jira/browse/IMPALA-12322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748635#comment-17748635
 ] 

daicheng commented on IMPALA-12322:
-----------------------------------

h1. Timestamps read from Parquet also differ between Impala and Spark

(1) Impala creates a Parquet table.
{code:sql}
create external table test.test_timezone3 as
select id, etl_update_time from ods_pvs_middle_data__signal_schema_product limit 3;
{code}
 
 * Query with Impala

 
{code:java}
select * from test.test_timezone3; {code}
 

!image-2023-07-28-20-31-09-457.png|width=480,height=93!
 * Query with Spark
{code:scala}
SparkSession.builder()
  .config("spark.sql.session.timeZone", "Asia/Shanghai")
  .master("local[4]")
  .getOrCreate()
  .read.parquet("/warehouse/tablespace/external/hive/test.db/test_timezone3")
  .show(23, false)
{code}
 !image-2023-07-28-22-19-57-107.png!
*The result shows that the timestamps returned by Spark are 8 hours later than those returned by Impala.*
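
The 8-hour gap matches the UTC offset of Asia/Shanghai. The sketch below only illustrates that arithmetic. It assumes (not confirmed in this issue) that the writer stores the wall-clock value unchanged, as if it were UTC, and that one reader converts the stored instant to the session time zone while the other does not. The timestamp value is hypothetical; the real values are only visible in the screenshots.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Hypothetical wall-clock value written into the Parquet file.
wall_clock = datetime(2023, 7, 28, 12, 0, 0)

# Assumption: the writer stores the value unchanged, as if it were UTC.
stored_instant = wall_clock.replace(tzinfo=timezone.utc)

# A reader that converts the stored instant to the session time zone.
shown_by_converting_reader = stored_instant.astimezone(
    ZoneInfo("Asia/Shanghai")
).replace(tzinfo=None)

# A reader that shows the stored value without any conversion.
shown_by_plain_reader = stored_instant.replace(tzinfo=None)

offset = shown_by_converting_reader - shown_by_plain_reader
print(shown_by_plain_reader)       # 2023-07-28 12:00:00
print(shown_by_converting_reader)  # 2023-07-28 20:00:00
print(offset)                      # 8:00:00
```

If this assumption holds, the two engines are reading the same stored bytes and differ only in whether a time zone conversion is applied on read.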
 

(2) Spark creates a Parquet table.

 
{code:java}
data.write.parquet("/warehouse/tablespace/external/hive/test.db/test_timezone4")
 {code}
Query with Impala
 

 
{code:sql}
-- create a table that points at the Parquet files in test_timezone4
create external table test.test_timezone4(id String, etl_update_time Timestamp)
stored as parquet location
'hdfs://bigdata.dev.slave1.com:8020/warehouse/tablespace/external/hive/test.db/test_timezone4';
-- query the table
select * from test.test_timezone4;
{code}
 !image-2023-07-28-22-36-37-884.png!

 

Query with Spark
{code:scala}
// query the table with Spark
SparkSession.builder()
  .config("spark.sql.session.timeZone", "Asia/Shanghai")
  .master("local[4]")
  .getOrCreate()
  .read.parquet("hdfs://192.168.104.142:8020/warehouse/tablespace/external/hive/test.db/test_timezone4")
  .show(23, false)
{code}
!image-2023-07-28-22-29-40-083.png!

 

*The result shows that the timestamps returned by Spark are consistent with those returned by Impala.*
 

> return wrong timestamp when scan kudu timestamp with timezone
> -------------------------------------------------------------
>
>                 Key: IMPALA-12322
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12322
>             Project: IMPALA
>          Issue Type: Bug
>         Environment: impala 4.1.1
>            Reporter: daicheng
>            Priority: Major
>         Attachments: image-2022-04-24-00-01-05-746-1.png, 
> image-2022-04-24-00-01-05-746.png, image-2022-04-24-00-01-37-520.png, 
> image-2022-04-24-00-03-14-467-1.png, image-2022-04-24-00-03-14-467.png, 
> image-2022-04-24-00-04-16-240-1.png, image-2022-04-24-00-04-16-240.png, 
> image-2022-04-24-00-04-52-860-1.png, image-2022-04-24-00-04-52-860.png, 
> image-2022-04-24-00-05-52-086-1.png, image-2022-04-24-00-05-52-086.png, 
> image-2022-04-24-00-07-09-776-1.png, image-2022-04-24-00-07-09-776.png, 
> image-2023-07-28-20-31-09-457.png, image-2023-07-28-22-27-38-521.png, 
> image-2023-07-28-22-29-40-083.png, image-2023-07-28-22-36-17-460.png, 
> image-2023-07-28-22-36-37-884.png
>
>
> impala version is 3.1.0-cdh6.1
> i have set system timezone=Asia/Shanghai:
> !image-2022-04-24-00-01-37-520.png!
> !image-2022-04-24-00-01-05-746.png!
> here is the bug:
> *step 1*
> i have parquet file with two columns like below,and read it with impala-shell 
> and spark (timezone=shanghai)
> !image-2022-04-24-00-03-14-467.png|width=1016,height=154!
> !image-2022-04-24-00-04-16-240.png|width=944,height=367!
> both results are exactly right.
> *step 2*
> create kudu table  with impala-shell:
> CREATE TABLE default.test_{_}test{_}_test_time2 (id BIGINT,t 
> TIMESTAMP,PRIMARY KEY (id) ) STORED AS KUDU;
> note: kudu version:1.8
> and insert 2 rows into the table with spark:
> !image-2022-04-24-00-04-52-860.png|width=914,height=279!
> *step 3*
> read it with spark (timezone=shanghai),spark read kudu table with kudu-client 
> api,here is the result:
> !image-2022-04-24-00-05-52-086.png|width=914,height=301!
> the result is still exactly right.
> but read it with impala-shell: 
> !image-2022-04-24-00-07-09-776.png|width=915,height=154!
> the result is 8 hours late
> *conclusion*
>    it seems like the impala timezone setting doesn't take effect when the kudu 
> column type is timestamp, but it works fine for parquet files. I don't know why.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
