zeropc opened a new issue #5888:
URL: https://github.com/apache/incubator-doris/issues/5888
Date type column gets an incorrect value when using spark load in the GMT-5 timezone
Steps to reproduce the behavior:
1. Deploy BE on a machine set to the GMT-5 timezone.
2. Spark load data that contains a date type column, with a value such as '2021-05-22'.
3. The target table contains '2021-05-21'.
**Expected behavior**
Expect '2021-05-22'
**Screenshots**
show partitions:
| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | VisibleVersionHash | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | LastConsistencyCheckTime | DataSize | IsInMemory |
| 289486 | p20210522 | 2 | 2021-05-23 16:12:47 | 6425342295842155734 | NORMAL | etl_date | [types: [DATE]; keys: [2021-05-22]; ..types: [DATE]; keys: [2021-05-23]; ) | date, ltv_type | 12 | 3 | HDD | 9999-12-31 23:59:59 | 2021-05-23 23:00:53 | 952.310 KB | false |
select distinct:
+------------+
| etl_date |
+------------+
| 2021-05-21 |
+------------+
**Additional context**
In /be/src/exec/parquet_reader.cpp:
```
time_t timestamp = (time_t)((int64_t)ts_array->Value(_current_line_of_batch) * 24 * 60 * 60);
struct tm local;
localtime_r(&timestamp, &local);
```
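Date type columns written by Hive and Impala are stored in Parquet files as
int32 values that count the number of days since 1970-01-01. This value carries
no timezone, but localtime_r depends on the machine's timezone. Multiplying the
day count by 24 * 60 * 60 gives midnight UTC of that day, so on a machine that
is behind UTC (such as GMT-5) localtime_r returns the previous calendar day.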
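To illustrate the difference, here is a minimal standalone sketch, not taken from the Doris source. The helper name `days_to_date_string` and its `use_local_time` flag are hypothetical and exist only for this comparison. It converts a days-since-epoch value with either `localtime_r` (as in the snippet above) or `gmtime_r`, which ignores the machine timezone. Whether `gmtime_r` is the right fix for parquet_reader.cpp is an assumption, since other column types read there may still need local time.

```cpp
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper (not part of Doris): formats a Parquet DATE value,
// given as days since 1970-01-01, as "YYYY-MM-DD".
//   use_local_time = true  mirrors the localtime_r call quoted in the issue.
//   use_local_time = false uses gmtime_r, which does not depend on TZ.
std::string days_to_date_string(int32_t days_since_epoch, bool use_local_time) {
    // Seconds since the epoch for midnight UTC of the requested day.
    time_t timestamp =
            static_cast<time_t>(static_cast<int64_t>(days_since_epoch) * 24 * 60 * 60);

    struct tm parts;
    if (use_local_time) {
        localtime_r(&timestamp, &parts);  // shifted by the machine's UTC offset
    } else {
        gmtime_r(&timestamp, &parts);     // broken-down time in UTC
    }

    char buf[32];
    std::snprintf(buf, sizeof(buf), "%04d-%02d-%02d",
                  parts.tm_year + 1900, parts.tm_mon + 1, parts.tm_mday);
    return std::string(buf);
}
```

The date 2021-05-22 is day 18769 since 1970-01-01. On a machine five hours behind UTC, `days_to_date_string(18769, true)` yields 2021-05-21, matching the value seen in the target table, while `days_to_date_string(18769, false)` yields 2021-05-22 regardless of the machine timezone.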