Riju Trivedi created HIVE-28075:
-----------------------------------
Summary: Vectorized DayOfWeek returns inconsistent results for
non-UTC timezones.
Key: HIVE-28075
URL: https://issues.apache.org/jira/browse/HIVE-28075
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 4.0.0-beta-1
Reporter: Riju Trivedi
Assignee: Riju Trivedi
Simple reproduction of the problem:
{code:sql}
--! qt:timezone:Asia/Shanghai
CREATE EXTERNAL TABLE dayOfWeek_test(
  `fund_code` string,
  `test_date` string
);
INSERT INTO dayOfWeek_test(fund_code, test_date)
VALUES ('SEC016210079', '2023-04-13');
SELECT fund_code,
       test_date,
       dayofweek(test_date) AS SR,
       CASE
         WHEN dayofweek(test_date) = 1 THEN 7
         ELSE dayofweek(test_date) - 1
       END AS week_day
FROM dayOfWeek_test;
Actual Result:
SEC016210079 2023-04-13 4 3
Expected Result:
SEC016210079 2023-04-13 5 4
{code}
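The issue occurs only in the vectorized path with non-UTC time zones. The
non-vectorized path uses _DateTimeFormatter_, whereas the vectorized path uses
_SimpleDateFormat_ and a calendar initialized with the UTC time zone. As a
result, the date in the local time zone is converted to UTC, which can shift it
to a different calendar day and change the dayofweek() result. In the example
above, 2023-04-13 00:00 in Asia/Shanghai (UTC+8) is 2023-04-12 16:00 UTC, a
Wednesday, so the query returns 4 instead of 5.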
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFDayOfWeekString.java#L59]
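The time zone mismatch described above can be illustrated outside Hive with a minimal, standalone Java sketch. This is not the code from VectorUDFDayOfWeekString; the class name DayOfWeekTimezoneSketch and both helper methods are hypothetical. It assumes the date string is parsed at local midnight while the day of week is read from a UTC calendar, and compares that with a zone-free java.time computation.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical illustration only; not taken from the Hive code base.
class DayOfWeekTimezoneSketch {

    // Assumed failure mode: the string is parsed as midnight in the session
    // time zone, but the day of week is read from a calendar fixed to UTC.
    static int mismatchedDayOfWeek(String date, String zoneId) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setTimeZone(TimeZone.getTimeZone(zoneId));
        Date parsed;
        try {
            parsed = format.parse(date);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + date, e);
        }
        Calendar utcCalendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        utcCalendar.setTime(parsed);
        // Calendar.DAY_OF_WEEK uses Sunday=1 .. Saturday=7.
        return utcCalendar.get(Calendar.DAY_OF_WEEK);
    }

    // Zone-free computation: LocalDate carries no time zone, so the calendar
    // day cannot shift. ISO numbering (Monday=1 .. Sunday=7) is mapped to
    // Sunday=1 .. Saturday=7 to match dayofweek().
    static int zoneFreeDayOfWeek(String date) {
        return LocalDate.parse(date).getDayOfWeek().getValue() % 7 + 1;
    }

    public static void main(String[] args) {
        System.out.println(mismatchedDayOfWeek("2023-04-13", "Asia/Shanghai")); // prints 4
        System.out.println(zoneFreeDayOfWeek("2023-04-13"));                    // prints 5
    }
}
```

With these assumptions, the mismatched variant reproduces the reported value 4 for Asia/Shanghai, while the zone-free variant yields the expected 5. Passing "UTC" as the zone to the mismatched variant also yields 5, consistent with the issue appearing only for non-UTC time zones.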