cxzl25 opened a new pull request, #1657:
URL: https://github.com/apache/auron/pull/1657
# Which issue does this PR close?
Closes #1656
# Rationale for this change
Spark 3 uses the proleptic Gregorian calendar instead of the hybrid Julian + Gregorian calendar used by earlier versions.
https://issues.apache.org/jira/browse/SPARK-26651
Rust's chrono library supports the proleptic Gregorian calendar.
However, for timestamps that must be rebased from the Julian to the
Gregorian calendar, Spark and Auron may return inconsistent results, because
there is no ready-made rebase implementation in Rust.
https://github.com/apache/spark/blob/3c50fda6f29d95c24b664a32ee41c61f0a19eedb/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala#L525
```sql
create table t1_parquet (c1 timestamp) stored as parquet;
set spark.sql.parquet.int96RebaseModeInWrite=LEGACY;
insert overwrite t1_parquet values (timestamp '0001-01-01 00:00:00');
select * from t1_parquet;
```
Spark
```
0001-01-01 00:00:00
```
Auron
```
0000-12-30 00:05:43
```
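The two-day date shift in the Auron output can be reproduced with plain Julian Day Number arithmetic. The sketch below is illustrative only and is not the code used by Spark or Auron: `Calendar`, `to_jdn`, and `jdn_to_gregorian` are hypothetical helpers based on the standard integer JDN formulas. It assumes the stored value was written as the Julian-calendar date 0001-01-01 (LEGACY rebase) and is then read back as a proleptic Gregorian date without rebasing. The time-of-day difference (00:05:43) is not modeled here.

```rust
/// Calendar used to interpret a (year, month, day) triple.
#[derive(Clone, Copy)]
enum Calendar {
    Julian,
    ProlepticGregorian,
}

/// Julian Day Number of a civil date (astronomical year numbering).
fn to_jdn(cal: Calendar, year: i64, month: i64, day: i64) -> i64 {
    let a = (14 - month) / 12; // 1 for January/February, otherwise 0
    let y = year + 4800 - a;
    let m = month + 12 * a - 3; // March = 0 ... February = 11
    let base = day + (153 * m + 2) / 5 + 365 * y + y / 4;
    match cal {
        Calendar::Julian => base - 32083,
        Calendar::ProlepticGregorian => base - y / 100 + y / 400 - 32045,
    }
}

/// Proleptic Gregorian (year, month, day) for a Julian Day Number.
fn jdn_to_gregorian(jdn: i64) -> (i64, i64, i64) {
    let a = jdn + 32044;
    let b = (4 * a + 3) / 146097;
    let c = a - 146097 * b / 4;
    let d = (4 * c + 3) / 1461;
    let e = c - 1461 * d / 4;
    let m = (5 * e + 2) / 153;
    let day = e - (153 * m + 2) / 5 + 1;
    let month = m + 3 - 12 * (m / 10);
    let year = 100 * b + d - 4800 + m / 10;
    (year, month, day)
}

fn main() {
    // The day that the Julian calendar labels 0001-01-01 ...
    let stored = to_jdn(Calendar::Julian, 1, 1, 1);
    // ... read back as a proleptic Gregorian date without rebasing.
    let (y, m, d) = jdn_to_gregorian(stored);
    println!("{:04}-{:02}-{:02}", y, m, d); // prints 0000-12-30

    // The same label in the proleptic Gregorian calendar is two days later.
    let gregorian = to_jdn(Calendar::ProlepticGregorian, 1, 1, 1);
    println!("difference in days: {}", gregorian - stored); // prints 2
}
```

A reader that does not apply the Julian-to-Gregorian rebase lands on 0000-12-30, which matches the date part of the Auron result above.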
---
If the table is stored as ORC, reading it fails with an overflow error.
https://github.com/apache/auron/issues/1638
# What changes are included in this PR?
This PR introduces two configurations, both defaulting to true.
When one is set to false and the scan schema contains a timestamp column,
the scan for that format falls back to the Spark implementation.
```prop
spark.auron.enable.scan.parquet.timestamp=true
spark.auron.enable.scan.orc.timestamp=true
```
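The fallback rule can be sketched as a small predicate. This is a simplified model and not the code in this PR: the `DataType` enum, `contains_timestamp`, and `use_native_scan` are hypothetical names, and treating timestamps nested inside arrays or structs as a reason to fall back is an assumption.

```rust
/// Hypothetical, simplified model of the column types in a scan schema.
enum DataType {
    Int,
    Utf8,
    Timestamp,
    Array(Box<DataType>),
    Struct(Vec<DataType>),
}

/// True if the type is a timestamp or (by assumption) contains one when nested.
fn contains_timestamp(dt: &DataType) -> bool {
    match dt {
        DataType::Timestamp => true,
        DataType::Array(inner) => contains_timestamp(inner),
        DataType::Struct(fields) => fields.iter().any(contains_timestamp),
        _ => false,
    }
}

/// Decide whether the native scan may be used. `timestamp_scan_enabled`
/// stands for the value of spark.auron.enable.scan.parquet.timestamp
/// (or the ORC equivalent) for the format being scanned.
fn use_native_scan(schema: &[DataType], timestamp_scan_enabled: bool) -> bool {
    timestamp_scan_enabled || !schema.iter().any(contains_timestamp)
}

fn main() {
    let with_ts = vec![
        DataType::Int,
        DataType::Struct(vec![DataType::Utf8, DataType::Timestamp]),
    ];
    let without_ts = vec![DataType::Int, DataType::Array(Box::new(DataType::Utf8))];

    println!("{}", use_native_scan(&with_ts, true)); // prints true
    println!("{}", use_native_scan(&with_ts, false)); // prints false (fall back to Spark)
    println!("{}", use_native_scan(&without_ts, false)); // prints true
}
```

With the flag left at its default of true, the predicate never forces a fallback, so existing behavior is unchanged unless a user opts out.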
# Are there any user-facing changes?
# How was this patch tested?
Auron, with the native Parquet timestamp scan disabled, returns the same result as Spark:
```
set spark.auron.enable.scan.parquet.timestamp=false;
```
```
0001-01-01 00:00:00
```