andygrove opened a new issue, #2733:
URL: https://github.com/apache/datafusion-comet/issues/2733
### What is the problem the feature request solves?
Comet currently assumes that all native processing uses the UTC time zone. For example, when reading from Parquet sources, Comet converts timestamps to UTC:
```java
String timeZoneId = conf.get("spark.sql.session.timeZone");

// Native code always uses "UTC" as the timeZoneId when converting
// from the Spark schema to the Arrow schema.
Schema arrowSchema = Utils$.MODULE$.toArrowSchema(sparkSchema, "UTC");
byte[] serializedRequestedArrowSchema = serializeArrowSchema(arrowSchema);

Schema dataArrowSchema = Utils$.MODULE$.toArrowSchema(dataSchema, "UTC");
byte[] serializedDataArrowSchema = serializeArrowSchema(dataArrowSchema);
```
However, we are now seeing that this assumption causes correctness issues or exceptions when the data source is not Parquet (the sketch after the linked issues illustrates the general shape of the problem):
- https://github.com/apache/datafusion-comet/issues/2720
- https://github.com/apache/datafusion-comet/issues/2649
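To make the problem concrete, here is a minimal, hypothetical sketch, not taken from the linked issues: a timestamp that does not come through the Parquet read path, evaluated under a non-UTC session time zone with Comet enabled. The plugin class and config keys are as recalled from Comet's setup documentation and may need adjusting; whether this exact query diverges depends on which operators Comet accelerates.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class TimestampTimezoneSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("comet-timezone-sketch")
        // Enable Comet (config names as documented for Comet; adjust if needed).
        .config("spark.plugins", "org.apache.spark.CometPlugin")
        .config("spark.comet.enabled", "true")
        // A non-UTC session time zone, which Spark uses for timestamp semantics.
        .config("spark.sql.session.timeZone", "America/Los_Angeles")
        .getOrCreate();

    // A timestamp that does not come from a Parquet file: it is parsed from a
    // string literal, so no Parquet-to-UTC conversion has happened.
    Dataset<Row> df = spark.sql(
        "SELECT CAST('2024-01-15 01:30:00' AS TIMESTAMP) AS ts");

    // If native code treats the value as UTC while Spark applies the session
    // time zone, time-zone-sensitive expressions can disagree between the
    // Spark and Comet evaluation paths.
    df.selectExpr("HOUR(ts) AS hour", "CAST(ts AS DATE) AS day").show();

    spark.stop();
  }
}
```

The intent is only to show the shape of the issue: values that originate outside the Parquet read path presumably have not been normalized to UTC, so a hard-coded UTC assumption on the native side can shift them relative to Spark's session-time-zone semantics.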
This epic is for reviewing and discussing Comet's approach to time zones.
### Describe the potential solution
_No response_
### Additional context
_No response_