kangakum36 commented on code in PR #43268:
URL: https://github.com/apache/arrow/pull/43268#discussion_r1688258069
##########
cpp/src/parquet/arrow/schema.cc:
##########
@@ -382,18 +382,18 @@ Status FieldToNode(const std::string& name, const
std::shared_ptr<Field>& field,
break;
case ArrowTypeId::TIME32:
type = ParquetType::INT32;
- logical_type =
- LogicalType::Time(/*is_adjusted_to_utc=*/true,
LogicalType::TimeUnit::MILLIS);
+ logical_type =
LogicalType::Time(arrow_properties.time_is_adjusted_to_utc(),
Review Comment:
@pitrou Responding to your other comment here. The use case for
`is_adjusted_to_utc=false` is mostly for compatibility reasons, as you
mentioned above. We have other parquet writers writing with
`is_adjusted_to_utc=false`, and we'd like our arrow writers to produce parquet
files with the same logical schema.
After thinking about it, I also agree that having a time type with a time
zone (or time zone adjustment, as is the case with `is_adjusted_to_utc=true`)
is not well defined. I believe that one of the original DuckDB
[discussions](https://github.com/duckdb/duckdb/discussions/2647) says that `The
TIME WITH TIME ZONE is not well-defined because binning a time requires a
specific date as well as a time zone. We could decide to add this type for
completeness...`, so it seems like it was added solely for completeness.
One possible use case I could see is if someone wanted to store a dataset of
events that happen every day at the same time (for example, reporting deadlines
that need to be met every day). Then the expectation might be that readers
would read the time and interpret it as an instant on the current day. But if
readers are in different time zones, they need to interpret the times as the
same instant.
However, even in that case it might be better to store a `timestamptz` type
and have a different event for each day, so the example is a bit contrived.
Regardless, would it be ok to add a `false` option to arrow for
compatibility purposes and discuss deprecating the flag from parquet separately
on dev@parquet?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]