alamb commented on code in PR #6278:
URL: https://github.com/apache/arrow-rs/pull/6278#discussion_r1727975740
##########
parquet/src/arrow/schema/mod.rs:
##########
@@ -1481,7 +1483,11 @@ mod tests {
),
Field::new("date", DataType::Date32, true),
Field::new("time_milli", DataType::Time32(TimeUnit::Millisecond),
true),
+ Field::new("time_milli_utc",
DataType::Time32(TimeUnit::Millisecond), true)
Review Comment:
I did verify this test fails without the code changes (and thus it covers
the code change)
```
assertion `left == right` failed
left: ColumnDescriptor { primitive_type: PrimitiveType { basic_info:
BasicTypeInfo { name: "time_milli_utc", repetition: Some(OPTIONAL),
converted_type: TIME_MILLIS, logical_type: Some(Time { is_adjusted_to_u_t_c:
true, unit: MILLIS(MilliSeconds) }), id: None }, physical_type: INT32,
type_length: -1, scale: -1, precision: -1 }, max_def_level: 1, max_rep_level:
0, path: ColumnPath { parts: ["time_milli_utc"] } }
right: ColumnDescriptor { primitive_type: PrimitiveType { basic_info:
BasicTypeInfo { name: "time_milli_utc", repetition: Some(OPTIONAL),
converted_type: TIME_MILLIS, logical_type: Some(Time { is_adjusted_to_u_t_c:
false, unit: MILLIS(MilliSeconds) }), id: None }, physical_type: INT32,
type_length: -1, scale: -1, precision: -1 }, max_def_level: 1, max_rep_level:
0, path: ColumnPath { parts: ["time_milli_utc"] } }
Left: ColumnDescriptor { primitive_type: PrimitiveType { basic_info:
BasicTypeInfo { name: "time_milli_utc", repetition: Some(OPTIONAL),
converted_type: TIME_MILLIS, logical_type: Some(Time { is_adjusted_to_u_t_c:
true, unit: MILLIS(MilliSeconds) }), id: N ...
Right: ColumnDescriptor { primitive_type: PrimitiveType { basic_info:
BasicTypeInfo { name: "time_milli_utc", repetition: Some(OPTIONAL),
converted_type: TIME_MILLIS, logical_type: Some(Time { is_adjusted_to_u_t_c:
false, unit: MILLIS(MilliSeconds) }), id: ...
```
##########
parquet/src/arrow/schema/mod.rs:
##########
@@ -1481,7 +1483,11 @@ mod tests {
),
Field::new("date", DataType::Date32, true),
Field::new("time_milli", DataType::Time32(TimeUnit::Millisecond),
true),
+ Field::new("time_milli_utc",
DataType::Time32(TimeUnit::Millisecond), true)
Review Comment:
This test verifies that the metadata contents are correct
https://github.com/apache/arrow-rs/issues/6277 says
> To support writing Time with timezone array to parquet in a similar way to
how we write Timestamp with timezone. Currently, Time arrays are always
converted to parquet with no timezone. (see fn arrow_to_parquet_type)
I suggest it would be a good idea to also add a more "end to end" test that
ensures that a `TimeXXArray` written to parquet is read back as expected.
For example, perhaps follow the model of
https://github.com/apache/arrow-rs/blob/2a4f269e94e155131d6ae0bc8af48101012e3d45/parquet/src/arrow/arrow_reader/mod.rs#L1166-L1224
##########
parquet/src/arrow/schema/mod.rs:
##########
@@ -427,7 +427,7 @@ fn arrow_to_parquet_type(field: &Field) -> Result<Type> {
}
DataType::Time32(unit) => Type::primitive_type_builder(name,
PhysicalType::INT32)
.with_logical_type(Some(LogicalType::Time {
- is_adjusted_to_u_t_c: false,
+ is_adjusted_to_u_t_c:
field.metadata().contains_key("adjusted_to_utc"),
Review Comment:
It seem like this is a different way to set the `is_adjusted_to_u_t_c` field
than how it is done for `DataType::Timestamp`. I wonder if this is intentional?
```rust
DataType::Timestamp(time_unit, tz) => {
Type::primitive_type_builder(name, PhysicalType::INT64)
.with_logical_type(Some(LogicalType::Timestamp {
// If timezone set, values are normalized to UTC timezone
is_adjusted_to_u_t_c: matches!(tz, Some(z) if
!z.as_ref().is_empty()),
unit: match time_unit {
TimeUnit::Second => unreachable!(),
TimeUnit::Millisecond =>
ParquetTimeUnit::MILLIS(Default::default()),
TimeUnit::Microsecond =>
ParquetTimeUnit::MICROS(Default::default()),
TimeUnit::Nanosecond =>
ParquetTimeUnit::NANOS(Default::default()),
},
}))
.with_repetition(repetition)
.with_id(id)
.build()
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]