jecsand838 commented on code in PR #8433:
URL: https://github.com/apache/arrow-rs/pull/8433#discussion_r2376947171
##########
arrow-avro/src/codec.rs:
##########
@@ -685,6 +686,8 @@ pub enum Codec {
Interval,
/// Represents Avro union type, maps to Arrow's Union data type
Union(Arc<[AvroDataType]>, UnionFields, UnionMode),
+ /// Represents an Avro long with an `arrowDurationUnit` metadata property. Maps to Arrow's Duration(TimeUnit) data type.
+ Duration(TimeUnit),
Review Comment:
So I thought about this some more and I think the cleanest way to do this is
to create the following custom logical types which annotate a `long` primitive
type:
1. `arrow.duration-nanos`
2. `arrow.duration-micros`
3. `arrow.duration-millis`
4. `arrow.duration-seconds`
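To make the idea concrete, here is a rough sketch of what a field annotated this way might look like in a schema. The helper `duration_field_schema` and the field name `elapsed` are hypothetical and only for illustration; the logical type names are the ones proposed above, not something arrow-avro defines today.

```rust
/// Hypothetical helper: builds the Avro schema JSON for a field whose
/// underlying type is `long`, annotated with one of the proposed
/// `arrow.duration-*` logical type names.
fn duration_field_schema(field_name: &str, logical_type: &str) -> String {
    format!(
        r#"{{"name":"{}","type":{{"type":"long","logicalType":"{}"}}}}"#,
        field_name, logical_type
    )
}

fn main() {
    // The four names proposed in this comment.
    let proposed = [
        "arrow.duration-nanos",
        "arrow.duration-micros",
        "arrow.duration-millis",
        "arrow.duration-seconds",
    ];
    for logical_type in proposed {
        // The wire type is always `long`; only the annotation differs.
        println!("{}", duration_field_schema("elapsed", logical_type));
    }
}
```

The encoded value stays a plain `long` in every case, so the annotation changes only how a reader interprets it.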
I'd also probably add a new feature flag called `avro_custom_types` or
something along those lines. **When the flag is toggled on,** we'd use the
logical types and map to/from `DataType::Duration`. **When the flag is off,**
we'd simply read/write the value using its primitive, i.e. the `long`.
```suggestion
#[cfg(feature = "avro_custom_types")]
DurationNanos,
#[cfg(feature = "avro_custom_types")]
DurationMicros,
#[cfg(feature = "avro_custom_types")]
DurationMillis,
#[cfg(feature = "avro_custom_types")]
DurationSeconds,
```
I'm not sure what the ideal default setting (on or off) is for this. However, the custom
logical types should **not** cause errors with other Avro decoders, as they
should just fall back to the underlying primitive, i.e. the `long`, when presented
with an unknown logical type.
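A minimal sketch of the read-side behavior I have in mind, under a few assumptions: `TimeUnit` here is a local stand-in for Arrow's enum so the example has no dependencies, `Resolved` and `resolve_long` are hypothetical names, and the runtime `custom_types` flag stands in for what would really be the compile-time `avro_custom_types` feature.

```rust
/// Local stand-in for Arrow's `TimeUnit`, so the sketch has no dependencies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// What an Avro `long` resolves to on the read side (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Resolved {
    Int64,
    Duration(TimeUnit),
}

/// Hypothetical resolver for an Avro `long` with an optional `logicalType`.
/// Standard logical types (timestamps, etc.) are ignored to keep this short.
fn resolve_long(logical_type: Option<&str>, custom_types: bool) -> Resolved {
    if !custom_types {
        // Flag off: always read the underlying primitive.
        return Resolved::Int64;
    }
    match logical_type {
        Some("arrow.duration-nanos") => Resolved::Duration(TimeUnit::Nanosecond),
        Some("arrow.duration-micros") => Resolved::Duration(TimeUnit::Microsecond),
        Some("arrow.duration-millis") => Resolved::Duration(TimeUnit::Millisecond),
        Some("arrow.duration-seconds") => Resolved::Duration(TimeUnit::Second),
        // Unknown or absent logical type: fall back to the primitive.
        _ => Resolved::Int64,
    }
}

fn main() {
    println!("{:?}", resolve_long(Some("arrow.duration-micros"), true));
    println!("{:?}", resolve_long(Some("arrow.duration-micros"), false));
    println!("{:?}", resolve_long(Some("some.unknown-type"), true));
}
```

The last arm is the same fallback other Avro decoders would take when they see a logical type they don't recognize.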
@alamb @scovich I'm curious what you all think about this approach? The
problem stems from Avro's lack of support for an elapsed-time type. `duration`
in Avro represents calendar time.
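For context, the Avro spec defines the `duration` logical type as a `fixed` of size 12 holding three little-endian unsigned 32-bit integers: months, days, and milliseconds. A small sketch of decoding it (the `AvroDuration` struct and `decode_avro_duration` function are hypothetical names, not arrow-avro APIs):

```rust
/// The three components of Avro's built-in `duration` logical type.
#[derive(Debug, PartialEq, Eq)]
struct AvroDuration {
    months: u32,
    days: u32,
    millis: u32,
}

/// Hypothetical decoder for the 12-byte `fixed` backing an Avro `duration`.
fn decode_avro_duration(bytes: [u8; 12]) -> AvroDuration {
    // Read a little-endian u32 starting at byte offset `i`.
    let le = |i: usize| {
        u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]])
    };
    AvroDuration {
        months: le(0),
        days: le(4),
        millis: le(8),
    }
}

fn main() {
    let mut raw = [0u8; 12];
    raw[0] = 1; // 1 month
    raw[4] = 2; // 2 days
    raw[8..12].copy_from_slice(&500u32.to_le_bytes()); // 500 ms
    println!("{:?}", decode_avro_duration(raw));
}
```

Since months and days have no fixed length, this value can't be converted to a single elapsed count of nanoseconds without a calendar reference, which is why it doesn't fit `DataType::Duration`.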