klion26 commented on issue #8873: URL: https://github.com/apache/arrow-rs/issues/8873#issuecomment-3630708708
@alamb @scovich After some analysis, here are my preliminary conclusions for this issue and #8086:

1. We can unify the `CastOptions` usage in `parquet-variant-compute` on `arrow-cast/CastOptions` (`arrow-cast/src/cast/mod.rs#CastOptions`).
2. We can add a `lossy` flag to `arrow-cast/CastOptions` to handle casts that can lose precision, such as `Decimal` -> `Int` and `Timestamp(Micro)` -> `Timestamp(Second)`.

The intended behavior is shown below. When a cast loses precision, `lossy` decides whether the truncated value is returned, and `safe` decides between `Null` and an error:

| safe | lossy | 1234 (Micro) -> Millis |
| -- | -- | -- |
| true | true | 1 (= floor(1234 / 1_000)) |
| true | false | Null |
| false | true | 1 |
| false | false | Err |

When no precision is lost, every combination returns the same value:

| safe | lossy | 1000 (Micro) -> Millis |
| -- | -- | -- |
| true | true | 1 |
| true | false | 1 |
| false | true | 1 |
| false | false | 1 |

The specific steps are as follows:

1. Add a `lossy` flag to `arrow-cast/CastOptions`.
2. Unify `CastOptions` usage in `parquet-variant-compute`.
    - 2.1 Change the behavior for values that can't be cast safely. Currently, the defaults of `arrow-cast/CastOptions` and `parquet-variant-compute/CastOptions` differ: the default for [`arrow-cast/CastOptions` is `safe=true`](https://github.com/apache/arrow-rs/blob/c9fca0b1248911daeab5e92146d33a58f11e15d2/arrow-cast/src/cast/mod.rs#L83-L90), while the default for [`parquet-variant-compute/CastOptions` is `strict=true` (i.e., `safe=false`)](https://github.com/apache/arrow-rs/blob/c9fca0b1248911daeab5e92146d33a58f11e15d2/parquet-variant-compute/src/type_conversion.rs#L35-L39).
3. After adding `lossy` to `arrow-cast/CastOptions`, update `parquet-variant-compute` to adopt the corresponding logic.
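The safe/lossy table for the Micro -> Millis example can be sketched as a single-value cast function. This is a minimal illustration, not the arrow-cast implementation: `CastOptions` here is a hypothetical local stand-in carrying the existing `safe` flag plus the proposed `lossy` flag, `cast_micros_to_millis` is a hypothetical helper, and flooring negative values via `div_euclid` is an assumption (the table only covers positive inputs).

```rust
/// Hypothetical stand-in for `arrow-cast`'s `CastOptions` with the proposed
/// `lossy` flag. This is NOT the real arrow-cast type.
#[derive(Debug, Clone, Copy)]
struct CastOptions {
    safe: bool,
    lossy: bool,
}

/// Hypothetical single-value cast from microseconds to milliseconds.
/// Ok(Some(v)) = value, Ok(None) = Null, Err(_) = cast error.
fn cast_micros_to_millis(micros: i64, opts: CastOptions) -> Result<Option<i64>, String> {
    // div_euclid by a positive divisor floors toward negative infinity,
    // i.e. floor(micros / 1_000); this rounding choice is an assumption.
    let millis = micros.div_euclid(1_000);
    let exact = micros.rem_euclid(1_000) == 0;

    // No precision lost, or the caller explicitly allows lossy casts.
    if exact || opts.lossy {
        return Ok(Some(millis));
    }
    // Precision would be lost and lossy casts are not allowed:
    // `safe` chooses between Null and an error.
    if opts.safe {
        Ok(None)
    } else {
        Err(format!("cannot cast {micros}us to ms without losing precision"))
    }
}

fn main() {
    for (safe, lossy) in [(true, true), (true, false), (false, true), (false, false)] {
        let opts = CastOptions { safe, lossy };
        println!(
            "safe={safe} lossy={lossy}: 1234 -> {:?}, 1000 -> {:?}",
            cast_micros_to_millis(1234, opts),
            cast_micros_to_millis(1000, opts)
        );
    }
}
```

Keeping `lossy` separate from `safe` lets a caller opt into truncation without also turning every other cast failure into `Null`.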
Steps 1 and 2 do not depend on each other and can proceed in parallel.

Steps 1 and 2 both break the public API, so they need to land in a major release:

- Step 1 adds a new field to a `pub struct`.
- Step 2 changes the parameters and default behavior of some functions.

Currently, `CastOptions` is used in `parquet-variant-compute` as follows:

- `shred_variant.rs`, `variant_to_arrow.rs`, and `variant_get.rs` use `arrow-cast/CastOptions`.
- `arrow_to_variant.rs`, `cast_to_variant.rs`, and `lib.rs` (re-export) use `parquet-variant-compute/CastOptions`.
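A small sketch of why both steps are breaking, and of how old `strict`-based call sites might map onto a unified `safe` flag. Everything here is hypothetical: `CastOptions` mirrors the proposal rather than the actual arrow-cast definition, `lossy = false` as the default is an assumption, and `from_strict` is an illustrative adapter, not an existing API. It relies only on the fact stated above that `strict=true` corresponds to `safe=false`.

```rust
/// Hypothetical unified options struct after Step 1 adds `lossy`.
/// Mirrors the proposal; NOT the actual arrow-cast definition.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CastOptions {
    pub safe: bool,
    pub lossy: bool, // proposed new field
}

impl Default for CastOptions {
    // safe=true matches arrow-cast's current default; lossy=false is an assumption.
    fn default() -> Self {
        Self { safe: true, lossy: false }
    }
}

/// Hypothetical adapter for callers of the old `strict` flag:
/// strict=true corresponds to safe=false.
pub fn from_strict(strict: bool) -> CastOptions {
    // `..Default::default()` keeps compiling when fields are added. An exhaustive
    // literal such as `CastOptions { safe: !strict }`, written before `lossy`
    // existed, stops compiling once the field is added: that is why Step 1 breaks.
    CastOptions { safe: !strict, ..Default::default() }
}

fn main() {
    // Old parquet-variant-compute default (strict=true) maps to safe=false ...
    let old_default = from_strict(true);
    // ... while the unified default keeps arrow-cast's safe=true, so callers
    // relying on the old default see different behavior (Step 2).
    let new_default = CastOptions::default();
    println!("old: {old_default:?}, new: {new_default:?}");
}
```

Callers who construct options with struct-update syntax are insulated from new fields, but the change in default (`safe=false` vs `safe=true`) still has to be called out in the release notes.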
