klion26 commented on issue #8086: URL: https://github.com/apache/arrow-rs/issues/8086#issuecomment-3652669270
This comment is a backup of #8873, since #8873 may be closed once the unification has been implemented.

1. We can unify the `CastOptions` usage in `parquet-variant-compute` into `arrow-cast/CastOptions` (`arrow-cast/src/cast/mod.rs#CastOptions`).
2. We can add a `lossy` flag to `arrow-cast/CastOptions` to handle the `Decimal` -> `Int` and `Timestamp(Micro)` -> `Timestamp(Second)` cases.

The specific behavior is shown below:

> There are only 3 cases in the tables below: `(lossy, *)`, `(lossless, safe)`, `(lossless, unsafe)`

| safe | lossy | 1234 (Micro) -> Millis |
| -- | -- | -- |
| true | true | 1 (= floor(1234 / 1_000)) |
| true | false | Null |
| false | true | 1 |
| false | false | Err |

| safe | lossy | 1000 (Micro) -> Millis |
| -- | -- | -- |
| true | true | 1 |
| true | false | 1 |
| false | true | 1 |
| false | false | 1 |

The specific steps are as follows:

1. Add a `lossy` flag to `arrow-cast/CastOptions`.
2. Unify `CastOptions` usage in `parquet-variant-compute`.
   2.1 We need to change the behavior for values that can't be cast safely. Currently, the defaults of `arrow-cast/CastOptions` and `parquet-variant-compute/CastOptions` differ: the default for [`arrow-cast/CastOptions` is `safe=true`](https://github.com/apache/arrow-rs/blob/c9fca0b1248911daeab5e92146d33a58f11e15d2/arrow-cast/src/cast/mod.rs#L83-L90), while the default for [`parquet-variant-compute/CastOptions` is `strict=true` (which corresponds to `safe=false`)](https://github.com/apache/arrow-rs/blob/c9fca0b1248911daeab5e92146d33a58f11e15d2/parquet-variant-compute/src/type_conversion.rs#L35-L39).
3. After adding `lossy` to `arrow-cast/CastOptions`, we can update `parquet-variant-compute` to adapt the corresponding logic.

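
The `safe`/`lossy` behavior in the tables above could be sketched roughly as follows. This is only an illustration of the proposed semantics, not the `arrow-cast` implementation: `SketchCastOptions` and `micros_to_millis` are hypothetical names, and the `lossy` field does not exist in `arrow-cast/CastOptions` today.

```rust
/// Hypothetical stand-in for `arrow-cast/CastOptions` with the proposed
/// `lossy` flag added. This is NOT the real struct.
#[derive(Clone, Copy, Debug)]
struct SketchCastOptions {
    /// true: a failed cast yields Null; false: a failed cast yields an error.
    safe: bool,
    /// true: precision loss is allowed (assumed flag from this proposal).
    lossy: bool,
}

/// Hypothetical helper: casts a microsecond value to milliseconds following
/// the tables above. `Ok(None)` models a Null result.
fn micros_to_millis(micros: i64, opts: SketchCastOptions) -> Result<Option<i64>, String> {
    // `div_euclid` with a positive divisor is floor division.
    let millis = micros.div_euclid(1_000);
    let exact = micros.rem_euclid(1_000) == 0;

    if exact || opts.lossy {
        // Lossless casts always succeed; lossy casts succeed when allowed.
        return Ok(Some(millis));
    }
    if opts.safe {
        // (lossless, safe): precision would be lost, so produce Null.
        Ok(None)
    } else {
        // (lossless, unsafe): precision would be lost, so report an error.
        Err(format!("casting {micros}us to milliseconds would lose precision"))
    }
}

fn main() {
    for safe in [true, false] {
        for lossy in [true, false] {
            let opts = SketchCastOptions { safe, lossy };
            println!(
                "{:?}: 1234 -> {:?}, 1000 -> {:?}",
                opts,
                micros_to_millis(1234, opts),
                micros_to_millis(1000, opts)
            );
        }
    }
}
```

In this sketch `safe` only matters once a cast is both inexact and `lossy=false`, which is why the tables collapse into the three cases `(lossy, *)`, `(lossless, safe)`, and `(lossless, unsafe)`.
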
Steps 1 and 2 do not depend on each other and can move forward in parallel.

Steps 1 and 2 will break the public API, so we need to do them in a major version:

- Step 1 will add a new field to a `pub struct`.
- Step 2 will change the parameters and default behavior of some functions.

Currently, the `CastOptions` usages in `parquet-variant-compute` are as follows:

- `shred_variant.rs`, `variant_to_arrow.rs`, and `variant_get.rs` use `arrow-cast/CastOptions`.
- `arrow_to_variant.rs`, `cast_to_variant.rs`, and `lib.rs` (re-export) use `parquet-variant-compute/CastOptions`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
