klion26 commented on issue #8873:
URL: https://github.com/apache/arrow-rs/issues/8873#issuecomment-3630708708

   @alamb @scovich   After some analysis, my preliminary conclusions for this 
issue and #8086 are as follows:
   
   1. We can unify the `CastOptions` usage in `parquet-variant-compute` on 
`arrow-cast/CastOptions` (`arrow-cast/src/cast/mod.rs#CastOptions`).
   2. We can add a `lossy` flag to `arrow-cast/CastOptions` to handle `Decimal` 
-> `Int` and `Timestamp(Micro)` -> `Timestamp(Second)` cases. The specific 
behavior is shown below:
   
   | safe | lossy | 1234 (Micro) -> Millis |
   | -- | -- | -- |
   | true | true  |  1  (= floor(1234 / 1_000)) |  
   | true | false |  Null |
   |  false | true | 1 |
   | false | false | Err |
   
   | safe | lossy | 1000 (Micro) -> Millis |
   | -- | -- | -- |
   |  true | true | 1 |
   |  true | false | 1 |
   |  false | true | 1 |
   |  false | false | 1 |
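
The two tables above can be sketched in plain Rust. This is only an illustration of the proposed semantics, not the `arrow-cast` implementation: `SketchCastOptions` and `micros_to_millis` are hypothetical names, and `Ok(None)` stands in for a null output value.

```rust
/// Hypothetical stand-in for the proposed options; not the real `arrow-cast` struct.
#[derive(Debug, Clone, Copy)]
pub struct SketchCastOptions {
    pub safe: bool,
    pub lossy: bool,
}

/// Casts a microsecond value to milliseconds following the tables above.
/// `Ok(Some(v))` is a value, `Ok(None)` is a null, `Err` is a cast error.
pub fn micros_to_millis(value: i64, opts: SketchCastOptions) -> Result<Option<i64>, String> {
    // Floor division, matching floor(1234 / 1_000) = 1 in the first table.
    let millis = value.div_euclid(1_000);
    // The cast is exact when no sub-millisecond remainder is dropped.
    let exact = value.rem_euclid(1_000) == 0;

    match (exact, opts.lossy, opts.safe) {
        // Exact casts succeed regardless of the flags (second table).
        (true, _, _) => Ok(Some(millis)),
        // Inexact casts are allowed when `lossy` is set.
        (false, true, _) => Ok(Some(millis)),
        // Inexact, not lossy, safe: produce a null.
        (false, false, true) => Ok(None),
        // Inexact, not lossy, not safe: report an error.
        (false, false, false) => Err(format!(
            "casting {} microseconds to milliseconds would lose precision",
            value
        )),
    }
}
```

In this sketch `lossy` is checked before `safe`, so `safe` only decides between a null and an error once a lossy cast has been ruled out.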
   
   
   The specific steps are as follows:
   1. Add a lossy flag to `arrow-cast/CastOptions`.
   2. Unify `CastOptions` usage in `parquet-variant-compute`.
   2.1 Change the behavior for values that cannot be cast safely. 
Currently, the defaults of `arrow-cast/CastOptions` and 
`parquet-variant-compute/CastOptions` differ: the default for 
[`arrow-cast/CastOptions` is 
`safe=true`](https://github.com/apache/arrow-rs/blob/c9fca0b1248911daeab5e92146d33a58f11e15d2/arrow-cast/src/cast/mod.rs#L83-L90),
 while the default for [`parquet-variant-compute/CastOptions` is `strict=true` (equivalent to 
`safe=false`)](https://github.com/apache/arrow-rs/blob/c9fca0b1248911daeab5e92146d33a58f11e15d2/parquet-variant-compute/src/type_conversion.rs#L35-L39).
   3. After `lossy` is added to `arrow-cast/CastOptions`, we can update 
`parquet-variant-compute` to apply the corresponding logic.
   
   Steps 1 and 2 do not depend on each other and can proceed in parallel.
   
   Steps 1 and 2 will break the public API, so they need to land in a major 
version:
   - Step 1 adds a new field to a `pub` struct.
   - Step 2 changes the parameters and default behavior of some functions.
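
A minimal sketch of why adding a field to a `pub` struct is a breaking change. `OptionsAfterStep1` is a hypothetical stand-in, not the real `arrow-cast/CastOptions` definition, and the `lossy: true` default is an assumption made only for this example.

```rust
/// Hypothetical struct as it might look after step 1; not the real definition.
#[derive(Debug, Clone, PartialEq)]
pub struct OptionsAfterStep1 {
    pub safe: bool,
    /// The new field proposed in step 1.
    pub lossy: bool,
}

impl Default for OptionsAfterStep1 {
    fn default() -> Self {
        // `safe: true` matches the arrow-cast default cited above;
        // `lossy: true` is an assumption for this sketch.
        Self { safe: true, lossy: true }
    }
}

/// Downstream code that lists every field must be edited when a field is added:
/// a literal written as `OptionsAfterStep1 { safe: false }` no longer compiles.
pub fn exhaustive_literal() -> OptionsAfterStep1 {
    OptionsAfterStep1 { safe: false, lossy: true }
}

/// Downstream code using struct update syntax keeps compiling across the change.
pub fn forward_compatible() -> OptionsAfterStep1 {
    OptionsAfterStep1 { safe: false, ..Default::default() }
}
```

Callers that already build the options with `..Default::default()` would be unaffected at compile time; only exhaustive struct literals break.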
   
   
   Currently, the `CastOptions` usages in `parquet-variant-compute` are as 
follows:
   - `shred_variant.rs`, `variant_to_arrow.rs`, and `variant_get.rs` use 
`arrow-cast/CastOptions`.
   - `arrow_to_variant.rs`, `cast_to_variant.rs`, and `lib.rs` (re-export) use 
`parquet-variant-compute/CastOptions`.

