scovich commented on PR #10157: URL: https://github.com/apache/arrow-rs/pull/10157#issuecomment-4855216745
> Sorry for the late reply, a little busy these days on internal work.
>
> > Value-based lossless (e.g. i64 -> i8 may succeed, for values in -128..128)
>
> @scovich For the second type, there are some cases that may be tricky: for float64 -> float32, not all float64 values in `[float32::MIN, float32::MAX]` can be converted to float32 losslessly. This makes the semantics tricky: `3f64` and `4f64` convert to `f32` losslessly, but `f64::PI` does not. It is harder for the user (not as easy as i64 -> i8) to predict which f64 values will convert to f32, and this is the reason why I prefer the `identify` API for the `as_xxx` API.
>
> If the value-based lossless conversion for `f64` -> `f32` is acceptable to us, then we can implement `Variant::as_xxx` using the second approach (value-based lossless conversion).

I'm not worried about supporting f64 -> f32 conversions. Almost all values will suffer a lossy conversion (as you point out). Also, I don't know of (m)any JSON parsers that choose f32 when they see a non-integral number. They all choose f64.

I _do_ worry about _not_ supporting i8 -> i64 conversions, because many parsers -- including our own arrow-rs json -> parquet parser -- will choose the narrowest type that can represent a given number. So a column with values `[1, 1000, 100_000, 10_000_000_000, 100_000_000_000_000_000_000]` would variant-parse as `[i8, i16, i32, i64, f64]` (we don't support parsing decimal values from json yet). And the problem can't be fixed by e.g. `cast` to i64, because that would truncate (or fail) the f64 value that doesn't fit. The whole point of variant is to accommodate "wrong" type values, even when shredding.

Similarly, I worry a little (tho less) about _not_ supporting i64 -> i8 conversions, because somebody may want to shred a column as a narrow type, knowing that most values are small and that any outliers will stay behind in the `value` column. Again, `cast` to i8 wouldn't work because it would truncate/fail the outliers.
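
To make the narrowest-type concern concrete, here is a minimal sketch of the behavior described above. Everything in it is hypothetical and for illustration only: `Num` is a stand-in enum, not the arrow-rs `Variant` type, and `parse_narrowest`, `as_i64`, and `as_i8` are assumed names, not existing APIs. It assumes the parser falls back to f64 when an integer does not fit in i64, and that a value-based lossless accessor returns `None` rather than truncating.

```rust
/// Hypothetical stand-in for a numeric variant value (not the arrow-rs `Variant`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Num {
    I8(i8),
    I16(i16),
    I32(i32),
    I64(i64),
    F64(f64),
}

/// Parses a plain number literal and picks the narrowest type that can
/// represent it, falling back to f64 when the value does not fit in i64.
fn parse_narrowest(s: &str) -> Option<Num> {
    if let Ok(v) = s.parse::<i64>() {
        return Some(if let Ok(x) = i8::try_from(v) {
            Num::I8(x)
        } else if let Ok(x) = i16::try_from(v) {
            Num::I16(x)
        } else if let Ok(x) = i32::try_from(v) {
            Num::I32(x)
        } else {
            Num::I64(v)
        });
    }
    s.parse::<f64>().ok().map(Num::F64)
}

impl Num {
    /// Value-based lossless conversion to i64: every narrower integer widens,
    /// and an f64 converts only if it is integral and within the i64 range.
    fn as_i64(self) -> Option<i64> {
        match self {
            Num::I8(v) => Some(v.into()),
            Num::I16(v) => Some(v.into()),
            Num::I32(v) => Some(v.into()),
            Num::I64(v) => Some(v),
            Num::F64(v) => {
                // 2^63 as f64; i64 covers the half-open range [-2^63, 2^63).
                const LIMIT: f64 = 9_223_372_036_854_775_808.0;
                if v.fract() == 0.0 && v >= -LIMIT && v < LIMIT {
                    Some(v as i64)
                } else {
                    None
                }
            }
        }
    }

    /// Value-based lossless conversion to i8: succeeds only for values that
    /// fit, so outliers yield `None` instead of being truncated.
    fn as_i8(self) -> Option<i8> {
        self.as_i64().and_then(|v| i8::try_from(v).ok())
    }
}
```

With accessors like these, a reader asking for i64 would get a value for the first four entries of the example column and `None` for the f64 outlier, which could then stay behind in the `value` column rather than being truncated by a `cast`.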
