scovich commented on PR #10157:
URL: https://github.com/apache/arrow-rs/pull/10157#issuecomment-4855216745

   > Sorry for the late reply, a little busy these days on internal work.
   > 
   > > Value-based lossless (e.g. i64 -> i8 may succeed, for values in 
-128..128)
   > 
   > @scovich For the second type, some cases may be tricky: for float64 
-> float32, not every float64 in `[float32::MIN, float32::MAX]` converts to 
float32 losslessly. For example, `3f64` and `4f64` convert to `f32` 
losslessly, but `f64::PI` does not. That makes it harder for users to predict 
which f64 values will convert to f32 (not as easy as i64 -> i8), which is why 
I prefer the `identify` API for the `as_xxx` API
   > 
   > If the value-based lossless conversion for `f64` -> `f32` is acceptable 
to us, then we can implement `Variant::as_xxx` using the second approach 
(value-based lossless conversion).
   
   I'm not worried about supporting f64 -> f32 conversions. Almost all values 
will suffer a lossy conversion (as you point out). Also, I don't know of (m)any 
JSON parsers that choose f32 when they see a non-integral number. They all 
choose f64.
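   
   For reference, a value-based lossless f64 -> f32 check could be sketched as a round trip. This is only an illustration: `f64_to_f32_lossless` is a hypothetical helper, not an existing arrow-rs or `Variant` API, and rejecting NaN is an assumption of the sketch.

```rust
/// Hypothetical helper (not an arrow-rs API): narrows an f64 to f32 only when
/// converting back to f64 reproduces the original value exactly.
fn f64_to_f32_lossless(v: f64) -> Option<f32> {
    // `as` rounds to the nearest f32; out-of-range values become +/- infinity.
    let narrowed = v as f32;
    // NaN never compares equal to itself, so this sketch rejects NaN inputs.
    if f64::from(narrowed) == v {
        Some(narrowed)
    } else {
        None
    }
}
```

   Under this check, `3.0` and `0.5` narrow successfully, while `std::f64::consts::PI` and `1e300` do not.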
   
   I _do_ worry about _not_ supporting i8 -> i64 conversions because many 
parsers, including our own arrow-rs json -> parquet parser, will choose the 
narrowest type that can represent a given number. So a column with values `[1, 
1000, 100_000, 10_000_000_000, 100_000_000_000_000_000_000]` would 
variant-parse as `[i8, i16, i32, i64, f64]` (we don't support parsing decimal 
values from json yet). And the problem can't be fixed by e.g. `cast` to i64, 
because that would truncate (or fail) the f64 value that doesn't fit. The whole 
point of variant is to accommodate "wrong" type values, even when shredding.
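   
   To make the narrowest-type behavior concrete, here is a minimal sketch. `Num`, `narrowest`, and `as_i64` are hypothetical names for illustration and do not reflect the actual arrow-rs parser or `Variant` API. Leaving every f64 value unconverted is a simplification of the sketch.

```rust
/// Hypothetical stand-in for the numeric variant types discussed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Num {
    I8(i8),
    I16(i16),
    I32(i32),
    I64(i64),
    F64(f64),
}

/// Picks the narrowest integer type that holds the parsed number and falls
/// back to f64 when the text does not fit in an i64.
/// Note: Rust's f64 parser accepts inputs that JSON does not (e.g. "inf").
fn narrowest(text: &str) -> Option<Num> {
    if let Ok(v) = text.parse::<i64>() {
        let num = if let Ok(n) = i8::try_from(v) {
            Num::I8(n)
        } else if let Ok(n) = i16::try_from(v) {
            Num::I16(n)
        } else if let Ok(n) = i32::try_from(v) {
            Num::I32(n)
        } else {
            Num::I64(v)
        };
        return Some(num);
    }
    text.parse::<f64>().ok().map(Num::F64)
}

/// Widening accessor: every integer width converts to i64 without loss.
/// In this sketch f64 values are simply left unconverted.
fn as_i64(n: &Num) -> Option<i64> {
    match *n {
        Num::I8(v) => Some(i64::from(v)),
        Num::I16(v) => Some(i64::from(v)),
        Num::I32(v) => Some(i64::from(v)),
        Num::I64(v) => Some(v),
        Num::F64(_) => None,
    }
}
```

   With a widening accessor like `as_i64`, the first four values of the example column can all be read as i64, and only the f64 outlier is left behind.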
   
   Similarly, I worry a little (though less) about supporting i64 -> i8 
conversions because somebody may want to shred a column as a narrow type, 
knowing that most values are small and that any outliers will stay behind in 
the `value` column. Again, `cast` to i8 wouldn't work because it would 
truncate/fail the outliers.
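   
   A value-based narrowing for shredding might look roughly like the sketch below. `shred_as_i8` is a hypothetical function, and the two returned vectors only loosely model a typed column and the residual `value` column.

```rust
/// Hypothetical sketch: splits i64 values into a narrow typed column and a
/// residual column that keeps the outliers untouched.
fn shred_as_i8(values: &[i64]) -> (Vec<Option<i8>>, Vec<Option<i64>>) {
    values
        .iter()
        .map(|&v| match i8::try_from(v) {
            // The value fits in i8, so it goes to the typed column.
            Ok(n) => (Some(n), None),
            // Outlier: stays behind unchanged instead of being truncated.
            Err(_) => (None, Some(v)),
        })
        .unzip()
}
```

   By contrast, a plain `as` cast would silently wrap the outliers: `1000_i64 as i8` evaluates to `-24`.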


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
