paleolimbot commented on issue #40109: URL: https://github.com/apache/arrow/issues/40109#issuecomment-1972314348
It looks like the actual truncation happens here: https://github.com/apache/arrow/blob/214378b522a36fbf6010e3d4f5470abaca7bf92e/r/src/r_to_arrow.cpp#L926

The cast to the `c_type`, as David noted, is a cast to an int64. On this line, one could check that no truncation is happening (I think you can use `std::modf()` for that). You would probably have to count the number of lossy casts (e.g., `this->n_lossy_casts_++`) and issue the warning at the very end of the conversion.

Perhaps the underlying cause is that we infer the unit of "seconds" by default. We could infer "milliseconds" or "microseconds", which would avoid truncation (or would limit it to thousandths or millionths of a second). I don't know why "seconds" is the default, but a good fix for this might be to change it to "ms" or "us" (or add an `options()` to do so, perhaps migrating to a safer default over several versions with some warnings).

``` r
arrow::infer_type(as.difftime(double(), units = "secs"))
#> DurationType
#> duration[s]
```

A workaround could be to specify the type explicitly:

``` r
delta <- as.difftime(c(0.000, 0.001, 0.002, 1, 1.5), units = "secs")
delta |> arrow::as_arrow_array(type = arrow::duration("ms"))
#> Array
#> <duration[ms]>
#> [
#>   0,
#>   1,
#>   2,
#>   1000,
#>   1500
#> ]
```

It looks like I inferred "microseconds" by default in nanoarrow, although I forget the reasoning:

``` r
library(nanoarrow)
delta <- as.difftime(c(0.000, 0.001, 0.002, 1, 1.5), units = "secs")
delta |> as_nanoarrow_array() |> arrow::as_arrow_array()
#> Array
#> <duration[us]>
#> [
#>   0,
#>   1000,
#>   2000,
#>   1000000,
#>   1500000
#> ]
```
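For reference, the "count lossy casts and warn at the end" idea mentioned above could look roughly like the following standalone C++ sketch. This is not the code in `r_to_arrow.cpp`: `DurationConversionResult`, `ConvertDurations`, and the `multiplier` parameter are hypothetical names made up for illustration, the warning goes to `stderr` rather than through R's warning mechanism, and missing or non-finite values are assumed to have been handled before this step.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical result type (not part of Arrow): the converted values plus
// the number of inputs whose fractional part was dropped.
struct DurationConversionResult {
  std::vector<int64_t> values;
  int64_t n_lossy_casts = 0;
};

// Hypothetical helper (not part of Arrow). `multiplier` scales the input
// unit to the target unit, e.g. 1 for seconds -> seconds or 1000 for
// seconds -> milliseconds. Inputs are assumed to be finite.
inline DurationConversionResult ConvertDurations(
    const std::vector<double>& input, double multiplier) {
  DurationConversionResult result;
  result.values.reserve(input.size());

  for (double value : input) {
    double integral_part = 0.0;
    // std::modf() splits the scaled value into integral and fractional
    // parts; a non-zero fractional part means the int64 cast truncates.
    double fractional_part = std::modf(value * multiplier, &integral_part);
    if (fractional_part != 0.0) {
      ++result.n_lossy_casts;
    }
    result.values.push_back(static_cast<int64_t>(integral_part));
  }

  // Issue a single warning after the whole conversion, not one per element.
  if (result.n_lossy_casts > 0) {
    std::fprintf(stderr, "Warning: %lld value(s) lost sub-unit precision\n",
                 static_cast<long long>(result.n_lossy_casts));
  }
  return result;
}
```

With this sketch, converting `{0.0, 0.5, 1.0, 1.5}` with a multiplier of 1 (seconds) would count two lossy casts, while a multiplier of 1000 (milliseconds) would count none, which is the same trade-off as changing the inferred default unit.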
