[jira] [Commented] (ARROW-11455) [R] Improve handling of -2^31 in 32-bit integer fields

Ben Kietzman (Jira) Thu, 03 Jun 2021 18:38:04 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-11455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357003#comment-17357003
 ]


Ben Kietzman commented on ARROW-11455:
--------------------------------------

Instead of using IntegersCanFit, just use CheckIntegersInRange:

{code}
--- a/r/src/array_to_vector.cpp
+++ b/r/src/array_to_vector.cpp
@@ -34,7 +34,7 @@
 namespace arrow {

 using internal::checked_cast;
-using internal::IntegersCanFit;
+using internal::CheckIntegersInRange;

 namespace r {

@@ -1045,14 +1045,17 @@ class Converter_Null : public Converter {
 };

 bool ArraysCanFitInteger(ArrayVector arrays) {
-  bool all_can_fit = true;
-  auto i32 = arrow::int32();
-  for (const auto& array : arrays) {
-    if (all_can_fit) {
-      all_can_fit = arrow::IntegersCanFit(arrow::Datum(array), *i32).ok();
-    }
+  // don't include NA_integer_ as a usable int32
+  static arrow::Int32Scalar min_i32(std::numeric_limits<int32_t>::min() + 1);
+  static arrow::Int32Scalar max_i32(std::numeric_limits<int32_t>::max());
+
+  for (arrow::Datum array : arrays) {
+    auto min = ValueOrStop(min_i32.CastTo(array.type()));
+    auto max = ValueOrStop(min_i32.CastTo(array.type()));
+    bool fits = arrow::CheckIntegersInRange(array, *min, *max).ok();
+    if (!fits) return false;
   }
-  return all_can_fit;
+  return true;
 }

 bool GetBoolOption(const std::string& name, bool default_) {
{code}

Note that we'll need to check ArraysCanFitInteger for int32 as well as uint32, 
int64, uint64

> [R] Improve handling of -2^31 in 32-bit integer fields
> ------------------------------------------------------
>
>                 Key: ARROW-11455
>                 URL: https://issues.apache.org/jira/browse/ARROW-11455
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 3.0.0
>            Reporter: Ian Cook
>            Priority: Major
>
> R’s {{integer}} range is 1 smaller than the normal 32-bit integer range of 
> C++, Java, etc. In R, it’s {{-2^31 + 1}} to {{2^31 - 1}}. Elsewhere, it’s 
> {{-2^31}} to {{2^31 - 1}}. So R's native {{integer}} type cannot represent 
> {{-2^31}} ({{-2147483648}}). I believe this is because R uses {{-2^31}} as 
> the sentinel value to represent {{NA_integer_}}.
> If you run {{-2147483648L}} in R, it converts the vector it to {{numeric}} 
> type and issues a warning:
> {code:java}
> Warning message:
> non-integer value 2147483648L qualified with L; using numeric value 
> {code}
> In the {{arrow}} R package, when a 32-bit integer Arrow field containing the 
> value {{-2147483648}} is converted to an R {{integer}} vector, that value 
> becomes {{NA_integer_}}. No warning is given.
> Simple command to repro this: 
> {code:r}
> as.vector(Scalar$create(-2^31, int32())){code}
> Consider whether we should handle this case differently and whether it is 
> feasible to do so without performance regressions. Other possible behaviors 
> might be:
>  * Converting the value to {{NA_integer_}} with a warning
>  * Converting the field to {{bit64::integer64}} with a warning
>  * Converting the field to {{base::numeric}} with a warning
>  * Allowing the user to specify an argument or option to control the behavior



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (ARROW-11455) [R] Improve handling of -2^31 in 32-bit integer fields

Reply via email to