[ https://issues.apache.org/jira/browse/ARROW-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369560#comment-16369560 ]

ASF GitHub Bot commented on ARROW-2162:
---------------------------------------

cpcloud commented on a change in pull request #1619: ARROW-2162: [Python/C++] 
Decimal Values with too-high precision are multiplied by 100
URL: https://github.com/apache/arrow/pull/1619#discussion_r169178775
 
 

 ##########
 File path: cpp/src/arrow/util/decimal.cc
 ##########
 @@ -854,22 +854,56 @@ static const Decimal128 ScaleMultipliers[] = {
     Decimal128("10000000000000000000000000000000000000"),
     Decimal128("100000000000000000000000000000000000000")};
 
+static bool RescaleWouldCauseDataLoss(const Decimal128& value, int32_t delta_scale,
+                                      int32_t abs_delta_scale) {
+  if (delta_scale < 0) {
+    Decimal128 remainder;
+    Decimal128 result;
+    Decimal128 multiplier(ScaleMultipliers[abs_delta_scale]);
+    DCHECK_NE(multiplier, 0);
+    Status status = value.Divide(multiplier, &result, &remainder);
+    DCHECK(status.ok()) << status.message();
+    return remainder != 0;
+  }
+  return false;
+}
+
 Status Decimal128::Rescale(int32_t original_scale, int32_t new_scale,
                            Decimal128* out) const {
   DCHECK_NE(out, NULLPTR);
-  DCHECK_NE(original_scale, new_scale);
-  const int32_t delta_scale = original_scale - new_scale;
+  DCHECK_NE(original_scale, new_scale) << "original_scale == new_scale";
+
+  const int32_t delta_scale = new_scale - original_scale;
   const int32_t abs_delta_scale = std::abs(delta_scale);
+
   DCHECK_GE(abs_delta_scale, 1);
   DCHECK_LE(abs_delta_scale, 38);
 
+  // Fail if rescaling would lose information
+  const bool rescale_would_cause_data_loss =
+      RescaleWouldCauseDataLoss(*this, delta_scale, abs_delta_scale);
+  if (delta_scale < 0 && rescale_would_cause_data_loss) {
+    std::stringstream buf;
+    buf << "Rescaling decimal value from original scale of " << original_scale
+        << " to new scale of " << new_scale << " would truncate the value";
+    return Status::Invalid(buf.str());
+  }
+
   const Decimal128 scale_multiplier = ScaleMultipliers[abs_delta_scale];
-  const Decimal128 result = *this * scale_multiplier;
 
-  if (ARROW_PREDICT_FALSE(result < *this)) {
+  Decimal128 result(*this);
+
+  if (delta_scale < 0) {
+    result /= scale_multiplier;
+  } else {
+    result *= scale_multiplier;
+  }
+
+  // Fail if we overflow
+  if (ARROW_PREDICT_FALSE(result < *this && rescale_would_cause_data_loss)) {
 
 Review comment:
   This isn't handling overflow properly. I'll add a test.
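As an illustration of the reviewer's point (a sketch in plain Python, not Arrow's code): a multiplication that wraps around signed 128-bit arithmetic is not guaranteed to compare less than the original operand, so a check of the form `result < *this` alone cannot reliably detect overflow. The constants below are chosen only to exhibit the wraparound.

```python
# Sketch: why `result < original` can miss signed 128-bit overflow.
BITS = 128
MOD = 1 << BITS

def to_signed(x):
    """Interpret x modulo 2**128 as a signed two's-complement value."""
    x %= MOD
    return x - MOD if x >= (1 << (BITS - 1)) else x

def wrapping_mul(a, b):
    """Multiply with signed 128-bit wraparound, like unchecked int128 math."""
    return to_signed(a * b)

v = 1 << 125             # representable in int128
r = wrapping_mul(v, 10)  # true product 10 * 2**125 exceeds the int128 max
print(r == 1 << 126)     # the wrapped result is 2**126
print(r > v)             # True: `result < original` does not flag the overflow
```

Here the true product overflows, yet the wrapped result is still a positive value larger than the input, so the comparison-based check passes silently.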


> [Python/C++] Decimal Values with too-high precision are multiplied by 100
> -------------------------------------------------------------------------
>
>                 Key: ARROW-2162
>                 URL: https://issues.apache.org/jira/browse/ARROW-2162
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.8.0
>            Reporter: Phillip Cloud
>            Assignee: Phillip Cloud
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> From GitHub:
> This works as expected:
> {code}
> >>> pyarrow.array([decimal.Decimal('1.23')], pyarrow.decimal128(10,2))[0]
> Decimal('1.23')
> {code}
> Storing an extra digit of precision multiplies the stored value by a factor 
> of 100:
> {code}
> >>> pyarrow.array([decimal.Decimal('1.234')], pyarrow.decimal128(10,2))[0]
> Decimal('123.40')
> {code}
> Ideally I would get an exception since the value I'm trying to store doesn't 
> fit in the declared type of the array. It would be less good, but still ok, 
> if the stored value were 1.23 (truncating the extra digit). I didn't expect 
> pyarrow to silently store a value that differs from the original value by a 
> factor of 100.
> I originally thought that the code was incorrectly multiplying through by an
> extra factor of 10**scale, but that doesn't seem to be the case. If I change
> the scale, it always seems to be a factor of 100:
> {code}
> >>> pyarrow.array([decimal.Decimal('1.2345')], pyarrow.decimal128(10,3))[0]
> Decimal('123.450')
> {code}
> I see the same behavior if I use floating point to initialize the array
> rather than Python's decimal type.
> I searched GitHub and JIRA for open issues but didn't find anything related
> to this. I am using pyarrow 0.8.0 on OS X 10.12.6 with Python 2.7.14
> installed via Homebrew.
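The rescaling rule the patch implements amounts to: when increasing the scale, multiply by a power of ten (always lossless); when decreasing it, divide and reject a nonzero remainder. A minimal pure-Python sketch of that rule follows (the function name and signature are hypothetical, not Arrow's API):

```python
def rescale(unscaled, original_scale, new_scale):
    """Rescale a decimal's unscaled integer; raise if digits would be lost."""
    delta = new_scale - original_scale
    if delta >= 0:
        # Increasing scale: pad with trailing zeros, always exact.
        return unscaled * 10 ** delta
    # Decreasing scale: exact only if the dropped digits are all zero.
    quotient, remainder = divmod(unscaled, 10 ** (-delta))
    if remainder != 0:
        raise ValueError(
            "rescaling from scale %d to %d would truncate the value"
            % (original_scale, new_scale))
    return quotient

print(rescale(123, 2, 4))   # 1.23 -> scale 4: unscaled 12300, lossless
print(rescale(1230, 3, 2))  # 1.230 -> scale 2: unscaled 123, lossless
# rescale(1234, 3, 2) raises: 1.234 cannot be stored at scale 2 exactly
```

Under this rule the reported case, `Decimal('1.234')` into a `decimal128(10, 2)` column, would raise rather than silently store a value off by a factor of 100.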



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
