This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 34bfc42  [Bug] Fix row_number and group by are inconsistent with 0 and -0 partition (#5226)
34bfc42 is described below

commit 34bfc429868a9a22481d209c24ccd50d85cc3c9f
Author: Xinyi Zou <[email protected]>
AuthorDate: Sat Jan 23 21:08:43 2021 +0800

    [Bug] Fix row_number and group by are inconsistent with 0 and -0 partition (#5226)
    
    The root cause is how negative zero (-0.0) compares with positive zero (+0.0).
    Currently, GroupBy and HashPartition treat -0.0 and 0.0 as unequal because the hash
    function returns different values for them, so -0.0 and 0.0 are placed in 2 different partitions.
    
    In the row_number analytic function, the data is sorted and a new partition is opened
    when the values of adjacent rows are not equal. But in C++ the comparison 0.0 == -0.0
    is true, so 0.0 and -0.0 are placed in the same partition for row_number.
    
    (Floating-point arithmetic in C++ usually follows IEEE 754. This standard defines two
    different representations for the value zero: positive zero and negative zero. It also
    requires that those two representations compare equal.
    Refer to https://stackoverflow.com/questions/45795397)
---
 be/src/util/hash_util.hpp | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/be/src/util/hash_util.hpp b/be/src/util/hash_util.hpp
index ccbc79b..6f0087e 100644
--- a/be/src/util/hash_util.hpp
+++ b/be/src/util/hash_util.hpp
@@ -29,6 +29,7 @@
 #include <nmmintrin.h>
 #endif
 #include <zlib.h>
+#include <cmath>
 #include "util/cpu_info.h"
 #include "util/murmur_hash3.h"
 #include "gen_cpp/Types_types.h"
@@ -58,6 +59,10 @@ public:
         uint32_t words = bytes / sizeof(uint32_t);
         bytes = bytes % sizeof(uint32_t);
 
+        // When data is negative zero, rewrite it to 0.0, see Doris#5226
+        if (*(double*)data == 0.0 and std::signbit(*(double*)data) == true) {
+            *(double*)data = 0.0;
+        }
         const uint32_t* p = reinterpret_cast<const uint32_t*>(data);
 
         while (words--) {
@@ -219,6 +224,13 @@ public:
     // is taken on the hash, all values will collide to the same bucket.
     // For string values, Fnv is slightly faster than boost.
     static uint32_t fnv_hash(const void* data, int32_t bytes, uint32_t hash) {
+        // When data is negative zero, rewrite it to 0.0, which will be used in
+        // DataStreamSender::send HashPartition, because the original value of 
+        // data is covered here, so -0.0 in the data after DataStreamSender 
+        // will be changed to 0.0, see Doris#5226
+        if (*(double*)data == 0.0 and std::signbit(*(double*)data) == true) {
+            *(double*)data = 0.0;
+        }
         const uint8_t* ptr = reinterpret_cast<const uint8_t*>(data);
 
         while (bytes--) {

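The mismatch described in the commit message can be reproduced outside Doris. The following is a minimal sketch, not code from the repository: `bits_of` and `fnv1a` are hypothetical helpers, and `fnv1a` is a generic byte-wise FNV-1a hash standing in for any hash computed over raw bytes. It shows that 0.0 and -0.0 compare equal with `==` while their bit patterns, and therefore their byte-wise hashes, differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy the bytes of a double into a 64-bit integer
// so the raw representation can be compared.
static uint64_t bits_of(double v) {
    uint64_t out;
    std::memcpy(&out, &v, sizeof(out));
    return out;
}

// Hypothetical stand-in for a hash over raw bytes (32-bit FNV-1a).
// This is NOT the Doris implementation; it only illustrates the effect.
static uint32_t fnv1a(const void* data, std::size_t bytes) {
    const uint8_t* p = static_cast<const uint8_t*>(data);
    uint32_t h = 2166136261u;
    while (bytes--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

// Returns true when the two zeros compare equal but hash differently,
// which is the inconsistency between row_number and GroupBy/HashPartition.
static bool zeros_equal_but_hash_differently() {
    double pos = 0.0;
    double neg = -0.0;
    bool equal_by_value = (pos == neg);                  // value comparison
    bool same_bits = (bits_of(pos) == bits_of(neg));     // sign bit differs
    bool same_hash = (fnv1a(&pos, sizeof(pos)) == fnv1a(&neg, sizeof(neg)));
    return equal_by_value && !same_bits && !same_hash;
}
```

Calling `zeros_equal_but_hash_differently()` from a small `main` should return true on a platform where `double` follows IEEE 754.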

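The idea behind the fix in the diff above (rewrite -0.0 to 0.0 before hashing) can also be sketched as a standalone helper. This is an illustration under stated assumptions, not the Doris code: `normalize_zero`, `hash_double`, and `fnv1a` are hypothetical names, and unlike the diff, which dereferences `data` as a `double` without consulting `bytes` and overwrites the value in place, this variant only accepts a value already known to be a `double` and works on a copy.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a hash over raw bytes (32-bit FNV-1a),
// not the Doris hash functions.
static uint32_t fnv1a(const void* data, std::size_t bytes) {
    const uint8_t* p = static_cast<const uint8_t*>(data);
    uint32_t h = 2166136261u;
    while (bytes--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

// Hypothetical helper: map negative zero to positive zero and leave every
// other value (including NaN and infinities) unchanged.
static double normalize_zero(double v) {
    if (v == 0.0 && std::signbit(v)) {
        return 0.0;
    }
    return v;
}

// Hash a double after normalizing the sign of zero. The argument is passed
// by value, so the caller's data is not modified.
static uint32_t hash_double(double v) {
    double normalized = normalize_zero(v);
    return fnv1a(&normalized, sizeof(normalized));
}
```

With this sketch, `hash_double(0.0)` and `hash_double(-0.0)` return the same value, so a hash-based partitioner would agree with the `==` comparison used by row_number. Whether the original value should also be rewritten in the row, as the comment in the diff describes for DataStreamSender, is a separate design choice that this sketch does not make.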