This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git
The following commit(s) were added to refs/heads/master by this push:
new 34bfc42 [Bug] Fix row_number and group by are inconsistent with 0 and
-0 partition (#5226)
34bfc42 is described below
commit 34bfc429868a9a22481d209c24ccd50d85cc3c9f
Author: Xinyi Zou <[email protected]>
AuthorDate: Sat Jan 23 21:08:43 2021 +0800
[Bug] Fix row_number and group by are inconsistent with 0 and -0 partition
(#5226)
The root cause is how negative zero (-0.0) behaves when compared with
positive zero (+0.0).
Currently, in GroupBy and HashPartition, -0.0 is not treated as equal to 0.0,
because the hash function returns different results for the two values,
so -0.0 and 0.0 are divided into 2 partitions.
In the row_number analytic function, a new partition is opened in the sorted
data whenever the values of
two adjacent rows are not equal. But in C++ the comparison 0.0 ==
-0.0 is true, so 0.0 and -0.0
are placed in the same partition for row_number.
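The mismatch described above can be reproduced outside Doris. The following is a minimal sketch, not the code in hash_util.hpp: `fnv1a_bytes`, `hash_raw`, `hash_normalized` and `check_zero_hashing` are hypothetical names introduced only for illustration, and the hash is a plain byte-wise FNV-1a. It shows that the two zeros compare equal yet hash differently when their raw bytes are hashed, and that mapping -0.0 to 0.0 before hashing removes the difference.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical byte-wise FNV-1a hash, for illustration only
// (this is not Doris's HashUtil implementation).
inline uint32_t fnv1a_bytes(const void* data, std::size_t bytes) {
    const auto* p = static_cast<const unsigned char*>(data);
    uint32_t h = 2166136261u;          // FNV offset basis
    while (bytes--) {
        h ^= *p++;
        h *= 16777619u;                // FNV prime
    }
    return h;
}

// Hash the raw bytes of a double. 0.0 and -0.0 differ in the sign bit,
// so their byte sequences, and therefore their hashes, differ.
inline uint32_t hash_raw(double v) {
    return fnv1a_bytes(&v, sizeof(v));
}

// Hash after mapping -0.0 to 0.0, so values that compare equal hash equally.
// The copy `v` is modified, not the caller's value.
inline uint32_t hash_normalized(double v) {
    if (v == 0.0 && std::signbit(v)) {
        v = 0.0;
    }
    return fnv1a_bytes(&v, sizeof(v));
}

// Usage: the comparison and the raw hash disagree; the normalized hash agrees.
inline void check_zero_hashing() {
    assert(0.0 == -0.0);                                    // comparison: equal
    assert(hash_raw(0.0) != hash_raw(-0.0));                // raw hash: different
    assert(hash_normalized(0.0) == hash_normalized(-0.0));  // normalized: same
}
```

Unlike the patch, this sketch normalizes a local copy, so the caller's value keeps its sign. The patch rewrites the value in place instead.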
(Floating point arithmetic in C++ is usually IEEE-754. This standard defines
two different representations for
the value zero: positive zero and negative zero. It also requires that
those two representations
compare equal. Refer to https://stackoverflow.com/questions/45795397)
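The two representations can be inspected directly. The sketch below assumes `double` is IEEE-754 binary64 (64 bits wide), which is common but not guaranteed by the C++ standard. `bits_of` and `check_signed_zero` are hypothetical helpers introduced here for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Copy the object representation of a double into a 64-bit integer.
// Assumption: double is IEEE-754 binary64.
inline uint64_t bits_of(double v) {
    static_assert(sizeof(uint64_t) == sizeof(double), "assumes a 64-bit double");
    uint64_t u;
    std::memcpy(&u, &v, sizeof(u));
    return u;
}

inline void check_signed_zero() {
    const double pos = 0.0;
    const double neg = -0.0;
    assert(pos == neg);                    // the two zeros compare equal
    assert(!std::signbit(pos));            // +0.0 has a clear sign bit
    assert(std::signbit(neg));             // -0.0 has the sign bit set
    assert(bits_of(pos) != bits_of(neg));  // the stored bits differ
    assert(bits_of(neg) == 0x8000000000000000ULL);  // only the sign bit is set
}
```

Any function that works on the stored bytes, such as a hash, therefore sees two different inputs even though `==` reports them as equal.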
---
be/src/util/hash_util.hpp | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/be/src/util/hash_util.hpp b/be/src/util/hash_util.hpp
index ccbc79b..6f0087e 100644
--- a/be/src/util/hash_util.hpp
+++ b/be/src/util/hash_util.hpp
@@ -29,6 +29,7 @@
#include <nmmintrin.h>
#endif
#include <zlib.h>
+#include <cmath>
#include "util/cpu_info.h"
#include "util/murmur_hash3.h"
#include "gen_cpp/Types_types.h"
@@ -58,6 +59,10 @@ public:
uint32_t words = bytes / sizeof(uint32_t);
bytes = bytes % sizeof(uint32_t);
+ // When data is negative zero, rewrite it to 0.0, see Doris#5226
+ if (*(double*)data == 0.0 and std::signbit(*(double*)data) == true) {
+ *(double*)data = 0.0;
+ }
const uint32_t* p = reinterpret_cast<const uint32_t*>(data);
while (words--) {
@@ -219,6 +224,13 @@ public:
// is taken on the hash, all values will collide to the same bucket.
// For string values, Fnv is slightly faster than boost.
static uint32_t fnv_hash(const void* data, int32_t bytes, uint32_t hash) {
+ // When data is negative zero, rewrite it to 0.0, which will be used in
+ // DataStreamSender::send HashPartition, because the original value of
+ // data is covered here, so -0.0 in the data after DataStreamSender
+ // will be changed to 0.0, see Doris#5226
+ if (*(double*)data == 0.0 and std::signbit(*(double*)data) == true) {
+ *(double*)data = 0.0;
+ }
const uint8_t* ptr = reinterpret_cast<const uint8_t*>(data);
while (bytes--) {