zanmato1984 commented on code in PR #48166:
URL: https://github.com/apache/arrow/pull/48166#discussion_r2557768858


##########
cpp/src/arrow/acero/hash_join.cc:
##########
@@ -306,12 +307,33 @@ class HashJoinBasicImpl : public HashJoinImpl {
 
     size_t num_probed_rows = match.size() + no_match.size();
     if (mask.is_scalar()) {
-      const auto& mask_scalar = mask.scalar_as<BooleanScalar>();
-      if (mask_scalar.is_valid && mask_scalar.value) {
-        // All rows passed, nothing left to do
-        return Status::OK();
+#if ARROW_LITTLE_ENDIAN
+       const auto& mask_scalar = mask.scalar_as<BooleanScalar>();
+       if (mask_scalar.is_valid && mask_scalar.value) {
+         // All rows passed, nothing left to do
+         return Status::OK();
+#else
+      // Check if the scalar is a BooleanScalar before casting
+      if (mask.scalar()->type->id() == Type::BOOL) {

Review Comment:
   This is a very detailed explanation. Thanks, @Vishwanatha-HD.
   
   As the author of the test in question, I can say this is an issue with the 
test rather than with the hash join code. That is, using a filter expression 
that evaluates to a null-type `null` is invalid - it should be a boolean 
`null`. It is OK for the hash join code itself to assume the expression is 
`boolean`, though a `DCHECK_EQ(mask.type()->id(), Type::BOOL)` would be 
preferable.
   
   I think we should instead fix the test by simply replacing the 
`literal(NullScalar())` with a boolean `null`: 
`literal(MakeNullScalar(boolean()))`.
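   
   To illustrate the distinction, here is a minimal, self-contained sketch. 
It does not use Arrow itself: `TypeId`, `Scalar`, `NullScalar`, 
`BooleanScalar`, `MakeNullBoolean` and `AllRowsPass` are simplified, 
hypothetical stand-ins written only for this example. It shows why a typed 
boolean `null` is safe to downcast while a null-type `null` is not, and where 
a type assertion in the spirit of the suggested `DCHECK_EQ` would sit.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration only; NOT the real Arrow classes.
enum class TypeId { NA, BOOL };

struct Scalar {
  TypeId type_id;
  bool is_valid;
  Scalar(TypeId id, bool valid) : type_id(id), is_valid(valid) {}
  virtual ~Scalar() = default;
};

// Null-type null: has no boolean payload at all.
struct NullScalar : Scalar {
  NullScalar() : Scalar(TypeId::NA, /*valid=*/false) {}
};

struct BooleanScalar : Scalar {
  bool value;
  BooleanScalar(bool v, bool valid) : Scalar(TypeId::BOOL, valid), value(v) {}
};

// Boolean-typed null: the right type for a filter result, but not valid.
std::shared_ptr<Scalar> MakeNullBoolean() {
  return std::make_shared<BooleanScalar>(/*v=*/false, /*valid=*/false);
}

// Returns true when the scalar filter lets every probed row pass.
// The assert plays the role of the suggested DCHECK_EQ on the type id:
// it documents the assumption instead of silently casting a wrong type.
bool AllRowsPass(const Scalar& mask) {
  assert(mask.type_id == TypeId::BOOL);
  const auto& boolean_mask = static_cast<const BooleanScalar&>(mask);
  return boolean_mask.is_valid && boolean_mask.value;
}
```

   With these stand-ins, `AllRowsPass(*MakeNullBoolean())` is well defined and 
returns `false`, whereas passing a `NullScalar` trips the assertion in a debug 
build instead of reading a `value` member that does not exist.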



##########
cpp/src/arrow/acero/hash_join.cc:
##########
@@ -306,12 +307,33 @@ class HashJoinBasicImpl : public HashJoinImpl {
 
     size_t num_probed_rows = match.size() + no_match.size();
     if (mask.is_scalar()) {
-      const auto& mask_scalar = mask.scalar_as<BooleanScalar>();
-      if (mask_scalar.is_valid && mask_scalar.value) {
-        // All rows passed, nothing left to do
-        return Status::OK();
+#if ARROW_LITTLE_ENDIAN
+       const auto& mask_scalar = mask.scalar_as<BooleanScalar>();
+       if (mask_scalar.is_valid && mask_scalar.value) {
+         // All rows passed, nothing left to do
+         return Status::OK();
+#else
+      // Check if the scalar is a BooleanScalar before casting
+      if (mask.scalar()->type->id() == Type::BOOL) {

Review Comment:
   But one question remains: the test should be equally problematic on 
little-endian. Why is it passing there?
   
   I'm now looking into it.



##########
cpp/src/arrow/acero/hash_join.cc:
##########
@@ -306,19 +307,40 @@ class HashJoinBasicImpl : public HashJoinImpl {
 
     size_t num_probed_rows = match.size() + no_match.size();
     if (mask.is_scalar()) {
+#if ARROW_LITTLE_ENDIAN
       const auto& mask_scalar = mask.scalar_as<BooleanScalar>();
       if (mask_scalar.is_valid && mask_scalar.value) {
         // All rows passed, nothing left to do
         return Status::OK();
-      } else {
-        // Nothing passed, no_match becomes everything
-        no_match.resize(num_probed_rows);
-        std::iota(no_match.begin(), no_match.end(), 0);
-        match_left.clear();
-        match_right.clear();
-        match.clear();
-        return Status::OK();
       }
+#else
+      // Check if the scalar is a BooleanScalar before casting

Review Comment:
   Explained in my other comment down below.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
