lidavidm commented on a change in pull request #10987:
URL: https://github.com/apache/arrow/pull/10987#discussion_r696610595
##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##########
@@ -1827,9 +1827,131 @@ Status ExecArrayCoalesce(KernelContext* ctx, const ExecBatch& batch, Datum* out)
return Status::OK();
}
+// Special case: implement 'coalesce' for an array and a scalar for any
+// fixed-width type (a 'fill_null' operation)
+template <typename Type>
+Status ExecArrayScalarCoalesce(KernelContext* ctx, Datum left, Datum right,
+ int64_t length, Datum* out) {
+ ArrayData* output = out->mutable_array();
+ const int64_t out_offset = output->offset;
+ uint8_t* out_valid = output->buffers[0]->mutable_data();
+ uint8_t* out_values = output->buffers[1]->mutable_data();
+
+ const ArrayData& left_arr = *left.array();
+ const uint8_t* left_valid =
+ left_arr.MayHaveNulls() ? left_arr.buffers[0]->data() : nullptr;
+ arrow::internal::OptionalBitBlockCounter bit_counter(left_valid, left_arr.offset,
Review comment:
Just for completeness, I also tested 50% and 99% nulls. BitRunReader is
still faster overall; the one exception is strings with 50% nulls, where it
is somewhat slower (about 61.8 ms vs. 54.8 ms).
```
--------------------------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------
BitBlockCounter:
CoalesceScalarBench64/0         1488163 ns      1488088 ns          457 bytes_per_second=5.25003G/s items_per_second=704.647M/s length=1048.58k null%=1 num_args=2
CoalesceScalarBench64/2         5307719 ns      5307520 ns          133 bytes_per_second=1.47197G/s items_per_second=197.564M/s length=1048.58k null%=25 num_args=2
CoalesceScalarBench64/4         8338070 ns      8337713 ns           83 bytes_per_second=959.496M/s items_per_second=125.763M/s length=1048.58k null%=50 num_args=2
CoalesceScalarBench64/6         2187502 ns      2187409 ns          323 bytes_per_second=3.57158G/s items_per_second=479.369M/s length=1048.58k null%=99 num_args=2
CoalesceScalarStringBench/0   112844773 ns    112838638 ns            5 bytes_per_second=4.41803G/s items_per_second=9.2927M/s length=1048.58k null%=1 num_args=2
CoalesceScalarStringBench/2    94380710 ns     94376399 ns            8 bytes_per_second=4.01386G/s items_per_second=11.1106M/s length=1048.58k null%=25 num_args=2
CoalesceScalarStringBench/4    54766774 ns     54766616 ns           10 bytes_per_second=4.64332G/s items_per_second=19.1463M/s length=1048.58k null%=50 num_args=2
CoalesceScalarStringBench/6     6268787 ns      6268586 ns          108 bytes_per_second=1.41205G/s items_per_second=167.275M/s length=1048.58k null%=99 num_args=2

BitRunReader:
CoalesceScalarBench64/0          994000 ns       994007 ns          717 bytes_per_second=7.8596G/s items_per_second=1054.9M/s length=1048.58k null%=1 num_args=2
CoalesceScalarBench64/2         4624991 ns      4625012 ns          153 bytes_per_second=1.68918G/s items_per_second=226.719M/s length=1048.58k null%=25 num_args=2
CoalesceScalarBench64/4         7016190 ns      7016189 ns          102 bytes_per_second=1.1135G/s items_per_second=149.451M/s length=1048.58k null%=50 num_args=2
CoalesceScalarBench64/6         1013352 ns      1013358 ns          672 bytes_per_second=7.70952G/s items_per_second=1034.75M/s length=1048.58k null%=99 num_args=2
CoalesceScalarStringBench/0   110433694 ns    110433042 ns            5 bytes_per_second=4.51427G/s items_per_second=9.49513M/s length=1048.58k null%=1 num_args=2
CoalesceScalarStringBench/2    79904343 ns     79904642 ns            7 bytes_per_second=4.74082G/s items_per_second=13.1228M/s length=1048.58k null%=25 num_args=2
CoalesceScalarStringBench/4    61812967 ns     61813273 ns           10 bytes_per_second=4.11398G/s items_per_second=16.9636M/s length=1048.58k null%=50 num_args=2
CoalesceScalarStringBench/6     6219633 ns      6219571 ns          105 bytes_per_second=1.42317G/s items_per_second=168.593M/s length=1048.58k null%=99 num_args=2
```
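
For readers unfamiliar with the run-based approach, here is a rough, standalone sketch of the idea behind a run-oriented fill_null: walk the validity bitmap as maximal runs of equal bits, bulk-copy each valid run, and bulk-fill each null run with the scalar. This is not Arrow's BitRunReader and not the code in this PR. `GetBit` and `FillNullByRuns` are hypothetical names, the sketch uses `std::vector` instead of Arrow buffers, it handles only `int64_t` values, and it ignores array offsets. The only detail taken from Arrow is that validity bitmaps are least-significant-bit first.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: read bit i of an LSB-first validity bitmap.
inline bool GetBit(const uint8_t* bits, int64_t i) {
  return (bits[i / 8] >> (i % 8)) & 1;
}

// Hypothetical sketch of a run-based fill_null for int64 values.
// `validity` holds one bit per value (1 = valid, 0 = null) and is assumed
// to contain at least ceil(values.size() / 8) bytes.
std::vector<int64_t> FillNullByRuns(const std::vector<int64_t>& values,
                                    const std::vector<uint8_t>& validity,
                                    int64_t fill_value) {
  const int64_t length = static_cast<int64_t>(values.size());
  std::vector<int64_t> out(values.size());
  int64_t pos = 0;
  while (pos < length) {
    // Find the end of the current run of identical validity bits.
    const bool run_valid = GetBit(validity.data(), pos);
    int64_t end = pos + 1;
    while (end < length && GetBit(validity.data(), end) == run_valid) ++end;
    if (run_valid) {
      // Valid run: copy the input values through unchanged.
      std::copy(values.begin() + pos, values.begin() + end, out.begin() + pos);
    } else {
      // Null run: write the scalar fill value.
      std::fill(out.begin() + pos, out.begin() + end, fill_value);
    }
    pos = end;
  }
  return out;
}
```

With long runs (1% or 99% nulls) almost all of the work is bulk copies or fills, which may be part of why those cases look best in the numbers above. At 50% nulls the runs are short, so the per-run overhead weighs more heavily. The bit-at-a-time run detection here is for clarity only; a real reader would scan whole words.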