zhengshengjun commented on code in PR #10392:
URL: https://github.com/apache/doris/pull/10392#discussion_r917411644
##########
be/src/vec/aggregate_functions/aggregate_function_null.h:
##########
@@ -197,9 +200,66 @@ class AggregateFunctionNullUnary final
}
}
+    void add_not_nullable(AggregateDataPtr __restrict place, const IColumn** columns,
+                          size_t row_num, Arena* arena) const {
+        const ColumnNullable* column = assert_cast<const ColumnNullable*>(columns[0]);
+        this->set_flag(place);
+        const IColumn* nested_column = &column->get_nested_column();
+        this->nested_function->add(this->nested_place(place), &nested_column, row_num, arena);
+    }
+
+    void add_batch(size_t batch_size, AggregateDataPtr* places, size_t place_offset,
+                   const IColumn** columns, Arena* arena) const override {
+        int processed_records_num = 0;
+
+        // We could use column->has_null() to check whether the whole batch is null
+        // and skip it, but that may be too coarse-grained.
+#ifdef __AVX2__
+        const ColumnNullable* column = assert_cast<const ColumnNullable*>(columns[0]);
+        // The overhead introduced here is negligible, just an extra memory read from the NullMap.
+        const NullMap& null_map_data = column->get_null_map_data();
+
+        // NullMap stores one uint8_t per value: 1 means null, 0 means not null.
+        // The SIMD batch size must stay consistent with the element size of NullMap.
+        constexpr int simd_batch_size = 256 / (8 * sizeof(uint8_t));
+        __m256i all0 = _mm256_setzero_si256();
+        auto to_read_null_map_position = reinterpret_cast<const int8_t*>(null_map_data.data());
+
+        while (processed_records_num + simd_batch_size < batch_size) {
+            // Load 32 unaligned bytes from the null map; 1 means null, 0 means not null.
+            // Index from the base pointer each iteration: advancing the pointer by the
+            // cumulative processed_records_num would read from the wrong offset.
+            __m256i f = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(
+                    to_read_null_map_position + processed_records_num));
+            int mask = _mm256_movemask_epi8(_mm256_cmpgt_epi8(f, all0));
+            if (mask == 0xffffffff) {
+                // all data in this batch is null, skip it
+            } else if (mask == 0) {
+                // no data in this batch is null
+                for (size_t i = processed_records_num; i < processed_records_num + simd_batch_size;
+                     i++) {
+                    add_not_nullable(places[i] + place_offset, columns, i, arena);
+                }
+            } else {
+                // data is partly null
+                for (size_t i = processed_records_num; i < processed_records_num + simd_batch_size;
Review Comment:
hi @BiteTheDDDDt, thanks for your advice. The simd_batch_size here is 32, which
I think is fine-grained enough. Also, finding the offset of the next not-null
value is an expensive operation, which may cause a performance regression if
the data is not sparse (a sketch of that approach follows below). I think we'd
better do this after collecting some statistics in NullableColumn in the
future?
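
For illustration, here is a rough standalone sketch, not code from this PR, of
what the "find the not-null offset" approach could look like. It assumes a
GCC/Clang toolchain with AVX2 enabled (-mavx2); visit_not_null_rows and its
visitor callback are made-up names for this example:

// Hypothetical sketch (not Doris code): given a 32-byte window of a null map
// (1 = null, 0 = not null), jump directly from one not-null row to the next
// using the movemask bits instead of scanning every row.
#include <immintrin.h>

#include <cstdint>
#include <cstdio>
#include <vector>

// Visit only the not-null rows in the 32-row window starting at `base`,
// using count-trailing-zeros on the inverted null-bit mask.
template <typename Visitor>
void visit_not_null_rows(const uint8_t* null_map, size_t base, Visitor visit) {
    __m256i window =
            _mm256_loadu_si256(reinterpret_cast<const __m256i*>(null_map + base));
    // Bit i of null_bits is 1 when row (base + i) is null.
    uint32_t null_bits = static_cast<uint32_t>(
            _mm256_movemask_epi8(_mm256_cmpgt_epi8(window, _mm256_setzero_si256())));
    uint32_t not_null_bits = ~null_bits;
    while (not_null_bits != 0) {
        int offset = __builtin_ctz(not_null_bits); // index of the next not-null row
        visit(base + offset);
        not_null_bits &= not_null_bits - 1; // clear the lowest set bit
    }
}

int main() {
    // 32 rows, mostly null: the bit-scan loop only touches the three not-null rows.
    std::vector<uint8_t> null_map(32, 1);
    null_map[3] = null_map[17] = null_map[30] = 0;
    visit_not_null_rows(null_map.data(), 0,
                        [](size_t row) { std::printf("not-null row: %zu\n", row); });
    return 0;
}

Each not-null row costs one bit-scan plus one bit-clear, so when most rows are
not null this degenerates into a per-row loop with extra bit manipulation,
which is the regression risk mentioned above; it only wins when not-null rows
are sparse.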
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]