felipecrv commented on code in PR #41373: URL: https://github.com/apache/arrow/pull/41373#discussion_r1626598031
########## cpp/src/arrow/compute/kernels/gather_internal.h: ########## @@ -0,0 +1,286 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include <cassert> +#include <cstddef> +#include <cstdint> +#include "arrow/array/data.h" +#include "arrow/util/bit_block_counter.h" +#include "arrow/util/bit_run_reader.h" +#include "arrow/util/bit_util.h" +#include "arrow/util/bitmap_ops.h" +#include "arrow/util/macros.h" + +// Implementation helpers for kernels that need to load/gather fixed-width +// data from multiple, arbitrary indices. +// +// https://en.wikipedia.org/wiki/Gather/scatter_(vector_addressing) + +namespace arrow::internal { +inline namespace gather_internal { +// CRTP [1] base class for Gather that provides a gathering loop in terms of +// Write*() methods that must be implemented by the derived class. +// +// [1] https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern +template <class GatherImpl> +class GatherBaseCRTP { + public: + // Output offset is not supported by Gather and idx is supposed to have offset + // pre-applied. idx_validity parameters on functions can use the offset they + // carry to read the validity bitmap as bitmaps can't have pre-applied offsets + // (they might not align to byte boundaries). + GatherBaseCRTP() = default; + GatherBaseCRTP(const GatherBaseCRTP&) = delete; + GatherBaseCRTP(GatherBaseCRTP&&) = delete; + GatherBaseCRTP& operator=(const GatherBaseCRTP&) = delete; + GatherBaseCRTP& operator=(GatherBaseCRTP&&) = delete; + + protected: + ARROW_FORCE_INLINE int64_t ExecuteNoNulls(int64_t idx_length) { + auto* self = static_cast<GatherImpl*>(this); + for (int64_t position = 0; position < idx_length; position++) { + self->WriteValue(position); + } + return idx_length; + } + + // See derived Gather classes below for the meaning of the parameters, pre and + // post-conditions. + template <bool kOutputIsZeroInitialized, typename IndexCType> + ARROW_FORCE_INLINE int64_t ExecuteWithNulls(const ArraySpan& src_validity, Review Comment: Yes. I checked code and benchmarks (a month or more ago). Here's the overall approach with this header. `GatherBaseCRTP` is a template that generates these templates: ```cpp template <typename Global, typename Params> class Gather { a simple; b scalar; c members; void WriteValue(pos) { ... } void WriteZero(pos) { ... } void WriteZeroSegment(pos, len) { ... } int64_t ExecuteNoNulls(len) { .. } template<bool OutputIsZeroInitialized> int64_t Execute(src_validity, idx_validity, out_is_valid) { ... } }; ``` The usage should be as follows. On the same function, the specific `Gather` instance should be instantiated and the method should be called on it: ```cpp const bool out_has_validity = values.MayHaveNulls() || indices.MayHaveNulls(); const uint8_t* src; int64_t src_offset; std::tie(src_offset, src) = util::OffsetPointerOfFixedBitWidthValues(values); uint8_t* out = util::MutableFixedWidthValuesPointer(out_arr); int64_t valid_count = 0; arrow::internal::Gather<kValueWidthInBits, IndexCType, WithFactor::value> gather{ /*src_length=*/values.length, src, src_offset, /*idx_length=*/indices.length, /*idx=*/indices.GetValues<IndexCType>(1), out, factor}; if (out_has_validity) { DCHECK_EQ(out_arr->offset, 0); // out_is_valid must be zero-initiliazed, because Gather::Execute // saves time by not having to ClearBit on every null element. auto out_is_valid = out_arr->GetMutableValues<uint8_t>(0); memset(out_is_valid, 0, bit_util::BytesForBits(out_arr->length)); valid_count = gather.template Execute<OutputIsZeroInitialized::value>( /*src_validity=*/values, /*idx_validity=*/indices, out_is_valid); } else { valid_count = gather.Execute(); } ``` What this achieves? `Gather<kValueWidthInBits, ...> gather{...}` isn't really the initialization of a portion of the memory in the stack, but of registers. Because the lifetime of `gather` is confined to a single function and all the method calls on it are inlined, the member of `gather` can live in registers from initialization to the end of the kernel loop. I did this to achieve the same performance we had when these loops were hand-written (more than once!) in `vector_selection_take_internal.cc`. This idea of keeping variables in registers is mentioned in database implementation reports ([Efficiently Compiling Efficient Query Plans for Modern Hardware](https://www.vldb.org/pvldb/vol4/p539-neumann.pdf)) and it's not just compiler-brain perfectionism (https://llvm.org/docs/Passes.html#sroa-scalar-replacement-of-aggregates). By force-inlining the `Execute` methods I'm actually reducing code size (together with the fact that I instantiate `Gather` and these specific `Execute` calls from a single place). Without inlining `Execute` there would be more binary code to deal with construction/destruction, loads to/from memory. To make sure we only ever instantiate these templates once, **I could wrap this code in templates outside the inline namespace.** The templates would contain the initialization of `Gather<...> gather` and the inlined calls to the two inlined `Execute` functions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
