On Sat, May 3, 2025 at 3:42 PM Luc Grosheintz <luc.groshei...@gmail.com> wrote:
> This chain discusses changes to `mapping::operator()`. For concrete > discussion, see below. I have a general question: is there a reason > other than style to prefer folds over recursion? > > > On 4/30/25 7:13 AM, Tomasz Kaminski wrote: > > Hi, > > > > As we will be landing patches for extends, this will become a separate > > patch series. > > I would prefer, if you could commit per layout, and start with > layout_right > > (default) > > I try to provide prompt responses, so if that works better for you, you > can > > post a patch > > only with this layout first, as most of the comments will apply to all of > > them. > > > > For the general design we have constructors that allow conversion between > > rank-0 > > and rank-1 layouts left and right. This is done because they essentially > > represents > > the same layout. I think we could benefit from that in code by having a > > base classes > > for rank0 and rank1 mapping: > > template<typename _Extents> > > _Rank0_mapping_base > > { > > static_assert(_Extents::rank() == 0); > > > > template<OtherExtents> > > // explicit, requires goes here > > _Rank0_mapping_base(_Rank0_mapping_base<OtherExtents>); > > > > // All members layout_type goes her > > }; > > > > template<typename _Extents> > > _Rank1_mapping_base > > { > > static_assert(_Extents::rank() == 1); > > // Static assert for product is much simpler here, as we need to > check one > > > > template<OtherExtents> > > // explicit, requires goes here > > _Rank1_mapping_base(_Rank1_mapping_base<OtherExtents>); > > > > // Call operator can also be simplified > > index_type operator()(index_type i) const // conversion happens at > user > > side > > > > // cosntructor from strided_layout of Rank1 goes here. > > > > // All members layout_type goes her > > }; > > Then we will specialize layout_left/right/stride to use > _Rank0_mapping_base > > as a base for rank() == 0 > > and layout_left/right to use _Rank1_mapping as base for rank()1; > > template<typename T, unsigned... Ids> > > struct extents {}; > > > > struct layout > > { > > template<typename Extends> > > struct mapping > > { > > // static assert that Extents mmyst be specialization of _Extents goes > here. > > } > > }; > > > > template<typename _IndexType> > > struct layout::mapping<extents<_IndexType>> > > : _Rank0_mapping_base<extents<_IndexType>> > > { > > using layout_type = layout_left; > > // Provides converting constructor. > > using _Rank0_mapping_base<extents<_IndexType>>::_Rank0_mapping_base; > > // This one is implicit; > > mapping(_Rank0_mapping_base<extents<_IndexType>> const&); > > }; > > > > template<typename _IndexType, unsigned _Ext> > > struct layout::mapping<extents<_IndexType, _Ext>> > > : _Rank1_mapping_base<extents<_IndexType>> > > > > { > > using layout_type = layout_left; > > // Provides converting constructor. > > using _Rank0_mapping_base<extents<_IndexType>>::_Rank0_mapping_base; > > // This one is implicit, allows construction from layout_right > > mapping(_Rank1_mapping_base<extents<_IndexType>> const&); > > }; > > }; > > > > template<typename _IndexType, unsigned... _Ext> > > requires sizeof..(_Ext) > = 2 > > struct layout::mapping<extents<_IndexType, _Ext>> > > > > The last one is a generic implementation that you can use in yours. > > Please also include a comment explaining that we are deviating from > > standard text here. > > > > > > On Tue, Apr 29, 2025 at 2:56 PM Luc Grosheintz <luc.groshei...@gmail.com > > > > wrote: > > > >> Implements the parts of layout_left that don't depend on any of the > >> other layouts. > >> > >> libstdc++/ChangeLog: > >> > >> * include/std/mdspan (layout_left): New class. > >> > >> Signed-off-by: Luc Grosheintz <luc.groshei...@gmail.com> > >> --- > >> libstdc++-v3/include/std/mdspan | 179 ++++++++++++++++++++++++++++++++ > >> 1 file changed, 179 insertions(+) > >> > >> diff --git a/libstdc++-v3/include/std/mdspan > >> b/libstdc++-v3/include/std/mdspan > >> index 39ced1d6301..e05048a5b93 100644 > >> --- a/libstdc++-v3/include/std/mdspan > >> +++ b/libstdc++-v3/include/std/mdspan > >> @@ -286,6 +286,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >> > >> namespace __mdspan > >> { > >> + template<typename _Extents> > >> + constexpr typename _Extents::index_type > >> + __fwd_prod(const _Extents& __exts, size_t __r) noexcept > >> + { > >> + typename _Extents::index_type __fwd = 1; > >> + for(size_t __i = 0; __i < __r; ++__i) > >> + __fwd *= __exts.extent(__i); > >> + return __fwd; > >> + } > >> > > As we are inside the standard library implementation, we can do some > tricks > > here, > > and provide two functions: > > // Returns the std::span(_ExtentsStorage::_Ext).substr(f, l); > > // For extents forward to __static_exts > > span<typename Extends::index_type> __static_exts(size_t f, size_t l); > > // Returns the > > > std::span(_ExtentsStorage::_M_dynamic_extents).substr(_S_dynamic_index[f], > > _S_dynamic_index[l); > > span<typename Extends::index_type> __dynamic_exts(Extents const& c); > > Then you can befriend this function both to extents and _ExtentsStorage. > > Also add index_type members to _ExtentsStorage. > > > > Then instead of having fwd-prod and rev-prod I would have: > > template<typename _Extents> > > consteval size_t __static_ext_prod(size_t f, size_t l) > > { > > // multiply E != dynamic_ext from __static_exts > > } > > constexpr size __ext_prod(const _Extents& __exts, size_t f, size_t l) > > { > > // multiply __static_ext_prod<_Extents>(f, l) and each elements of > > __dynamic_exts(__exts, f, l); > > } > > > > Then fwd-prod(e, n) would be __ext_prod(e, 0, n), and rev_prod(e, n) > would > > be __ext_prod(e, __ext.rank() -n, n, __ext.rank()) > > > > > >> + > >> + template<typename _Extents> > >> + constexpr typename _Extents::index_type > >> + __rev_prod(const _Extents& __exts, size_t __r) noexcept > >> + { > >> + typename _Extents::index_type __rev = 1; > >> + for(size_t __i = __r + 1; __i < __exts.rank(); ++__i) > >> + __rev *= __exts.extent(__i); > >> + return __rev; > >> + } > >> + > >> template<typename _IndexType, size_t... _Counts> > >> auto __build_dextents_type(integer_sequence<size_t, _Counts...>) > >> -> extents<_IndexType, ((void) _Counts, dynamic_extent)...>; > >> @@ -304,6 +324,165 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >> explicit extents(_Integrals...) -> > >> extents<size_t, __mdspan::__dynamic_extent<_Integrals>()...>; > >> > >> + struct layout_left > >> + { > >> + template<typename _Extents> > >> + class mapping; > >> + }; > >> + > >> + namespace __mdspan > >> + { > >> + template<typename _Tp> > >> + constexpr bool __is_extents = false; > >> + > >> + template<typename _IndexType, size_t... _Extents> > >> + constexpr bool __is_extents<extents<_IndexType, _Extents...>> = > >> true; > >> + > >> + template<size_t _Count> > >> + struct _LinearIndexLeft > >> + { > >> + template<typename _Extents, typename... _Indices> > >> + static constexpr typename _Extents::index_type > >> + _S_value(const _Extents& __exts, typename _Extents::index_type > >> __idx, > >> + _Indices... __indices) noexcept > >> + { > >> + return __idx + __exts.extent(_Count) > >> + * _LinearIndexLeft<_Count + 1>::_S_value(__exts, > __indices...); > >> + } > >> + > >> + template<typename _Extents> > >> + static constexpr typename _Extents::index_type > >> + _S_value(const _Extents&) noexcept > >> + { return 0; } > >> + }; > >> + > >> + template<typename _Extents, typename... _Indices> > >> + constexpr typename _Extents::index_type > >> + __linear_index_left(const _Extents& __exts, _Indices... > __indices) > >> + { > >> + return _LinearIndexLeft<0>::_S_value(__exts, __indices...); > >> + } > >> > > This can be eliminated by fold expressions, see below. > > > >> + > >> + template<typename _IndexType, typename _Tp, size_t _Nm> > >> + consteval bool > >> + __is_representable_product(array<_Tp, _Nm> __factors) > >> + { > >> + size_t __rest = numeric_limits<_IndexType>::max(); > >> + for(size_t __i = 0; __i < _Nm; ++__i) > >> + { > >> + if (__factors[__i] == 0) > >> + return true; > >> + __rest /= _IndexType(__factors[__i]); > >> + } > >> + return __rest > 0; > >> + } > >> > > I would replace that with > > template<IndexType> > > consteval size_t __div_reminder(span<const size_t, _Nm> __factors, size_t > > __val) > > { > > size_t __rest = val; > > for(size_t __i = 0; __i < _Nm; ++__i) > > { > > if (__factors[__i] == dynamic_extent) > > continue; > > if (__factors[__i] != 0) > > return val; > > __rest /= _IndexType(__factors[__i]); > > if (__res == 0) > > return 0; > > } > > return __rest; > > } > > > > We can express the is presentable check as > > static constexpr __dyn_reminder = > __div_reminder(__static_exts<Extents>(0, > > rank()), std::numeric_limits<Index>::max()); > > static_assert(__dyn_reminder > 0); > > However, with __dyn_reminder value, the precondition > > https://eel.is/c++draft/mdspan.layout#left.cons-1, > > can be checked by doing equivalent of __div_remainder for __dyn_extents > > with __val being __dyn_reminder. > > > > > >> + > >> + template<typename _Extents> > >> + consteval array<typename _Extents::index_type, _Extents::rank()> > >> + __static_extents_array() > >> + { > >> + array<typename _Extents::index_type, _Extents::rank()> __exts; > >> + for(size_t __i = 0; __i < _Extents::rank(); ++__i) > >> + __exts[__i] = _Extents::static_extent(__i); > >> + return __exts; > >> + } > >> > > > > Replaced by __static_exts accessor, as described above. > > > > > >> + > >> + template<typename _Extents, typename _IndexType> > >> + concept __representable_size = _Extents::rank_dynamic() != 0 > >> + || __is_representable_product<_IndexType>( > >> + __static_extents_array<_Extents>()); > >> + > >> + template<typename _Extents> > >> + concept __layout_extent = __representable_size< > >> + _Extents, typename _Extents::index_type>; > >> + } > >> > > + > >> + template<typename _Extents> > >> + class layout_left::mapping > >> + { > >> + static_assert(__mdspan::__layout_extent<_Extents>, > >> + "The size of extents_type is not representable as index_type."); > >> + public: > >> + using extents_type = _Extents; > >> + using index_type = typename extents_type::index_type; > >> + using size_type = typename extents_type::size_type; > >> + using rank_type = typename extents_type::rank_type; > >> + using layout_type = layout_left; > >> + > >> + constexpr > >> + mapping() noexcept = default; > >> + > >> + constexpr > >> + mapping(const mapping&) noexcept = default; > >> + > >> + constexpr > >> + mapping(const extents_type& __extents) noexcept > >> + : _M_extents(__extents) > >> + { > >> > > > > > > > >> } > >> + > >> + template<typename _OExtents> > >> + requires (is_constructible_v<extents_type, _OExtents>) > >> + constexpr explicit(!is_convertible_v<_OExtents, extents_type>) > >> + mapping(const mapping<_OExtents>& __other) noexcept > >> + : _M_extents(__other.extents()) > >> + { > > > > Here we could do checks at compile time: > > if constexpr(_OExtents::rank_dynamic() == 0) > > static_assert( __div_remainder(...) > 0); > > } > > > > > >> } > >> + > >> + constexpr mapping& > >> + operator=(const mapping&) noexcept = default; > >> + > >> + constexpr const extents_type& > >> + extents() const noexcept { return _M_extents; } > >> + > >> + constexpr index_type > >> + required_span_size() const noexcept > >> + { return __mdspan::__fwd_prod(_M_extents, _M_extents.rank()); } > >> + > >> + template<__mdspan::__valid_index_type<index_type>... _Indices> > >> > > // Because we extracted rank0 and rank1 specializations > > > >> + requires (sizeof...(_Indices) + 1 == extents_type::rank()) > >> + constexpr index_type > >> + operator()(index_type __idx, _Indices... __indices) const > noexcept > >> + { > >> > > This could be implemented as, please synchronize the names. > > if constexpr (!is_same_v<_Indices, index_type> || ...) > > // Reduce the number of instantations. > > return operator()(index_type _idx0, > > static_cast<index_type>(std::move(__indices))....); > > else > > { > > // This can be used for layout stride, if you start with __res = > 0; > > index_type __res = _idx0; > > index_type __mult = _M_extents.extent(0); > > auto __update = [&__res, &__mult, __pos = 1u](index_type __idx) > > mutable > > { > > __res += __idx * __mult; > > __mult *= _M_extents.extent(__pos); > > ++__pos; > > }; > > // Fold over invocation of lambda > > (__update(_Indices), ....); > > return __res; > > } > > The reason I chose the recursive formulation is that I'd like it to > implement: `i + n*(j + m*k)` and because the following code > > size_t linear_index1(const std::extents<size_t, 3, 5, 7, 11>& exts, > size_t i, size_t j, size_t k, size_t l) > { > Layout::mapping<std::extents<size_t, 3, 5, 7, 11>> m(exts); > return m(i, j, k, l); > } > > compiles to (with -O2): > > 0: 4a 8d 04 c1 lea rax,[rcx+r8*8] > 4: 4c 29 c0 sub rax,r8 > 7: 48 8d 04 80 lea rax,[rax+rax*4] > b: 48 01 d0 add rax,rdx > e: 48 8d 04 40 lea rax,[rax+rax*2] > 12: 48 01 f0 add rax,rsi > 15: c3 ret > > Whereas the suggested implementation using a lambda and folding > implements `i + n*j + n*m*k` and compiles to: > > 0: 4d 6b c0 69 imul r8,r8,0x69 > 4: 48 89 c8 mov rax,rcx > 7: 48 c1 e0 04 shl rax,0x4 > b: 48 29 c8 sub rax,rcx > e: 49 01 c0 add r8,rax > 11: 48 8d 04 52 lea rax,[rdx+rdx*2] > 15: 49 01 f0 add r8,rsi > 18: 4c 01 c0 add rax,r8 > 1b: c3 ret > > It's not obvious to me which is preferred, the one doesn't use `imul` > while the other could maybe make more use of a super-scalar processor? > (For example the `imul` could overlap with the next three instructions.) > > Since this is (likely) a performance critical part of mdspan, it makes > sense to discuss which formula we'd like to implement and pick whatever > dialect of C++ that compiles into the preferred assembly. > > I checked both implementations on something similar to: > > double a = 0.0; > double b = 0.0; > double c = 0.0; > double d = 0.0; > > for(size_t i = 0; i < n / 4; i += 4) > { > a += values[m(i+0, j, k)]; > b += values[m(i+1, j, k)]; > c += values[m(i+2, j, k)]; > d += values[m(i+3, j, k)]; > } > double total = a + b + c + d; > > and with both implementations, as far as I can tell, vectorization works > as expected. In particular, the compiler is perfectly able to see that > the stride in the inner most loop is 1. > > I feel this is one of the places where, at a slightly later point, we'll > probably will want to measure. > > Would it make sense to postpone a detailed discussion of this until we've > settled the other parts of mdspan? > I think I would prefer a lambda based implementation: the side effect of the implementation based on the recursion is that it emits symbols for binary, that needs to be later handled at least by the linker. So. my preference would be to have a implementation the compile time, until we get measurements suggesting that other alternatives are better. We could implement the i + n*(j + m*k) formulation with lambda inside lambda, my only point is to avoid instantiating multiple templates to compute the stride. > > > > > This could be even simpler and written as (use for layout stride): > > size_t __pos = 0; > > return (index_type(0) + ... + __indices * stride(__pos++)); > > Here, I prefer to avoid multiplying multiple times. > > (This is already essentially how layout_stride implement it.) > > > > > > > + return __mdspan::__linear_index_left( > >> + _M_extents, static_cast<index_type>(__indices)...); > >> + } > >> + > >> + static constexpr bool > >> + is_always_unique() noexcept { return true; } > >> + > >> + static constexpr bool > >> + is_always_exhaustive() noexcept { return true; } > >> + > >> + static constexpr bool > >> + is_always_strided() noexcept { return true; } > >> + > >> + static constexpr bool > >> + is_unique() noexcept { return true; } > >> + > >> + static constexpr bool > >> + is_exhaustive() noexcept { return true; } > >> + > >> + static constexpr bool > >> + is_strided() noexcept { return true; } > >> + > >> + constexpr index_type > >> + stride(rank_type __i) const noexcept > >> + requires (extents_type::rank() > 0) > >> + { > >> + __glibcxx_assert(__i < extents_type::rank()); > >> + return __mdspan::__fwd_prod(_M_extents, __i); > >> + } > >> + > >> + template<typename _OExtents> > >> + requires (extents_type::rank() == _OExtents::rank()) > >> + friend constexpr bool > >> + operator==(const mapping& __self, const mapping<_OExtents>& > >> __other) > >> + noexcept > >> + { return __self.extents() == __other.extents(); } > >> + > >> + private: > >> + [[no_unique_address]] extents_type _M_extents; > >> + }; > >> + > >> _GLIBCXX_END_NAMESPACE_VERSION > >> } > >> #endif > >> -- > >> 2.49.0 > >> > >> > > > >