On 8/5/25 10:16, Tomasz Kaminski wrote:
Hi,

I have posted v3 patches with changes I have made locally for first 6
patches, and I think this series
is ready to land, in addition to
https://gcc.gnu.org/pipermail/libstdc++/2025-July/062727.html, that I
already reviewed.
I will keep aligned accessor on top of it.

As the separate commit, we should update the __fwd_partial_prod and
__rev_partial_prod computation
to use fact that this is partial prod, and we have previous partial
product computed:
     template<array _Extents>
       constexpr auto __fwd_partial_prods = [] consteval
         {
           constexpr size_t __rank = _Extents.size();
           std::array<size_t, __rank + 1> __ret;
           __ret[0] = 1;
           for(size_t __r = 1; __r < __rank + 1; ++__r)
             if (_Extents[__r] != dynamic_extents)
               __ret[__r] = __ret[0] * _Extents[__r];
           return __ret;
         }();
We are doing this at compile time, but this should help composition speed.
I would then inline __static_prod loop into the __mdspan::__size function
and remove that function
entirely.

I checked both approaches, and it didn't affect the compile time in
the slightest, so I decided it wasn't worth it; and went with the
less error prone solution. OTOH, you're right, knowingly leaving in
something that's accidentally quadratic is begging for some obscure
problem to arise. Moreover, measuring these can be deceptive, because
they have a tendency to be fast enough, right up to the point where it
becomes prohibitively expensive.

If you want I can fix this up in a separate patch.



I would be interested if we could reduce the __fwd/rev_partial_prod sizes,
but not keep the outermost dimensions there,
i.e. adding a check for __r == 0 early in the __fwd_prod funciton.
         constexpr size_t __rank = _Extents::rank();
         constexpr auto& __sta_exts = __static_extents<_Extents>();
         if constexpr (__rank == 1)
           return 1;
        // new if here, that is extracted from __rank == 2
         else if (__r == 0)
           return 1;
         else if constexpr (__rank == 2)
           return _exts.extent(0);
         else if constexpr
(__all_dynamic(std::span(__sta_exts).first(__rank-1)))
           return __extents_prod(__exts, 1, 0, __r);
         else
           {
             size_t __sta_prod = __fwd_partial_prods<__sta_exts>[__r-1]; //
size reduce by one here
             return __extents_prod(__exts, __sta_prod, 0, __r);
           }

To me it's not obvious how this one will go. Intuitively, this adds a
comparison and branching to avoid loading 8 bytes. Compared to the other
size reductions, this has a lower upper limit as to how effective it's
going to be.

I don't think I'll work on this, since I don't have the required test
cases to properly measure this to justify the change; and I don't trust
myself to make the right call without very exhaustive measurements. I
also don't have access to a suitably large variety of hardware.


Similarly for __rev_prod:
         constexpr size_t __rank = _Extents::rank();
         constexpr auto& __sta_exts = __static_extents<_Extents>();
         if constexpr (__rank == 1)
           return 1;
        else if (__r == __rank - 1)
           return 1;
         else if constexpr (__rank == 2)
           return __exts.extent(1);
         else if constexpr
(__all_dynamic(std::span(__sta_exts).last(__rank-1)))
           return __extents_prod(__exts, 1, __r + 1, __rank);
         else
           {
             size_t __sta_prod = __rev_partial_prods<__sta_exts>[__r-1]; //
size reduced by one here.
             return __extents_prod(__exts, __sta_prod, __r + 1, __rank);
           }

Regards,
Tomasz


On Mon, Aug 4, 2025 at 7:51 PM Luc Grosheintz <luc.groshei...@gmail.com>
wrote:



On 8/4/25 17:42, Tomasz Kaminski wrote:
On Mon, Aug 4, 2025 at 1:14 PM Tomasz Kaminski <tkami...@redhat.com>
wrote:



On Mon, Aug 4, 2025 at 1:08 PM Luc Grosheintz <luc.groshei...@gmail.com

wrote:

Hi Tomasz,

Thank you for the review! Sorry about the missing parens, even after
"Ctrl+F"ing each of the emails I can't find the requires clause (the
two I found both need the parens).

I was mentioning this one, but it is possible that I am wrong on this
one,
I haven't yet checked it, as I was planning to do it during full review.
      template<array _Extents>
        requires (__all_dynamic<_Extents>())
        class _StaticExtents<_Extents>;
But regardless, I will make that change locally if it does work, so you
do
not
need to update patch series for it.

Parenthesis seem to be indeed required here, so my comment was a red
herring.

I'm pretty sure that this is the reason why I wrongly concluded
that the pattern is `requires(cond)`, i.e. the parens are always
needed.



Do you think it's possible to merge this series before the other two
outstanding patch series? There's some risk of collision; making
changes
to this patch series is much more time consuming than rebasing (and
retesting the other two patch series).

This sounds very reasonable. I will prepare a stack of patches in your
suggested
order locally, and push them once they are approved. But it still make


Thank you,

Thank you for your continued contributions.

Luc

On 8/4/25 12:43, Tomasz Kaminski wrote:
Hi,

I this time I made a quick pass through all changes, before commenting
on
first commits,
and they look solid to me, and I haven't noticed anything I would like
to
change (except parentheses
around requires, but I will handle that locally). I will try to do a
full
review during this week.

Regards,
Tomasz

On Sun, Aug 3, 2025 at 10:59 PM Luc Grosheintz <
luc.groshei...@gmail.com>
wrote:

The combined effect of this sequence of change is:

     * a reduction in the number of template instantiations, by
       - avoiding needless dependency of IndexType,
       - special formulas for low-rank extents,
       - special formulas for (nearly) fully dynamic extents.

     * improved code quality, by
       - precomputing partial products of the static extents,
       - special cases for low-rank extents,
       - rewriting the condition E[i] == dynamic_extent in a more
         optimizer friendly manner.
       - effectively loop-unrolling extents::operator==.

While simplistic micro-benchmarking shows the effectiveness of these
changes, likely the stronger argument is presented in each commit:
     a) each change removes needless complexity,
     b) before/after examples of generated code show the
effectiveness.

Luc Grosheintz (8):
     libstdc++: Reduce template instantiations in <mdspan>.
     libstdc++: Precompute products of static extents.
     libstdc++: Improve low-rank layout_{left,right}::stride.
     libstdc++: Improve fully dynamic extents in mdspan.
     libstdc++: Improve nearly fully dynamic extents in mdspan.
     libstdc++: Reduce indirection in extents::extent.
     libstdc++: Improve extents::operator==.
     libstdc++: Replace numeric_limit with __int_traits in mdspan.

    libstdc++-v3/include/std/mdspan               | 282
+++++++++++++-----
    .../mdspan/extents/class_mandates_neg.cc      |   3 +
    2 files changed, 208 insertions(+), 77 deletions(-)

--
2.50.0










Reply via email to