This is a good place for numba if performance and memory use are a concern.

Good NumPy
125 ms ± 2.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Naive numba (memory efficient)
386 ms ± 1.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Parallel numba (memory efficient, CPU bound)
46.8 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Here is a gist that produced the timings:
https://gist.github.com/bashtage/39c50b223580b9bc3f763a3b1be3e478

All were on a 12 core Ryzen.
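The gist itself isn't reproduced here, but a plain-Python sketch of the kind of symmetry-exploiting kernel that numba can compile may illustrate the idea. The function name and loop structure below are my own, not necessarily what the gist does; the parallel variant would decorate this with numba's @njit(parallel=True) and replace the outer range with prange:

```python
import numpy as np

def sandwich_product(X, w):
    """Accumulate X.T @ diag(w) @ X row by row, touching only the
    upper triangle, so no n x n or n x p temporary is allocated.
    Illustrative sketch only, not the timed implementation."""
    n, p = X.shape
    out = np.zeros((p, p))
    for i in range(n):
        wx = w[i] * X[i]
        for j in range(p):
            # Only columns k >= j; the symmetric half is mirrored below.
            out[j, j:] += wx[j] * X[i, j:]
    # Mirror the strict upper triangle onto the lower triangle.
    return out + np.triu(out, 1).T

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w = rng.random(200)
assert np.allclose(sandwich_product(X, w), (X.T * w[None, :]) @ X)
```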

Kevin

On Mon, Aug 18, 2025 at 12:35 PM Christian Lorentzen via NumPy-Discussion <
numpy-discussion@python.org> wrote:

> If W is just a vector, i.e. my main use case, then what you write is what
> I wanted to imply and actually how I implemented it in several cases.
> The point is that X.T * weights[None, :] still doubles memory (same size
> as X) and this solution doesn't exploit the symmetry.
> Best
> Christian
>
> On 17.08.2025 22:13, Kevin Jacobs wrote:
>
> Why not:
>
> XTWX = (X.T * weights[None, :]) @ X
>
> if `weights` is a vector?  Allocating `diag(weights)` is the big memory
> and CPU hog at least for my generally long and skinny X matrices.
>
> -Kevin
>
>
> On Fri, Aug 15, 2025 at 10:08 AM Christian Lorentzen via NumPy-Discussion <
> numpy-discussion@python.org> wrote:
>
>> Dear numpy community
>>
>> I would like to propose a new feature in np.linalg to compute a weighted
>> Gram matrix (sandwich product); see
>> https://github.com/numpy/numpy/issues/29559.
>> Proposed new feature or change:
>>
>> The solvers for many important statistical models like Generalized Linear
>> Models often need to compute X.T @ W @ X where X is a 2-dimensional
>> array of features (features in columns, observations in rows) and W is
>> very often a diagonal array of (current) weights. This computation is
>> usually the main computational bottleneck.
>>
>> It would be great if numpy could provide an efficient implementation of
>> it, e.g. np.linalg.sandwich_product(X, weight=w).
>>
>> Why numpy? It seems even less realistic to me to get it into BLAS
>> implementations. Numpy already has SIMD support via Google Highway.
>> Computational alternatives
>>
>> Drawbacks of (X.T * w[None, :]) @ X, i.e. X.T @ diag(w) @ X:
>>
>>    - Additional memory allocation for the intermediate X.T * w[None, :],
>>    the same size as the (usually large) X.
>>    - The result is symmetric, but this fact is not exploited, so a
>>    speedup of at least a factor of 2 is possible.
>>    Note that without weights, X.T @ X uses BLAS syrk. But the weights
>>    are crucial.
>>
>> Drawbacks of Z = np.sqrt(w)[:, None] * X and then Z.T @ Z:
>>
>>    - Additional memory allocation for Z, the same size as the (usually
>>    large) X.
>>    - Taking square roots of the possibly large weight vector.
>>
>> Additional information
>>
>> https://github.com/Quantco/tabmat has an implementation of it with XSIMD.
>>
>>
>> Best
>> Christian Lorentzen
>> _______________________________________________
>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>> To unsubscribe send an email to numpy-discussion-le...@python.org
>> https://mail.python.org/mailman3//lists/numpy-discussion.python.org
>> Member address: jac...@bioinformed.com
>>
>
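For reference, the three formulations discussed in the quoted thread agree numerically. A small self-contained check (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 1000, 10                  # long and skinny X, as in the thread
X = rng.standard_normal((n, p))
w = rng.random(n)                # nonnegative weights

# Naive: materializes the n x n diagonal matrix -- the memory/CPU hog.
naive = X.T @ np.diag(w) @ X

# Broadcast trick: only an n x p temporary for X.T * w.
broadcast = (X.T * w[None, :]) @ X

# Square-root trick: Z.T @ Z is a symmetric rank-k update, but it
# allocates Z (same size as X) and takes n square roots.
Z = np.sqrt(w)[:, None] * X
sqrt_form = Z.T @ Z

assert np.allclose(naive, broadcast) and np.allclose(naive, sqrt_form)
```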
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3//lists/numpy-discussion.python.org