For comparison, the NumPy vander function
https://github.com/numpy/numpy/blob/f4be1039d6fe3e4fdc157a22e8c071ac10651997/numpy/lib/twodim_base.py#L490-L577
does all its work in multiply.accumulate. Here is the outer loop of
multiply.accumulate (written in C):
https://github.com/numpy/numpy/blob/3b22d87050ab63db0dcd2d763644d924a69c5254/numpy/core/src/umath/ufunc_object.c#L2936-L3264
and the inner loops (I think) are generated from this source file for
various numeric types:
https://github.com/numpy/numpy/blob/3b22d87050ab63db0dcd2d763644d924a69c5254/numpy/core/src/umath/loops.c.src
A quick glance at these will tell you the price in code complexity that
NumPy pays for the performance it manages to get.
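
To make the first point concrete, here is a rough Python sketch of how
a Vandermonde matrix can be built with a single multiply.accumulate
call. It follows the general shape of the linked vander source as far
as I can tell, but it is an illustration, not a copy: the name
vander_sketch and its parameter names are made up here, and details
such as error handling may differ from the real function.

```python
import numpy as np

def vander_sketch(x, n=None, increasing=False):
    # Hypothetical helper for illustration; not part of NumPy.
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError("x must be one-dimensional")
    if n is None:
        n = len(x)

    # Output buffer, using at least the default integer dtype.
    v = np.empty((len(x), n), dtype=np.promote_types(x.dtype, int))
    # Work on a column-reversed view when powers should decrease.
    tmp = v if increasing else v[:, ::-1]

    if n > 0:
        tmp[:, 0] = 1                # x**0
    if n > 1:
        tmp[:, 1:] = x[:, None]      # every remaining column starts as x
        # A running product along each row turns x, x, x, ... into
        # x, x**2, x**3, ...; this one call does all the arithmetic.
        np.multiply.accumulate(tmp[:, 1:], out=tmp[:, 1:], axis=1)
    return v
```

The Python side stays this small only because the looping, dtype
dispatch and stride handling all live in the C code linked above.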