If you're going to provide routines for structure determination, it
might be worth looking at algorithms that can identify more general or
less obvious structure as well. SymPy's matrices module needs a lot of
work but is improving substantially, which will become noticeable over
the next few releases. One of the important optimisations being used is
Tarjan's algorithm for finding the strongly connected components of a
graph. This is a generalisation of checking for triangular or diagonal
matrices: with this approach you can identify any permutation of the
rows and columns of a square matrix that brings it into block
triangular or block diagonal form, which can substantially reduce the
cost of many O(n**3) algorithms. The big-O cost of Tarjan's algorithm
itself is essentially the same as that of checking whether a matrix is
triangular/diagonal.

For example, the matrix determinant is invariant under simultaneous
permutations of the rows and columns. If you can permute a matrix into
block triangular form then the determinant is just the product of the
determinants of the diagonal blocks. If the base case algorithm takes
n**3 operations, then reducing it to two operations of size n/2 costs
2*(n/2)**3 = n**3/4, a speed-up of ~4x. In the extreme this discovers
that a matrix is triangular and reduces the whole operation to O(n)
(plus the cost of Tarjan's algorithm). The graph-based approach also
benefits wider classes, e.g. you get almost all of the same benefit
for a matrix that is almost diagonal but has a few off-diagonal
elements.

Using sympy master branch (as .strongly_connected_components() is not
released yet):

In [19]: from sympy import Matrix

In [20]: M = Matrix([[1, 0, 2, 0], [9, 3, 1, 2], [3, 0, 4, 0], [5, 8, 6, 7]])

In [21]: M
Out[21]:
⎡1  0  2  0⎤
⎢9  3  1  2⎥
⎢3  0  4  0⎥
⎣5  8  6  7⎦

In [22]: M.strongly_connected_components()  # Tarjan's algorithm
Out[22]: [[0, 2], [1, 3]]

In [23]: M[[0, 2, 1, 3], [0, 2, 1, 3]] # outer indexing for permutation
Out[23]:
⎡1  2  0  0⎤
⎢3  4  0  0⎥
⎢9  1  3  2⎥
⎣5  6  8  7⎦

In [24]: M.det()
Out[24]: -10

In [25]: M[[0,2],[0,2]].det() * M[[1, 3], [1, 3]].det()
Out[25]: -10
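
The same structure discovery works on plain NumPy arrays via
scipy.sparse.csgraph, which is probably the more relevant form for
scipy.linalg. A rough sketch (the component labels come back unordered
here, so topologically ordering the blocks to expose the triangular
form would be an extra step):

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

M = np.array([[1, 0, 2, 0],
              [9, 3, 1, 2],
              [3, 0, 4, 0],
              [5, 8, 6, 7]])

# Strongly connected components of the nonzero pattern
nblocks, labels = connected_components(csr_matrix(M != 0),
                                       directed=True, connection='strong')
# Row/column indices grouped by component give the candidate blocks
blocks = [np.flatnonzero(labels == i) for i in range(nblocks)]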

--
Oscar

On Fri, 2 Jul 2021 at 12:20, Ilhan Polat <ilhanpo...@gmail.com> wrote:
>
> Ah right. So two things: the original reason for this question is that I 
> can't decide in https://github.com/scipy/scipy/pull/12824 whether others 
> would also benefit from quick structure determination.
>
> I can keep it as a private function, or we can put it in some misc or lib 
> folder so all can use it. Say there is a special method for triangular 
> matrices but you can't guarantee the structure, so you quickly check for it 
> first. At worst O(n**2) complexity for diagonal arrays, and roughly 2n 
> checks for full arrays, makes it quite appealing.
>
> But then again, maybe NumPy is a better place, since it will probably be 
> faster in pure C with the right headers and without the extra Cython 
> overhead.
>
> Funny you mention the container idea. This is precisely what I'm doing in 
> the PR mentioned above (I'll push when I'm done). I stole the idea from Tim 
> Davis himself in a Julia discussion: keep the factorization as an attribute 
> to be used later if need be. So yes, it makes a lot of sense, sparse or not.
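>
> A minimal sketch of that idea (hypothetical names, not an existing SciPy
> API): wrap the array with a structure tag and cache the factorization the
> first time a solve is requested:
>
> import numpy as np
> from scipy import linalg
>
> class TaggedArray:
>     def __init__(self, a, structure=None):
>         self.a = np.asarray(a)
>         self.structure = structure  # e.g. "triangular", "diagonal", None
>         self._lu = None             # factorization cached on first use
>
>     def solve(self, b):
>         if self.structure == "triangular":  # say, upper triangular
>             return linalg.solve_triangular(self.a, b)
>         if self._lu is None:
>             self._lu = linalg.lu_factor(self.a)  # factor once
>         return linalg.lu_solve(self._lu, b)      # reuse thereafter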
>
> On Wed, 30 Jun 2021, 19:14 Evgeni Burovski, <evgeny.burovs...@gmail.com> 
> wrote:
>>
>> Hi Ilhan,
>>
>> Overall I think something like this would be great. However, I wonder
>> if you considered having a specialized container with a structure tag
>> instead of trying to discover the structure. If it's a container, it
>> can neatly wrap various lapack storage schemes and dispatch to an
>> appropriate lapack functionality. Possibly even sparse storage
>> schemes. And it seems a bit more robust than trying to discover the
>> structure (e.g. what about off-band elements of ~1e-16, etc.).
>>
>> The next question is of course whether this should live in scipy/numpy
>> .linalg or as a separate repo, at least for some time (maybe in the
>> scipy organization?), so that it can iterate faster, among other things.
>> (I'd be interested in contributing FWIW.)
>> (I'd be interested in contributing FWIW)
>>
>> Cheers,
>>
>> Evgeni
>>
>>
>> On Wed, Jun 30, 2021 at 1:22 AM Ilhan Polat <ilhanpo...@gmail.com> wrote:
>> >
>> > Dear all,
>> >
>> > I'm writing some helper Cython functions for scipy.linalg which are 
>> > reasonably performant and usable. And there is still quite some wiggle 
>> > room for more.
>> >
>> > In many linalg routines there is a lot of performance benefit if the 
>> > structure can be discovered in a cheap and reliable way at the outset. For 
>> > example, if symmetric then eig can delegate to eigh, or if triangular then 
>> > triangular solvers can be used in linalg.solve and lstsq, and so forth; a 
>> > sketch of such dispatch follows.
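>> >
>> > A rough sketch of that kind of dispatch (smart_solve is a hypothetical
>> > name; band_check_internal is the probe defined further down):
>> >
>> > from scipy import linalg
>> >
>> > def smart_solve(a, b):
>> >     lower, upper = band_check_internal(a)  # cheap structure probe
>> >     if upper == 0:   # lower triangular
>> >         return linalg.solve_triangular(a, b, lower=True)
>> >     if lower == 0:   # upper triangular
>> >         return linalg.solve_triangular(a, b, lower=False)
>> >     return linalg.solve(a, b)  # general fallback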
>> >
>> > Here is the Cythonized version (as Jupyter notebook cells to paste) that 
>> > discovers the lower/upper bandwidth of a square array A. It competes well 
>> > with A != 0, just to compare against some low-level function (note the 
>> > latter returns an array, hence more cost is involved). There is a 
>> > higher-level supervisor function that checks C-contiguousness and 
>> > otherwise dispatches to different specialized versions of it.
>> >
>> > Initial cell
>> >
>> > %load_ext Cython
>> > %load_ext line_profiler
>> > import cython
>> > import line_profiler
>> >
>> > Then another cell
>> >
>> > %%cython
>> > # cython: language_level=3
>> > # cython: linetrace=True
>> > # cython: binding = True
>> > # distutils: define_macros=CYTHON_TRACE=1
>> > # distutils: define_macros=CYTHON_TRACE_NOGIL=1
>> >
>> > cimport cython
>> > cimport numpy as cnp
>> > import numpy as np
>> > import line_profiler
>> > ctypedef fused np_numeric_t:
>> >     cnp.int8_t
>> >     cnp.int16_t
>> >     cnp.int32_t
>> >     cnp.int64_t
>> >     cnp.uint8_t
>> >     cnp.uint16_t
>> >     cnp.uint32_t
>> >     cnp.uint64_t
>> >     cnp.float32_t
>> >     cnp.float64_t
>> >     cnp.complex64_t
>> >     cnp.complex128_t
>> >     cnp.int_t
>> >     cnp.long_t
>> >     cnp.longlong_t
>> >     cnp.uint_t
>> >     cnp.ulong_t
>> >     cnp.ulonglong_t
>> >     cnp.intp_t
>> >     cnp.uintp_t
>> >     cnp.float_t
>> >     cnp.double_t
>> >     cnp.longdouble_t
>> >
>> >
>> > @cython.linetrace(True)
>> > @cython.initializedcheck(False)
>> > @cython.boundscheck(False)
>> > @cython.wraparound(False)
>> > cpdef inline (int, int) band_check_internal(np_numeric_t[:, ::1] A):
>> >     cdef Py_ssize_t n = A.shape[0], lower_band = 0, upper_band = 0, r, c
>> >     cdef np_numeric_t zero = 0
>> >
>> >     for r in range(n):
>> >         # Only bother with entries outside the band found so far:
>> >         for c in range(r - lower_band):
>> >             if A[r, c] != zero:
>> >                 lower_band = r - c
>> >                 break
>> >
>> >         # Same on the upper side, scanning in from the last column:
>> >         for c in range(n - 1, r + upper_band, -1):
>> >             if A[r, c] != zero:
>> >                 upper_band = c - r
>> >                 break
>> >
>> >     return lower_band, upper_band
>> >
>> > Final cell for use-case ---------------
>> >
>> > # Make a banded test array: unit main diagonal, first sub- and
>> > # superdiagonals, and 2s on the k'th subdiagonal
>> > n = 50  # array size
>> > k = 3   # k'th subdiagonal
>> > R = np.zeros([n, n], dtype=np.float32)
>> > R[np.arange(n), np.arange(n)] = 1
>> > R[np.arange(n - 1), np.arange(1, n)] = 1
>> > R[np.arange(1, n), np.arange(n - 1)] = 1
>> > R[np.arange(k, n), np.arange(n - k)] = 2
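>> >
>> > From the construction above the expected bandwidths are known, which
>> > gives a quick sanity check:
>> >
>> > band_check_internal(R)  # expect (3, 1): k'th subdiagonal, 1st superdiagonal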
>> >
>> > Some very haphazardly put-together metrics:
>> >
>> > %timeit band_check_internal(R)
>> > 2.59 µs ± 84.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>> >
>> > %timeit np.linalg.solve(R, zzz)
>> > 824 µs ± 6.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>> >
>> > %timeit R != 0.
>> > 1.65 µs ± 43.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>> >
>> > So the worst-case cost is negligible in general (note that the given code 
>> > is slower because it uses the fused type; a standalone version generated 
>> > with tempita is faster).
>> >
>> > Two questions:
>> >
>> > 1) This is missing np.half/float16 functionality, since any arithmetic 
>> > with float16 might not be reliable, including the nonzero check. Is it 
>> > safe to view it as np.uint16 and use that specialization? I'm not sure 
>> > about the sign bit, hence the question. I can leave this out, since almost 
>> > all of the linalg suite rejects this datatype due to the well-known lack 
>> > of support.
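>> >
>> > Concretely, the concern about the sign bit (-0.0 compares equal to zero
>> > as a float, but its uint16 view is nonzero, so a plain bitwise nonzero
>> > check would misreport it):
>> >
>> > import numpy as np
>> > x = np.array([0.0, -0.0, 1.5], dtype=np.float16)
>> > print(x != 0)             # [False False  True]  (-0.0 == 0.0)
>> > print(x.view(np.uint16))  # [    0 32768 15872]  (-0.0 sets the sign bit)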
>> >
>> > 2) Should this be in NumPy or SciPy linalg? It is quite relevant to 
>> > SciPy, but then again this stuff is purely about array structure. If the 
>> > opinion is for NumPy, then I would need a volunteer, because the NumPy 
>> > codebase flies way above my head.
>> >
>> >
>> > All feedback welcome
>> >
>> > Best
>> > ilhan
>> >