Re: [Numpy-discussion] Indexing empty dimensions with empty arrays

Dag Sverre Seljebotn Wed, 28 Dec 2011 05:45:39 -0800

On 12/28/2011 02:21 PM, Ralf Gommers wrote:
>
>
> On Wed, Dec 28, 2011 at 1:57 PM, Dag Sverre Seljebotn
> <d.s.seljeb...@astro.uio.no> wrote:
>
>     On 12/28/2011 01:52 PM, Dag Sverre Seljebotn wrote:
>      > On 12/28/2011 09:33 AM, Ralf Gommers wrote:
>      >>
>      >>
>      >> 2011/12/27 Jordi Gutiérrez Hermoso <jord...@octave.org>
>      >>
>      >>      On 26 December 2011 14:56, Ralf Gommers
>      >>      <ralf.gomm...@googlemail.com> wrote:
>      >> >
>      >> >
>      >> >  On Mon, Dec 26, 2011 at 8:50 PM, <josef.p...@gmail.com> wrote:
>      >> >>  I have a hard time thinking through empty 2-dim arrays, and
>      >> >>  don't know what rules should apply.
>      >> >>  However, in my code I might want to catch these cases early,
>      >> >>  rather than late and then having to work my way backwards to
>      >> >>  find out where the content disappeared.
>      >> >
>      >> >
>      >> >  Same here. Almost always, my empty arrays are either due to bugs
>      >> >  or they signal that I do need to special-case something.
>      >> >  Silently passing empty arrays through all numpy functions is
>      >> >  not what I would want.
>      >>
>      >>      I find it quite annoying to treat the empty set with special
>      >>      deference. "All of my great-grandkids live in Antarctica"
>      >>      should be true for me (I'm only 30 years old). If you decide
>      >>      that is not true for me, it leads to a bunch of other logical
>      >>      annoyances up there.
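NumPy's boolean reductions already follow the empty-set convention Jordi appeals to: `np.all` over zero elements returns `True` (vacuous truth) and `np.any` returns `False`, just like Python's built-in `all` and `any`. A minimal illustration; the `lives_in_antarctica` array is made-up example data, one entry per great-grandchild:

```python
import numpy as np

# Hypothetical data: one boolean per great-grandchild, and there are none.
lives_in_antarctica = np.array([], dtype=bool)

# "All of them live in Antarctica": a claim about every element of an
# empty set is vacuously true, so the reduction returns its identity, True.
all_in_antarctica = np.all(lives_in_antarctica)

# "Some of them live in Antarctica": existence over an empty set is False.
any_in_antarctica = np.any(lives_in_antarctica)

print(all_in_antarctica, any_in_antarctica)
```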
>      >>
>      >>
>      >> I guess you don't mean true/false, because it's neither. But I
>      >> understand you want an empty array back instead of an error.
>      >>
>      >> Currently the problem is that when you do get that empty array
>      >> back, you'll then use it for something else and it will probably
>      >> still crash. Many numpy functions do not check for empty input and
>      >> will still raise exceptions. My impression is that you're better
>      >> off handling these where you create the empty array, rather than
>      >> in some random place later on. The alternative is to have
>      >> consistent rules for empty arrays, and handle them explicitly in
>      >> all functions. That can be done, but is of course a lot of work
>      >> and has some overhead.
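Handling an empty result where it is created, rather than wherever it later causes a crash, might look like the sketch below. The `select_rows` function and its `allow_empty` flag are hypothetical, invented for illustration; they are not part of NumPy. The flag lets the caller choose between failing fast and accepting a zero-row result.

```python
import numpy as np


def select_rows(data, mask, allow_empty=False):
    """Return data[mask], optionally failing fast if nothing is selected.

    Hypothetical helper: the check runs where the empty array is created,
    so an error points at its cause instead of at some later consumer.
    """
    selected = data[mask]
    if selected.shape[0] == 0 and not allow_empty:
        raise ValueError("mask selected no rows; pass allow_empty=True "
                         "if an empty result is expected")
    return selected


data = np.arange(12.0).reshape(4, 3)
print(select_rows(data, data[:, 0] > 5).shape)                        # (2, 3)
print(select_rows(data, data[:, 0] > 100, allow_empty=True).shape)    # (0, 3)
```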
>      >
>      > Are you saying that the existence of other bugs means that this bug
>      > shouldn't be fixed? I just fail to see the relevance of these other
>      > bugs to this discussion.
>
>
> See below.
>
>      > For the record, I've encountered this bug many times myself and it's
>      > rather irritating, since it leads to more verbose code.
>      >
>      > Indexing with empty arrays is useful whenever you want to return data
>      > that is a subset of the input data (since the selected subset can
>      > sometimes be zero-sized; remember, in computer science the only
>      > numbers are 0, 1, and "any number").
>      >
>      > Here's one of the examples I've had. The Interpolative Decomposition
>      > decomposes an m-by-n matrix A of rank k as
>      >
>      > A = B C
>      >
>      > where B is an m-by-k matrix consisting of a subset of the columns of
>      > A, and C is a k-by-n matrix.
>      >
>      > Now, if A is all zeros (which is often the case for me), then k is 0.
>      > I would still like to create the m-by-0 matrix B by doing
>      >
>      > B = A[:, selected_columns]
>      >
>      > But now I have to do this instead:
>      >
>      > if len(selected_columns) == 0:
>      >       B = np.zeros((A.shape[0], 0), dtype=A.dtype)
>      > else:
>      >       B = A[:, selected_columns]
>      >
>      > In this case, zero-sized B and C are of course perfectly valid and
>      > useful results:
>      >
>      > In [2]: np.dot(np.ones((3,0)), np.ones((0, 5)))
>      > Out[2]:
>      > array([[ 0.,  0.,  0.,  0.,  0.],
>      >          [ 0.,  0.,  0.,  0.,  0.],
>      >          [ 0.,  0.,  0.,  0.,  0.]])
>      >
>
>     And to answer the obvious question: yes, this is a real use case. It
>     is used for something similar to image compression, where sub-sections
>     of the images may well be all-zero and have zero rank (full story at [1]).
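The shapes involved in the rank-0 interpolative-decomposition case can be sketched in plain NumPy. This is not an ID algorithm: the column-selection step is a hypothetical stand-in that simply picks no columns because `A` is all zeros. The empty index must have an integer dtype such as `np.intp`; a bare `np.array([])` is float64 and is rejected as an index. With that, current NumPy releases return the m-by-0 matrix B without any special case.

```python
import numpy as np

m, n = 4, 6
A = np.zeros((m, n))

# Hypothetical output of a column-selection step: for an all-zero A the
# numerical rank is 0, so no columns are chosen. The dtype must be integer.
selected_columns = np.array([], dtype=np.intp)
k = selected_columns.size

B = A[:, selected_columns]   # m-by-k, here m-by-0; no if/else needed
C = np.zeros((k, n))         # k-by-n coefficient matrix, here 0-by-n

# An m-by-0 matrix times a 0-by-n matrix is the m-by-n zero matrix,
# so A = B C still holds exactly.
reconstruction = B @ C
print(B.shape, C.shape, reconstruction.shape)   # (4, 0) (0, 6) (4, 6)
print(np.allclose(A, reconstruction))           # True
```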
>
> Thanks for the example. I was a little surprised that dot works. Then I
> read what Wikipedia had to say about empty arrays. It mentions dot like
> you do, and says that the determinant of the 0-by-0 matrix is 1. So I tried:
>
> In [1]: a = np.zeros((0,0))
>
> In [2]: a
> Out[2]: array([], shape=(0, 0), dtype=float64)
>
> In [3]: np.linalg.det(a)
> Parameter 4 to routine DGETRF was incorrect
> <segfault>


:-)

Well, a segfault is most certainly a bug, so this must be fixed one way
or the other anyway, and returning 1 seems at least as good a solution
as raising an exception. Both solutions require an extra if-test.
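The "extra if-test" can be sketched as a wrapper around `np.linalg.det`. The name `det_allow_empty` is hypothetical, and the sketch only accepts a single square 2-D matrix; it does not assume anything about how a particular NumPy version's `det` treats 0-by-0 input. Returning 1.0 follows the empty-product convention that NumPy already uses for `np.prod` of an empty array; raising `ValueError` in that branch instead would cost the same single test.

```python
import numpy as np


def det_allow_empty(a):
    """Determinant of a square 2-D matrix, defining det of 0-by-0 as 1.0.

    Hypothetical wrapper: one shape test keeps the empty case away from
    the LU factorization that produced the error in the session above.
    """
    a = np.asarray(a, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("expected a square 2-D matrix")
    if a.shape[0] == 0:
        # Up to sign, det is the product of the n pivots of an LU
        # factorization; for n = 0 that product is empty, i.e. 1.
        return 1.0
    return float(np.linalg.det(a))


print(det_allow_empty(np.zeros((0, 0))))   # 1.0
print(np.prod(np.array([])))               # 1.0, the same convention
print(det_allow_empty([[2.0, 0.0], [0.0, 3.0]]))
```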

>
>     Reading the above thread I understand Ralf's reasoning better, but
>     really, relying on NumPy's buggy behaviour to discover bugs in user code
>     seems like the wrong approach. Tools should be dumb unless there are
>     good reasons to make them smart. I'd be rather irritated with my hammer
>     if it refused to drive in nails that it decided were in the wrong spot.
>
>
> The point is not that we shouldn't fix it, but that it's a waste of time
> to fix it in only one place. I remember fixing several functions to
> explicitly check for empty arrays and then return an empty array or
> raise a sensible error.
>
> So can you answer my question: do you think it's worth the time and
> computational overhead to handle empty arrays in all functions?

I'd hope the computational overhead is negligible?

I do believe that handling this correctly everywhere is the right thing 
to do and would improve overall code quality (as witnessed by the 
segfault found above).
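On the overhead question, a sketch of what "handling this correctly" can look like for reductions. Many need no special-casing at all: `sum` and `prod` have identity elements (0 and 1) that are the natural result over zero elements. Reductions without an identity, such as `np.max`, raise `ValueError` on an empty axis unless an `initial` value is supplied (the `initial` argument exists in NumPy 1.15 and later), and supplying it adds no per-element work. The `column_max` helper and its `empty_value` default are hypothetical choices for illustration.

```python
import numpy as np

empty_rows = np.zeros((0, 3))   # 0 rows, 3 columns

# Reductions with an identity element are well defined on an empty axis.
col_sums = empty_rows.sum(axis=0)     # zeros of shape (3,), identity of +
col_prods = empty_rows.prod(axis=0)   # ones of shape (3,), identity of *

# np.max has no identity, so without help an empty axis is an error.
try:
    np.max(empty_rows, axis=0)
    max_raised = False
except ValueError:
    max_raised = True


def column_max(a, empty_value=-np.inf):
    """Hypothetical helper: per-column maximum that tolerates zero rows.

    The initial value seeds the reduction, so empty columns yield
    empty_value while non-empty columns are unaffected (for -inf).
    """
    return np.max(a, axis=0, initial=empty_value)


print(col_sums, col_prods, max_raised)
print(column_max(empty_rows))
print(column_max(np.array([[1.0, 5.0], [4.0, 2.0]])))
```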

Of course, likely nobody is ready to actually perform all that work. So
the right thing to do seems to be to state that places where NumPy does
not handle zero-size arrays are bugs, but not do anything about them
until somebody actually submits a patch. That means ending this email
discussion by confirming on Trac that this is indeed a bug, and then
waiting to see whether anybody bothers to submit a patch.

Dag Sverre
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
