On Tue, Oct 1, 2013 at 4:13 PM, Nathaniel Smith <[email protected]> wrote:
> On 1 Oct 2013 17:34, "Charles R Harris" <[email protected]> wrote:
>>
>> On Tue, Oct 1, 2013 at 10:19 AM, <[email protected]> wrote:
>>>
>>> On Tue, Oct 1, 2013 at 10:47 AM, Nathaniel Smith <[email protected]> wrote:
>>> > On Tue, Oct 1, 2013 at 3:20 PM, Charles R Harris <[email protected]> wrote:
>>> >>
>>> >> On Tue, Oct 1, 2013 at 8:12 AM, Nathaniel Smith <[email protected]> wrote:
>>> >>>
>>> >>> [switching subject to break out from the giant 1.8.0rc1 thread]
>>> >>>
>>> >>> On Tue, Oct 1, 2013 at 2:52 PM, Charles R Harris <[email protected]> wrote:
>>> >>> >
>>> >>> > On Tue, Oct 1, 2013 at 7:25 AM, Nathaniel Smith <[email protected]> wrote:
>>> >>> >>
>>> >>> >> On Tue, Oct 1, 2013 at 1:56 PM, Charles R Harris <[email protected]> wrote:
>>> >>> >> > On Tue, Oct 1, 2013 at 4:43 AM, Nathaniel Smith <[email protected]> wrote:
>>> >>> >> >>
>>> >>> >> >> On Mon, Sep 30, 2013 at 10:51 PM, Christoph Gohlke <[email protected]> wrote:
>>> >>> >> >> > 2) Bottleneck 0.7.0
>>> >>> >> >> >
>>> >>> >> >> > https://github.com/kwgoodman/bottleneck/issues/71#issuecomment-25331701
>>> >>> >> >>
>>> >>> >> >> I can't tell if these are real bugs in numpy, or tests checking
>>> >>> >> >> that bottleneck is bug-for-bug compatible with old numpy and we
>>> >>> >> >> just fixed some bugs, or what. It's clearly something to do with
>>> >>> >> >> the nanarg{max,min} rewrite -- @charris, do you know what's
>>> >>> >> >> going on here?
>>> >>> >> >>
>>> >>> >> >
>>> >>> >> > Yes ;) The previous behaviour of nanarg for all-nan axis was to
>>> >>> >> > cast nan to intp when the result was an array, and return nan
>>> >>> >> > when a scalar.
>>> >>> >> > The current behaviour is to return the most negative value of
>>> >>> >> > intp as an error marker in both cases and raise a warning. It is
>>> >>> >> > a change in behavior, but I think one that needs to be made.
>>> >>> >>
>>> >>> >> Ah, okay! I kind of lost track of the nanfunc changes by the end
>>> >>> >> there.
>>> >>> >>
>>> >>> >> So for the bottleneck issue, it sounds like the problem is just
>>> >>> >> that bottleneck is still emulating the old numpy behaviour in this
>>> >>> >> corner case, which isn't really a problem. So we don't really need
>>> >>> >> to worry about that, both behaviours are correct, just maybe out
>>> >>> >> of sync.
>>> >>> >>
>>> >>> >> I'm a little dubious about this "make up some weird value that
>>> >>> >> will *probably* blow up if people try to use it without checking,
>>> >>> >> and also raise a warning" thing, wouldn't it make more sense to
>>> >>> >> just raise an error? That's what exceptions are for? I guess I
>>> >>> >> should have said something earlier though...
>>> >>> >>
>>> >>> >
>>> >>> > I figure the blowup is safe, as we can't allocate arrays big enough
>>> >>> > that the minimum intp value would be a valid index. I considered
>>> >>> > raising an error, and if there is a consensus the behavior could be
>>> >>> > changed. Or we could add a keyword to determine the behavior.
>>> >>>
>>> >>> Yeah, the intp value can't be a valid index, so that covers 95% of
>>> >>> cases, but I'm worried about that other 5%. It could still pass
>>> >>> silently as the endpoint of a slice, or participate in some sort of
>>> >>> integer arithmetic calculation, etc. I assume you also share this
>>> >>> worry to some extent or you wouldn't have put in the warning ;-).
>>> >>>
>>> >>> I guess the bigger question is, why would we *not* use the standard
>>> >>> method for signaling an exceptional condition here, i.e., exceptions?
>>> >>> That way we're 100% guaranteed that if people aren't prepared to
>>> >>> handle it then they'll at least know something has gone wrong, and if
>>> >>> they are prepared to handle it then it's very easy and standard, just
>>> >>> use try/except. Right now I guess you have to check for the special
>>> >>> value, but also do something to silence warnings, but just for that
>>> >>> one line? Sounds kind of complicated...
>>> >>
>>> >> The main reason was for the case of multiple axis, where some of the
>>> >> results would be valid and others not. The simple thing might be to
>>> >> raise an exception but keep the current return values so that users
>>> >> could determine where the problem occurred.
>>> >
>>> > Oh, duh, yes, right, now I remember this discussion. Sorry for being
>>> > slow.
>>> >
>>> > In the past we've *always* raised an error in the multiple axis case,
>>> > right? Has anyone ever complained? Wanting to get all
>>> > nanargmax/nanargmin results, of which some might be errors, without
>>> > just writing a loop, seems like a pretty exotic case to me, so I'm not
>>> > sure we should optimize for it at the expense of returning
>>> > possibly-misleading results in the scalar case.
>>> >
>>> > Like (I think) you say, we could get the best of both worlds by
>>> > encoding the results in the same way we do right now, but then raise
>>> > an exception and attach the results to the exception so they can be
>>> > retrieved if wanted. Kind of cumbersome, but maybe good?
>>> >
>>> > This is a more general problem though of course -- we've run into it
>>> > in the gufunc linalg code too, where there's some question about what
>>> > you do in e.g. chol() if some sub-matrices are positive-definite and
>>> > some are not.
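A minimal sketch of the "raise an exception and attach the results" idea from the quoted message, in plain Python so it does not depend on any numpy version. The names `PartialResultError` and `nanargmax_rows` are hypothetical and are not part of numpy; the validity mask is one possible way to say which entries can be trusted, not a settled design.

```python
import math


class PartialResultError(ValueError):
    """Hypothetical exception that carries per-row results and a validity mask."""

    def __init__(self, message, results, valid):
        super().__init__(message)
        self.results = results  # one entry per row; None where the row failed
        self.valid = valid      # True where the matching entry in results is usable


def nanargmax_rows(rows):
    """Return the index of the largest non-NaN value in each row.

    Hypothetical helper, not numpy's nanargmax. If any row is all-NaN, raise
    PartialResultError with the results for the remaining rows attached.
    """
    results, valid = [], []
    for row in rows:
        best = None
        for i, v in enumerate(row):
            if not math.isnan(v) and (best is None or v > row[best]):
                best = i
        results.append(best)
        valid.append(best is not None)
    if not all(valid):
        raise PartialResultError("all-NaN row encountered", results, valid)
    return results


nan = float("nan")
try:
    nanargmax_rows([[1.0, nan, 3.0], [nan, nan], [2.0, 0.5]])
except PartialResultError as err:
    # Callers who care about partial results can recover them here.
    print(err.results, err.valid)
```

Callers who are not prepared for the failure get an ordinary exception, while callers who want the partial answers read them off the exception object instead of checking for a sentinel value.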
>>> >
>>> > Off the top of my head the general solution might be to define a
>>> > MultiError exception type that has a standard generic format for
>>> > describing such things. It'd need a mask saying which values were
>>> > valid, rather than encoding them directly into the return values --
>>> > otherwise we have the problem where nanargmax wants to use INT_MIN,
>>> > chol wants to use NaN, and maybe the next function along doesn't have
>>> > any usable flag value available at all. So probably more thought is
>>> > needed before nailing down exactly how we handle such "partial" errors
>>> > for vectorized functions.
>>> >
>>> > In the short term (i.e., 1.8.0), maybe we should defer this discussion
>>> > by simply raising a regular ValueError for nanarg functions on all
>>> > errors? That's not a regression from 1.7, since 1.7 also didn't
>>> > provide any way to get at partial results in the event of an error,
>>> > and it leaves us in a good position to solve the more general problem
>>> > later.
>>>
>>> Can we make the error optional in these cases?
>>>
>>> like np.seterr for zerodivision, invalid, or floating point errors
>>> that allows ignore and raise
>>> np.seterr(linalg='ignore')
>>>
>>> I don't know about nanarg, but thinking about some applications for
>>> gufunc linalg code.
>>>
>>> In some cases I might require for example invertibility of all
>>> matrices and raise if one fails,
>>> in other cases I would be happy with nans, and just sum the results
>>> with nansum for example or replace them by some fill value.
>>>
>> I'm thinking warnings might be more flexible than exceptions:
>>
>> with warnings.catch_warnings():
>>     warnings.simplefilter('error')
>>     ...
>
> Sure. Passing in a callback or just leaving the function out and telling
> people to implement it themselves would be even more flexible :-).
> But we have to trade off complexity of usage, complexity of teaching
> people how to do stuff (nobody knows how to use catch_warnings, we only
> know because we started writing warning tests just in the last year or
> so), usefulness in common situations, etc. The warnings api doesn't give
> you any way to pass results out, you still need a separate channel to say
> what failed and what succeeded (and maybe for the failures to say what the
> different failures are).
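For anyone who has not used catch_warnings, here is a minimal sketch of the pattern being discussed, again in plain Python. The helper names and the `SENTINEL` constant are hypothetical stand-ins for the warn-and-return-a-marker behaviour described earlier in the thread; they are not numpy's API, and the sentinel assumes a 64-bit intp.

```python
import math
import warnings

# Hypothetical stand-in for the most negative intp value (assumes 64 bits).
SENTINEL = -(2 ** 63)


def nanargmax_or_sentinel(values):
    """Hypothetical helper: warn and return SENTINEL when every value is NaN."""
    best = None
    for i, v in enumerate(values):
        if not math.isnan(v) and (best is None or v > values[best]):
            best = i
    if best is None:
        warnings.warn("All-NaN input encountered", RuntimeWarning)
        return SENTINEL
    return best


def strict_nanargmax(values):
    """Turn the RuntimeWarning into an exception for this one call only."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", RuntimeWarning)
        return nanargmax_or_sentinel(values)


nan = float("nan")
try:
    strict_nanargmax([nan, nan])
except RuntimeWarning as exc:
    # The sentinel return value is gone at this point; only the message survives.
    print("caught:", exc)
```

The filter change is undone when the `with` block exits, so the rest of the program keeps its normal warning settings. Once the warning has been escalated, nothing is returned, which is the "no way to pass results out" limitation mentioned above.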
Since numpy and scipy just moved to Python 2.6, it's time to advertise and
support warnings.catch_warnings(). If you want to wait for "missing value
support" in numpy before supporting this, then that postpones it to ....
(numpy 3.0?) while gufuncs seem to be happening now.

Josef
"from the balcony"
3-dimensional panel data linear algebra without vec and kron ?

>
> Anyway this back and forth still supports my main suggestion for *right*
> now, which is that this is sufficiently nonobvious that with 1.8 breathing
> down our necks we should start with the safe behaviour and then work up
> from there.
>
> -n

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
