On Wed, Jun 5, 2013 at 11:48 AM, Nathaniel Smith <n...@pobox.com> wrote:
> On Wed, Jun 5, 2013 at 6:08 PM, Julian Taylor > <jtaylor.deb...@googlemail.com> wrote: > > On 05.06.2013 16:33, Nathaniel Smith wrote: > >> The slow down people are worried about is, suppose that 'xp' has > >> 1,000,000 entries, and the user wants to interpolate 1 point. If we > >> can assume the array is sorted, then we can find which bin the 1 point > >> falls into by using binary search (np.searchsorted), which will > >> require examining ~log2(1,000,000) = 20 entries in the array. Checking > >> that it's sorted, though, will require examining all 1,000,000 points > >> -- it turns an O(log n) operation into an O(n) operation. And if this > >> is being called as part of some larger algorithm, this effect can > >> cascade -- if it gets called m times from a loop, now that's O(mn) > >> instead of (m log n), etc. That's often the difference between > >> tractable and intractable. > >> > >> If you submit a PR that gives interp the option to check for > >> monotonicity, then we'll almost certainly merge it :-). If you also > >> make it the default then there might be some debate, though I'd argue > >> for it... > > > > I would not like it when the performance goes from log to linear by > default. > > It is documented that the coordinates must be increasing after all. > > I agree, I don't like it either. But the problem is there are two > contradictory things and I don't like either of them: > 1) performance going from log to linear by default (for a subset of > somewhat extreme cases) > 2) people silently getting the wrong results (and according to reports > in this thread, the warning in the docs does not actually prevent > this) > > The question is which one do we dislike more. My feeling is that in > the cases where (1) comes up, it will annoy people and get them to > check the docs and find the go-faster switch, while the first warning > of (2) is when your spaceship crashes, so we should worry more about > (2). > > > How about as a compromise the function checks one or two closest > > neighbors around the interpolation point and warns/errors if those are > > not monotonic. > > > > Its not fool proof but should at least catch the very wrong cases with > > an acceptable performance loss. > > There are tons of ways for data to accidentally end up sorted within > each local region, but unsorted overall, e.g., if you're combining > results from parallel simulation runs. Such data would systematically > pass this check, but still give nonsensical answers. > Some actual benchmarks might be useful to the discussion. Chuck
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion