On Wed, Jun 5, 2013 at 11:59 AM, Charles R Harris <charlesr.har...@gmail.com > wrote:
> > > On Wed, Jun 5, 2013 at 11:48 AM, Nathaniel Smith <n...@pobox.com> wrote: > >> On Wed, Jun 5, 2013 at 6:08 PM, Julian Taylor >> <jtaylor.deb...@googlemail.com> wrote: >> > On 05.06.2013 16:33, Nathaniel Smith wrote: >> >> The slow down people are worried about is, suppose that 'xp' has >> >> 1,000,000 entries, and the user wants to interpolate 1 point. If we >> >> can assume the array is sorted, then we can find which bin the 1 point >> >> falls into by using binary search (np.searchsorted), which will >> >> require examining ~log2(1,000,000) = 20 entries in the array. Checking >> >> that it's sorted, though, will require examining all 1,000,000 points >> >> -- it turns an O(log n) operation into an O(n) operation. And if this >> >> is being called as part of some larger algorithm, this effect can >> >> cascade -- if it gets called m times from a loop, now that's O(mn) >> >> instead of (m log n), etc. That's often the difference between >> >> tractable and intractable. >> >> >> >> If you submit a PR that gives interp the option to check for >> >> monotonicity, then we'll almost certainly merge it :-). If you also >> >> make it the default then there might be some debate, though I'd argue >> >> for it... >> > >> > I would not like it when the performance goes from log to linear by >> default. >> > It is documented that the coordinates must be increasing after all. >> >> I agree, I don't like it either. But the problem is there are two >> contradictory things and I don't like either of them: >> 1) performance going from log to linear by default (for a subset of >> somewhat extreme cases) >> 2) people silently getting the wrong results (and according to reports >> in this thread, the warning in the docs does not actually prevent >> this) >> >> The question is which one do we dislike more. My feeling is that in >> the cases where (1) comes up, it will annoy people and get them to >> check the docs and find the go-faster switch, while the first warning >> of (2) is when your spaceship crashes, so we should worry more about >> (2). >> >> > How about as a compromise the function checks one or two closest >> > neighbors around the interpolation point and warns/errors if those are >> > not monotonic. >> > >> > Its not fool proof but should at least catch the very wrong cases with >> > an acceptable performance loss. >> >> There are tons of ways for data to accidentally end up sorted within >> each local region, but unsorted overall, e.g., if you're combining >> results from parallel simulation runs. Such data would systematically >> pass this check, but still give nonsensical answers. >> > > Some actual benchmarks might be useful to the discussion. > And perhaps an inplace C function ismonotone would be generally useful. Chuck
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion