On Wed, Feb 7, 2018 at 12:09 AM, Ralf Gommers <ralf.gomm...@gmail.com> wrote:

>> It is partly a plea for some development of numerically accurate
>> functions for computing lat/lon grids from a combination of inputs:
>> bounds, counts, and resolutions.

Can you be more specific about what problems you've run into? I work with
lat-lon grids all the time, and have never had a problem.

float32 degrees gives you about 1 meter accuracy or better, so I can see
how losing a few digits might be an issue, though I would argue that you
maybe shouldn't use float32 if you are worried about anything close to 1m
accuracy... -- or shift to a relative coordinate system of some sort.

>> I have been playing around with the decimal package a bit lately,

sigh. decimal is so often looked at as a solution to a problem it isn't
designed for. lat-lon is natively sexagesimal -- maybe we need that dtype
:-)

what you get from decimal is variable precision -- maybe a binary
variable-precision lib is a better answer -- that would be a good thing to
have easy access to in numpy. But in this case, if you want better
accuracy in a computation that will end up in float32, just use float64.

>> and I discovered the concept of "fused multiply-add" operations for
>> improved accuracy. I have come to realize that fma operations could be
>> used to greatly improve the accuracy of linspace() and arange().

arange() is problematic for non-integer use anyway, by its very definition
(getting the "end point" correct requires the right step, even without FP
error).

and would it really help with linspace? it's computing a delta with one
division in fp, then multiplying it by an integer (represented in fp --
why? why not keep that an integer till the multiply?).

>> In particular, I have been needing improved results for computing
>> latitude/longitude grids, which tend to be done in float32's to save
>> memory (at least, this is true in data I come across).

> If you care about saving memory *and* accuracy, wouldn't it make more
> sense to do your computations in float64, and convert to float32 at the
> end?

that does seem to be the easy option :-)

>> Now, to the crux of my problem. It is next to impossible to generate a
>> non-trivial numpy array of coordinates, even in double precision,
>> without hitting significant numerical errors.

I'm confused, the example you posted doesn't have significant errors...

>> Which has led me down the path of using the decimal package (which
>> doesn't play very nicely with numpy because of the lack of casting
>> rules for it). Consider the following:
>>
>> ```
>> $ cat test_fma.py
>> from __future__ import print_function
>> import numpy as np
>> res = np.float32(0.01)
>> cnt = 7001
>> x0 = np.float32(-115.0)
>> x1 = res * cnt + x0
>> print("res * cnt + x0 = %.16f" % x1)
>> x = np.arange(-115.0, -44.99 + (res / 2), 0.01, dtype='float32')
>> print("len(arange()): %d arange()[-1]: %16f" % (len(x), x[-1]))
>> x = np.linspace(-115.0, -44.99, cnt, dtype='float32')
>> print("linspace()[-1]: %.16f" % x[-1])
>>
>> $ python test_fma.py
>> res * cnt + x0 = -44.9900015648454428
>> len(arange()): 7002 arange()[-1]: -44.975044
>> linspace()[-1]: -44.9900016784667969
>> ```
>>
>> arange just produces silly results (puts out an extra element... adding
>> half of the resolution is typically mentioned as a solution on mailing
>> lists to get around arange()'s limitations -- I personally don't do
>> this).

The real solution is "don't do that" -- arange is not the right tool for
the job.

Then there is this:

    res * cnt + x0 = -44.9900015648454428
    linspace()[-1]: -44.9900016784667969

that's as good as you are ever going to get with 32 bit floats...
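(An aside, to put a number on the "about 1 meter" point at the top of this
message: np.spacing() gives the gap between adjacent representable floats,
and ~111 km per degree is the usual rough conversion for latitude -- the
exact figure varies with location, so treat this as a back-of-the-envelope
check only:)

```
import numpy as np

# gap between adjacent float32 values (one ULP) near 115 degrees:
ulp_deg = np.spacing(np.float32(115.0))  # ~7.6e-6 degrees

# rough conversion: ~111 km of ground distance per degree of latitude
# (longitude shrinks by cos(latitude); this is just a ballpark figure)
meters_per_degree = 111000.0

print("float32 ULP at 115 degrees: %g degrees" % ulp_deg)
print("roughly %.2f meters on the ground" % (ulp_deg * meters_per_degree))
# prints something on the order of 0.85 meters -- i.e. "about 1 meter"
```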
Though I just noticed something about your numbers -- there should be a
nice even base-ten delta if you have 7001 gaps -- but linspace produces N
points, not N gaps -- so maybe you want:

In [17]: l = np.linspace(-115.0, -44.99, 7002)

In [18]: l[:5]
Out[18]: array([-115.  , -114.99, -114.98, -114.97, -114.96])

In [19]: l[-5:]
Out[19]: array([-45.03, -45.02, -45.01, -45.  , -44.99])

or, in float32 -- not as pretty:

In [20]: l = np.linspace(-115.0, -44.99, 7002, dtype=np.float32)

In [21]: l[:5]
Out[21]:
array([-115.        , -114.98999786, -114.98000336, -114.97000122,
       -114.95999908], dtype=float32)

In [22]: l[-5:]
Out[22]:
array([-45.02999878, -45.02000046, -45.00999832, -45.        ,
       -44.99000168], dtype=float32)

but still as good as you get with float32, and exactly the same result as
computing in float64 and converting:

In [25]: l = np.linspace(-115.0, -44.99, 7002).astype(np.float32)

In [26]: l[:5]
Out[26]:
array([-115.        , -114.98999786, -114.98000336, -114.97000122,
       -114.95999908], dtype=float32)

In [27]: l[-5:]
Out[27]:
array([-45.02999878, -45.02000046, -45.00999832, -45.        ,
       -44.99000168], dtype=float32)

>> So, does it make any sense to improve arange by utilizing fma() under
>> the hood?

no -- this is simply not the right use-case for arange() anyway.

>> Also, any plans for making fma() available as a ufunc?

could be nice -- especially if used internally.

>> Notice that most of my examples required knowing the number of grid
>> points ahead of time. But what if I didn't know that? What if I just
>> have the bounds and the resolution? Then arange() is the natural fit,
>> but as I showed, its accuracy is lacking, and you have to do some sort
>> of hack to do a closed interval.

no -- it's not -- if you have the bounds and the resolution, you have an
over-specified problem. That is:

    x_min + (n * delta_x) == x_max

If there is ANY error in delta_x, x_max, or x_min, then you'll get a
mismatch -- which is why arange is not the answer. You can make the
algorithm a bit more accurate, I suppose, but there is still fp-limited
precision -- if you can't exactly represent either delta_x or x_max, then
you CAN'T use the arange() definition and expect it to work consistently.

The "right" way to do it is to compute N with:

    round((x_max - x_min) / delta)

and then use linspace:

    linspace(x_min, x_max, N + 1)

(note that it's too bad you need to do N+1 -- if I had to do it over
again, I'd use N as the number of "gaps" rather than the number of points
-- that's more commonly what people want, if they care at all)

This way, you get a grid with the endpoints as exact as they can be, and
the deltas as close to each other as they can be as well. (There's a rough
sketch of this recipe below, after my sig.)

maybe you can do a better algorithm in linspace to save an ULP, but it's
hard to imagine when that would matter.

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
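P.S. for the archives, a rough sketch of that round-then-linspace recipe
as a function -- the name grid_from_bounds is mine, not anything in numpy:

```
import numpy as np

def grid_from_bounds(x_min, x_max, delta):
    """Closed-interval grid from bounds plus a target resolution.

    Round to get the number of gaps, then let linspace pin the
    endpoints -- rather than accumulating steps the way arange does.
    """
    n_gaps = int(round((x_max - x_min) / delta))
    return np.linspace(x_min, x_max, n_gaps + 1)

x = grid_from_bounds(-115.0, -44.99, 0.01)
print(len(x))       # 7002 points (7001 gaps)
print(x[0], x[-1])  # endpoints land on -115.0 and -44.99, as exact as fp allows

# if you need float32 in the end, compute in float64 and convert last:
x32 = x.astype(np.float32)
```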
