On Fri, 2025-03-21 at 23:22 +0100, Tiziano Zito via NumPy-Discussion wrote: > Hi George, > > what you see is due to the memory layout of numpy arrays. If you > switch your array to F-order you'll see that the two functions have > the same timings, i.e. both are fast (on my machine 25 times faster > for the 1_000_000 points case).
Memory order is indeed very important, but it should be indirectly so! The thing is that NumPy tries to re-order most operations for memory access order (not particularly advanced). I.e. NumPy can calculate this type of operation in two ways (the inner- loop is always 1-D, so in C first, outer, loop here is separate): 1. for i in arr.shape[0]: res[i] = arr[i].max() 2. for i in arr.shape[1]: res[:] = maximum(res, arr[i]) (For i == 0, you would just copy `arr[0]` or initialize `res` first) NumPy chooses the first form here, precisely because it is (normally) better for the memory layout, here. But for 1 `arr[i].max()` is actually a function call inside NumPy (deep in C, not on an actual NumPy array object of course). So for `arr.shape[1] == 2`, that just gets in the way because of overheads! Writing it as 2. is much better (memory order doesn't even matter), calculating it as `maximum(arr[0], arr[1])` would be the best. But even for other small values for `arr.shape[1]` it would probably be better to use the second version, memory order will start to matter more and more (I could imagine even for arr.shape[1] == 3 or 4 it dominates quickly for huge arrays, but didn't try). So what you see should be overheads of using approach 1. Taking the maximum of 2 elements is simply fast compared to even a lightweight outer for loop and just calling the function to take the maximum. (And the core function does a bit of extra work to optimize for many elements additionally [1]) You really don't need large inner-loops to hide most of those overheads, but 2 elements just isn't enough. Anyway, of course it could be cool to add a heuristic to pick approach 2 here when we obviously have `arr.shape[1] == 2` or maybe `arr.shape < small_value`. There are also probably some low-hanging fruits to just reduce the overheads here (for this specific case I know the outer iteration can be optimized, although I am not sure it would improve things, also, some inner-loop optimizations could be moved out of the inner-loop since a few year now -- but only casts actually do so). In the end, `maximum(arr[:, 0], arr[:, 1])` will always the fastest way for this specific thing, because it explicitly picks the clearly best approach. It would be good to close that gap at least a bit, though. - Sebastian [1] locally I can actually see a small speed-up if I pick less optimized loops at run-time via `NPY_DISABLE_CPU_FEATURES=AVX2`. > > Try: > > vertices = np.array(np.random.random((n, 2)), order='F') > > When your array doesn't fit in L1-cache anymore, either order 'C' or > order 'F' becomes (much) more efficient depending on which dimension > you are internally looping through. > > You can read more about it here: > https://numpy.org/doc/stable/dev/internals.html#internal-organization-of-numpy-arrays > and > https://numpy.org/doc/stable/reference/arrays.ndarray.html#internal-memory-layout-of-an-ndarray > > Hope that helps, > Tiziano > > > On Fri 21 Mar, 12:10 +0200, George Tsiamasiotis via NumPy-Discussion > <numpy-discussion@python.org> wrote: > > Hello NumPy community! > > > > I was writing a function that calculates the bounding box of a > > polygon (the smallest rectangle that fully contains the polygon, > > and who's sides are parallel to the x and y axes). The input is a > > (N,2) array containing the vertices of the polygon, and the output > > is a 4-tuple containing the vertices of the 2 corners of the > > bounding box. > > > > I found two ways to do that using the np.min() and np.max() > > methods, and I was surprised to see a significant speed difference, > > even though they seemingly do the same thing. > > > > While for small N the speed is essentially the same, the difference > > becomes noticeable for larger N. >From my testing, the difference > > seems to plateau, with the one way being around 4-5 times faster > > than the other. > > > > Is there an explanation for this? > > > > Here is a small benchmark I wrote (must be executed with IPython): > > > > import numpy as np > > from IPython import get_ipython > > > > vertices = np.random.random((1000, 2)) > > > > def calculate_bbox_normal(vertices: np.ndarray) -> > > tuple[np.float64]: > > xmin, ymin = vertices.min(axis=0) > > xmax, ymax = vertices.max(axis=0) > > return xmin, ymin, xmax, ymax > > > > def calculate_bbox_transpose(vertices: np.ndarray) -> > > tuple[np.float64]: > > xmin = vertices.T[0].min() > > xmax = vertices.T[0].max() > > ymin = vertices.T[1].min() > > ymax = vertices.T[1].max() > > return xmin, ymin, xmax, ymax > > > > bbox_normal = calculate_bbox_normal(vertices) > > bbox_transpose = calculate_bbox_transpose(vertices) > > > > print(f"Equality: {bbox_normal == bbox_transpose}") > > > > for n in [10, 100, 1000, 10_000, 100_000, 1_000_000]: > > print(f"Number of points: {n}") > > vertices = np.random.random((n, 2)) > > print("Normal: ", end="") > > get_ipython().run_line_magic("timeit", > > "calculate_bbox_normal(vertices)") > > print("Transpose: ", end="") > > get_ipython().run_line_magic("timeit", > > "calculate_bbox_transpose(vertices)") > > print() > > > > > _______________________________________________ > > NumPy-Discussion mailing list -- numpy-discussion@python.org > > To unsubscribe send an email to numpy-discussion-le...@python.org > > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ > > Member address: opossumn...@gmail.com > > _______________________________________________ > NumPy-Discussion mailing list -- numpy-discussion@python.org > To unsubscribe send an email to numpy-discussion-le...@python.org > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ > Member address: sebast...@sipsolutions.net _______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-le...@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: arch...@mail-archive.com