On Sun, 2025-03-23 at 10:17 +0800, Fang Zhang wrote:
> Sebastian,
>
> I think the core issue you pointed out is indeed correct, but your
> detailed explanation is backwards, since `maximum(arr[:, 0], arr[:, 1])`
> implements `arr.max(axis=1)` instead of `arr.max(axis=0)`.

Ah, right, I explained things thinking of axis=1, thanks! As you said,
the arguments should be identical, just reversed.
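To spell that out for the OP's (n, 2) case (a quick sanity check of the
axes, nothing more):

import numpy as np

arr = np.random.random((1000, 2))

# The element-wise maximum of the two columns is the per-row maximum,
# i.e. `arr.max(axis=1)` ...
assert (np.maximum(arr[:, 0], arr[:, 1]) == arr.max(axis=1)).all()

# ... while the bounding box needs the axis=0 reductions, which is what
# the transpose method computes column by column:
assert (np.array([arr.T[0].max(), arr.T[1].max()])
        == arr.max(axis=0)).all()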
> So OP's transpose method is essentially approach 1, which for this
> array shape has less overhead than approach 2 (and since 2 is small
> enough, it doesn't matter that we are paying this overhead in Python
> instead of in C).
>
> From my tests, when n = 1_000_000, the advantage of the transpose
> method persists up to m = 6. But for smaller n the normal method
> becomes better again, so it's not that clear cut (although it would be
> nice to have a way to use the transpose method but pay the overhead in
> C instead of Python!)
>
> Test code:
>
> import numpy as np
> from IPython import get_ipython
>
>
> def calculate_bbox_normal(vertices: np.ndarray):
>     return vertices.min(axis=0), vertices.max(axis=0)
>
>
> def calculate_bbox_transpose(vertices: np.ndarray):
>     return (np.array([col.min() for col in vertices.T]),
>             np.array([col.max() for col in vertices.T]))
>
>
> n = 1_000_000
> for m in range(4, 9):
>     print(f"shape: {n, m}")
>     vertices = np.random.random((n, m))
>     vmin, vmax = calculate_bbox_normal(vertices)
>     vmin_t, vmax_t = calculate_bbox_transpose(vertices)
>     assert (vmin == vmin_t).all()
>     assert (vmax == vmax_t).all()
>     print("Normal: ", end="")
>     get_ipython().run_line_magic("timeit", "calculate_bbox_normal(vertices)")
>     print("Transpose: ", end="")
>     get_ipython().run_line_magic("timeit", "calculate_bbox_transpose(vertices)")
>     print()
>
> - Fang
>
> On Sat, Mar 22, 2025 at 7:10 PM Sebastian Berg
> <sebast...@sipsolutions.net> wrote:
> >
> > On Fri, 2025-03-21 at 23:22 +0100, Tiziano Zito via NumPy-Discussion
> > wrote:
> > > Hi George,
> > >
> > > what you see is due to the memory layout of numpy arrays. If you
> > > switch your array to F-order you'll see that the two functions
> > > have the same timings, i.e. both are fast (on my machine 25 times
> > > faster for the 1_000_000 points case).
> >
> > Memory order is indeed very important, but here it is only
> > indirectly so!
> >
> > The thing is that NumPy tries to re-order most operations for memory
> > access order (nothing particularly advanced).
> >
> > I.e. NumPy can calculate this type of operation in two ways (the
> > inner loop is always 1-D, so in C the first, outer, loop here is
> > separate):
> >   1. for i in arr.shape[0]: res[i] = arr[i].max()
> >   2. for i in arr.shape[1]: res[:] = maximum(res, arr[i])
> > (For i == 0, you would just copy `arr[0]` or initialize `res` first.)
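> >
> > (Expanded into plain Python just to make the two iteration orders
> > concrete: the real loops are in C, the function names here are made
> > up, and I am reading `arr[i]` in form 2 as the i-th column:)
> >
> > import numpy as np
> >
> > def max_form_1(arr):
> >     # One full reduction call per result element; the outer loop
> >     # runs over arr.shape[0], each call reduces arr.shape[1] values.
> >     return np.array([arr[i].max() for i in range(arr.shape[0])])
> >
> > def max_form_2(arr):
> >     # Accumulate with the binary maximum; the outer loop runs over
> >     # arr.shape[1], each call touches arr.shape[0] values.
> >     res = arr[:, 0].copy()
> >     for i in range(1, arr.shape[1]):
> >         np.maximum(res, arr[:, i], out=res)
> >     return res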
> >
> > NumPy chooses the first form here, precisely because it is
> > (normally) better for the memory layout.
> > But for 1., `arr[i].max()` is actually a function call inside NumPy
> > (deep in C, not on an actual NumPy array object, of course).
> >
> > So for `arr.shape[1] == 2`, that just gets in the way because of
> > overheads!
> > Writing it as 2. is much better (memory order doesn't even matter);
> > calculating it as `maximum(arr[0], arr[1])` would be the best.
> >
> > But even for other small values of `arr.shape[1]` it would probably
> > be better to use the second version; memory order will start to
> > matter more and more (I could imagine that even for arr.shape[1] ==
> > 3 or 4 it dominates quickly for huge arrays, but I didn't try).
> >
> > So what you see should be overheads of using approach 1. Taking the
> > maximum of 2 elements is simply fast compared to even a lightweight
> > outer for loop and just calling the function that takes the maximum.
> > (And the core function does a bit of extra work to optimize for many
> > elements, additionally [1].)
> > You really don't need large inner loops to hide most of those
> > overheads, but 2 elements just isn't enough.
> >
> > Anyway, of course it could be cool to add a heuristic to pick
> > approach 2 here when we obviously have `arr.shape[1] == 2`, or maybe
> > `arr.shape[1] < small_value`.
> > There are also probably some low-hanging fruits to just reduce the
> > overheads here (for this specific case I know the outer iteration
> > can be optimized, although I am not sure it would improve things;
> > also, some inner-loop optimizations could have been moved out of the
> > inner loop for a few years now -- but only casts actually do so).
> >
> > In the end, `maximum(arr[:, 0], arr[:, 1])` will always be the
> > fastest way for this specific thing, because it explicitly picks the
> > clearly best approach.
> > It would be good to close that gap at least a bit, though.
> >
> > - Sebastian
> >
> >
> > [1] Locally I can actually see a small speed-up if I pick less
> > optimized loops at run-time via `NPY_DISABLE_CPU_FEATURES=AVX2`.
> >
> >
> > > Try:
> > >
> > > vertices = np.array(np.random.random((n, 2)), order='F')
> > >
> > > When your array doesn't fit in L1-cache anymore, either order 'C'
> > > or order 'F' becomes (much) more efficient depending on which
> > > dimension you are internally looping through.
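> > >
> > > For example (a minimal way to set up the comparison; the timings
> > > themselves will of course depend on your machine):
> > >
> > > import numpy as np
> > >
> > > n = 1_000_000
> > > vertices_c = np.random.random((n, 2))        # C order (the default)
> > > vertices_f = np.asfortranarray(vertices_c)   # same values, F order
> > >
> > > # In IPython:
> > > #   %timeit vertices_c.min(axis=0)
> > > #   %timeit vertices_f.min(axis=0)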
> > >
> > > You can read more about it here:
> > > https://numpy.org/doc/stable/dev/internals.html#internal-organization-of-numpy-arrays
> > > and
> > > https://numpy.org/doc/stable/reference/arrays.ndarray.html#internal-memory-layout-of-an-ndarray
> > >
> > > Hope that helps,
> > > Tiziano
> > >
> > >
> > > On Fri 21 Mar, 12:10 +0200, George Tsiamasiotis via NumPy-Discussion
> > > <numpy-discussion@python.org> wrote:
> > > > Hello NumPy community!
> > > >
> > > > I was writing a function that calculates the bounding box of a
> > > > polygon (the smallest rectangle that fully contains the polygon,
> > > > and whose sides are parallel to the x and y axes). The input is
> > > > an (N, 2) array containing the vertices of the polygon, and the
> > > > output is a 4-tuple containing the coordinates of the 2 corners
> > > > of the bounding box.
> > > >
> > > > I found two ways to do that using the np.min() and np.max()
> > > > methods, and I was surprised to see a significant speed
> > > > difference, even though they seemingly do the same thing.
> > > >
> > > > While for small N the speed is essentially the same, the
> > > > difference becomes noticeable for larger N. From my testing, the
> > > > difference seems to plateau, with the one way being around 4-5
> > > > times faster than the other.
> > > >
> > > > Is there an explanation for this?
> > > >
> > > > Here is a small benchmark I wrote (must be executed with
> > > > IPython):
> > > >
> > > > import numpy as np
> > > > from IPython import get_ipython
> > > >
> > > > vertices = np.random.random((1000, 2))
> > > >
> > > >
> > > > def calculate_bbox_normal(vertices: np.ndarray) -> tuple[np.float64, ...]:
> > > >     xmin, ymin = vertices.min(axis=0)
> > > >     xmax, ymax = vertices.max(axis=0)
> > > >     return xmin, ymin, xmax, ymax
> > > >
> > > >
> > > > def calculate_bbox_transpose(vertices: np.ndarray) -> tuple[np.float64, ...]:
> > > >     xmin = vertices.T[0].min()
> > > >     xmax = vertices.T[0].max()
> > > >     ymin = vertices.T[1].min()
> > > >     ymax = vertices.T[1].max()
> > > >     return xmin, ymin, xmax, ymax
> > > >
> > > >
> > > > bbox_normal = calculate_bbox_normal(vertices)
> > > > bbox_transpose = calculate_bbox_transpose(vertices)
> > > >
> > > > print(f"Equality: {bbox_normal == bbox_transpose}")
> > > >
> > > > for n in [10, 100, 1000, 10_000, 100_000, 1_000_000]:
> > > >     print(f"Number of points: {n}")
> > > >     vertices = np.random.random((n, 2))
> > > >     print("Normal: ", end="")
> > > >     get_ipython().run_line_magic("timeit", "calculate_bbox_normal(vertices)")
> > > >     print("Transpose: ", end="")
> > > >     get_ipython().run_line_magic("timeit", "calculate_bbox_transpose(vertices)")
> > > >     print()