On Thu, Aug 25, 2022 at 4:27 AM Bill Ross <bross_phobr...@sonic.net> wrote:
> Thanks, np.lib.format.open_memmap() works great! With prediction procs
> using minimal sys memory, I can get twice as many on GPU, with fewer
> optimization warnings.
>
> Why even have the number of records in the header? Shouldn't record size
> plus system-reported/growable file size be enough?

Only in the happy case where there is no corruption. Implicitness is not a
virtue in the use cases that the format was designed for. There is an
additional use case where the length is unknown a priori where implicitness
would help, but the format was not designed for that case (and I'm not sure
I want to add that use case).

> I'd love to have a shared-mem analog for smaller-scale data; now I load
> data and fork to emulate that effect.

There are a number of ways to do that, including using memmap on files on a
memory-backed filesystem like /dev/shm/ on Linux. See this article for
several more options:

https://luis-sena.medium.com/sharing-big-numpy-arrays-across-python-processes-abf0dc2a0ab2

> My file sizes will exceed memory, so I'm hoping to get the most out of
> memmap. Will this in-loop assignment to predsum work to avoid loading all
> to memory?
>
>     predsum = np.lib.format.open_memmap(outfile, mode='w+',
>         shape=(ids_sq,), dtype=np.float32)
>
>     for i in range(len(IN_FILES)):
>
>         pred = numpy.lib.format.open_memmap(IN_FILES[i])
>
>         predsum = np.add(predsum, pred)  ################# <-
>
>         del pred
>
>     del predsum

This will replace the `predsum` array with a new in-memory array the first
time through this loop. Use `out=predsum` to make sure that the output goes
into the memory-mapped array:

    np.add(predsum, pred, out=predsum)

Or the usual augmented assignment:

    predsum += pred

The precise memory behavior will depend on your OS's virtual memory
configuration. But in general, `np.add()` will go through the arrays in
order, causing the virtual memory system to page in memory pages as they
are accessed for reading or writing, and page out the old ones to make room
for the new pages. Linux, in my experience, isn't always the best at
managing that backlog of old pages, especially if you have multiple
processes doing similar kinds of things (in the past, I have seen *each* of
those processes trying to use *all* of the main memory for their backlog of
old pages), but there are configuration tweaks that you can make.
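Putting that together, the corrected version of the quoted loop might look
roughly like this (an untested sketch; the stand-in values for outfile,
IN_FILES, and ids_sq below are made up, and the final flush() is only there
to make sure any remaining dirty pages get written back to the output file):

    import numpy as np

    # Hypothetical stand-ins for the names used in the quoted snippet:
    IN_FILES = ["pred_0.npy", "pred_1.npy"]
    outfile = "predsum.npy"
    ids_sq = 1000  # length of each 1-D float32 prediction array

    predsum = np.lib.format.open_memmap(outfile, mode='w+',
                                        shape=(ids_sq,), dtype=np.float32)

    for in_file in IN_FILES:
        # mode='r' maps each input read-only, so its pages are never dirtied.
        pred = np.lib.format.open_memmap(in_file, mode='r')
        # Accumulate in place; out=predsum keeps the result in the
        # memory-mapped output instead of allocating a new in-memory array.
        np.add(predsum, pred, out=predsum)
        del pred

    predsum.flush()  # write any remaining dirty pages back to outfile
    del predsum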
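And for the shared-memory question earlier in the thread, the plainest
memmap-on-/dev/shm variant is roughly this (again only a sketch; the path,
shape, and dtype are made up):

    import numpy as np

    # Hypothetical file on the RAM-backed filesystem:
    shared_path = "/dev/shm/features.npy"

    # Writer process: create the .npy file and fill it.
    arr = np.lib.format.open_memmap(shared_path, mode='w+',
                                    shape=(1_000_000,), dtype=np.float32)
    arr[:] = 0.0
    arr.flush()

    # Reader processes (e.g. the forked predictors): map the same file
    # read-only. The pages live in RAM and are shared by every process
    # that maps the file, so nothing is copied per process.
    shared = np.lib.format.open_memmap(shared_path, mode='r')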
--
Robert Kern