On Sat, Sep 22, 2012 at 4:46 PM, Olivier Grisel <[email protected]> wrote:
> There is also a third use case that is problematic on numpy master:
>
> orig = np.memmap('tmp.mmap', dtype=np.float64, shape=100, mode='w+')
> orig[:] = np.arange(orig.shape[0]) * -1.0  # negative markers to detect under / overflows
>
> a = np.memmap('tmp.mmap', dtype=np.float64, shape=50, mode='r+', offset=16)
> a[:] = np.arange(50)
> b = np.asarray(a[10:])
>
> Now b does not even have a 'filename' attribute anymore. `b.base` is a
> python mmap instance but the latter is created with a file descriptor.
>
> It would still be possible to use:
>
> from _multiprocessing import address_of_buffer
>
> to find the memory address of the mmap buffer and use that to open new
> buffer views on the same memory segment from subprocesses using
> `numpy.frombuffer((ctypes.c_byte * n_byte).from_address(addr))` but in
> case of failure (e.g. the file has been deleted on the HDD) one gets a
> segmentation fault instead of a much more user-friendly, catchable
> file-not-found exception.
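A rough sketch of how the quoted use case could be handled with the "scorecard" idea from the reply: record each mapping's filename, file offset, start address and length when it is created, then map a plain array such as `b` back to a (filename, file offset) pair by pointer arithmetic and reopen it by name, so a missing file raises an ordinary exception instead of segfaulting. `_MMAP_REGISTRY`, `register_memmap` and `locate` are hypothetical helpers, not NumPy API. The sketch assumes the view is C-contiguous and never removes stale entries (a real version would need to, e.g. with `weakref.finalize`); it relies only on the `filename` and `offset` attributes that `np.memmap` sets.

```python
import os
import tempfile

import numpy as np

# Hypothetical global scorecard: (filename, file_offset, start_address, nbytes).
# Entries are never removed here; stale entries could match reused addresses.
_MMAP_REGISTRY = []


def register_memmap(m):
    """Record where a freshly created np.memmap lives in memory and on disk."""
    start = m.__array_interface__["data"][0]  # address of element 0 == file offset m.offset
    _MMAP_REGISTRY.append((m.filename, m.offset, start, m.nbytes))


def locate(arr):
    """Return (filename, file_offset) if arr's data lies inside a registered
    mapping, else None. Assumes arr is C-contiguous (no negative strides)."""
    addr = arr.__array_interface__["data"][0]
    for filename, offset, start, nbytes in _MMAP_REGISTRY:
        if start <= addr and addr + arr.nbytes <= start + nbytes:
            return filename, offset + (addr - start)
    return None


# Reproduce the quoted use case in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "tmp.mmap")
orig = np.memmap(path, dtype=np.float64, shape=100, mode="w+")
orig[:] = np.arange(orig.shape[0]) * -1.0  # negative markers
orig.flush()

a = np.memmap(path, dtype=np.float64, shape=50, mode="r+", offset=16)
register_memmap(a)
a[:] = np.arange(50)
a.flush()
b = np.asarray(a[10:])  # plain ndarray: no .filename attribute

# b starts 10 float64 elements into a, so its file offset is 16 + 10 * 8.
filename, file_offset = locate(b)

# Reopening by name raises a catchable error if the file is gone.
c = np.memmap(filename, dtype=b.dtype, mode="r", offset=file_offset, shape=b.shape)
```

Reopening by filename trades the fd's "same file even if renamed" guarantee for a clean exception when the file has disappeared.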
On Unix, if the processes are related in a way that lets this work, then this would actually be a far better solution... it will always refer to the same file that was opened in the parent, even if it has since been deleted, renamed, or replaced by a different file. (And if they aren't related by fork(), then sending the fd would be better than sending the filename, for the same reason.) Of course that doesn't help for Windows; no idea what happens there.

Numpy in general really does not provide any reliable way of tracking the relationship between different views of the same buffer. Introspecting on .base will work in many cases, but it's not guaranteed to work, even in earlier versions. Maybe you don't care because it works well enough, but it's an inherently rickety design :-).

Trying to think of the correct solution here, I think it would have to be something like... have the numpy mmap code keep a global scorecard of all extant memory mappings -- filename, offset, length, memory address. Then, when you want to do an "mmap-aware pickle", you check the address of the array you're trying to save to see if it falls into an mmap'ed region. That'd be simpler and more reliable than anything involving base tracking.

-n
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
