2012/9/23 Nathaniel Smith <[email protected]>:
> On Sat, Sep 22, 2012 at 4:46 PM, Olivier Grisel
> <[email protected]> wrote:
>> There is also a third use case that is problematic on numpy master:
>>
>>   orig = np.memmap('tmp.mmap', dtype=np.float64, shape=100, mode='w+')
>>   # negative markers to detect under / overflows
>>   orig[:] = np.arange(orig.shape[0]) * -1.0
>>
>>   a = np.memmap('tmp.mmap', dtype=np.float64, shape=50, mode='r+', offset=16)
>>   a[:] = np.arange(50)
>>   b = np.asarray(a[10:])
>>
>> Now b does not even have a 'filename' attribute anymore. `b.base` is a
>> python mmap instance but the latter is created with a file descriptor.
>>
>> It would still be possible to use:
>>
>>   from _multiprocessing import address_of_buffer
>>
>> to find the memory address of the mmap buffer and use that to open new
>> buffer views on the same memory segment from subprocesses using
>> `numpy.frombuffer((ctypes.c_byte * n_byte).from_address(addr))` but in
>> case of failure (e.g. the file has been deleted on the HDD) one gets a
>> segmentation fault instead of a much more user-friendly, catchable
>> file-not-found exception.
>
> On Unix, if the processes are related in a way that lets this work,
> then this would actually be a far better solution... it will always
> refer to the same file that was opened in the parent, even if it has
> since been deleted or renamed or replaced by a different file. (And if
> they aren't related by fork(), then sending the fd would be better
> than sending the filename, for the same reason.) Of course that
> doesn't help for Windows; no idea what happens there.
>
> Numpy in general really does not provide any reliable way of tracking
> the relationship between different views of the same buffer.
> Introspecting on .base will work in many cases, but it's not
> guaranteed to, even in earlier versions. Maybe you don't care because
> it works well enough but it's an inherently rickety design :-). Trying
> to think of the correct solution here, I think it would have to be
> something like... have the numpy mmap code keep a global scorecard of
> all extant memory mappings -- filename, offset, length, memory
> address. And then when you want to do an "mmap aware pickle", you
> check the address of the array you're trying to save to see if it
> falls into an mmap'ed region. That'd be simpler and more reliable than
> anything involving base tracking.
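For reference, here is how I read the scorecard idea, as a rough stdlib-only sketch. Nothing like this exists in numpy: `MappingRecord`, `open_tracked_mmap` and `locate` are names made up for illustration, and the sketch never removes an entry when a mapping is closed. For a numpy array the address to look up would be `arr.__array_interface__['data'][0]`.

```python
import ctypes
import mmap
import os
import tempfile
from collections import namedtuple

# Hypothetical record: one row of the "scorecard".
MappingRecord = namedtuple("MappingRecord", "address length filename offset")

# Mappings registered through open_tracked_mmap (entries are never removed here).
_scorecard = []


def open_tracked_mmap(filename, length, offset=0):
    """Map `length` bytes of `filename` and record where the mapping lives."""
    with open(filename, "r+b") as f:
        mm = mmap.mmap(f.fileno(), length, access=mmap.ACCESS_WRITE, offset=offset)
    # from_buffer needs a writable buffer; the temporary ctypes object is
    # dropped as soon as its address has been read.
    address = ctypes.addressof(ctypes.c_char.from_buffer(mm))
    _scorecard.append(MappingRecord(address, length, filename, offset))
    return mm


def locate(address):
    """Return (filename, offset in file) for a memory address, or None."""
    for rec in _scorecard:
        if rec.address <= address < rec.address + rec.length:
            return rec.filename, rec.offset + (address - rec.address)
    return None


# Demo: a file with room for 100 float64 values.
path = os.path.join(tempfile.mkdtemp(), "tmp.mmap")
with open(path, "wb") as f:
    f.write(bytes(800))

mm = open_tracked_mmap(path, 800)
base = _scorecard[-1].address

# Address of item 10 of a view that starts 16 bytes into the file.
hit = locate(base + 16 + 10 * 8)
print(hit)
```

The lookup only checks the start address of the data, so a strided view that begins inside a mapping is reported as belonging to it without checking where the view ends.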
Well, base tracking seems to work really well on 1.6.2. Here is the
code that does the introspection / reconstruction of shared memory
views from sub-processes using the python multiprocessing Pool API:

https://github.com/joblib/joblib/pull/44/files#L5R55

The only clean solution I see for the collapsed base of numpy 1.7
would be to replace the direct mmap.mmap buffer instance in the
numpy.memmap class with a custom wrapper around mmap.mmap that would
still implement the python buffer API but would also store the
filename and offset as additional attributes. To me that sounds much
cleaner than a "global scorecard of all extant memory mappings".

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
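PS: to make the wrapper idea concrete, here is a minimal stdlib-only sketch. `FileBackedMmap` is a hypothetical name and not part of numpy or the standard library. It relies on `mmap.mmap` being subclassable and on the subclass still exposing the buffer API, so in principle anything that accepts a buffer (`numpy.frombuffer`, for instance) could use it in place of a plain mmap. Whether numpy.memmap could adopt it as-is is not something this sketch shows.

```python
import mmap
import os
import tempfile


class FileBackedMmap(mmap.mmap):
    """mmap.mmap that remembers which file and offset it maps (hypothetical)."""

    def __new__(cls, filename, length, offset=0, access=mmap.ACCESS_WRITE):
        # mmap duplicates the file descriptor, so the file object can be
        # closed as soon as the mapping exists.
        with open(filename, "r+b") as f:
            self = super().__new__(
                cls, f.fileno(), length, access=access, offset=offset
            )
        # Extra attributes that a plain mmap.mmap instance does not carry.
        self.filename = filename
        self.offset = offset
        return self


# Demo: a file with room for 100 float64 values.
path = os.path.join(tempfile.mkdtemp(), "tmp.mmap")
with open(path, "wb") as f:
    f.write(bytes(800))

buf = FileBackedMmap(path, 800)

# Any buffer consumer keeps a reference to the exporting object, so the
# filename and offset stay reachable from the view.
view = memoryview(buf)
print(view.obj.filename == path, view.obj.offset)
```

Note that `offset` here has to be a multiple of `mmap.ALLOCATIONGRANULARITY`, so an arbitrary array offset such as the 16 bytes in the example above would have to be stored separately from the mapping offset.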
