Re: [Numpy-discussion] Passing numpy arrays to matlab
Andrew Straw wrote:
> David Cournapeau wrote:
>> - To send data from the calling process to matlab, you first have to create an mxArray, which is the basic matlab handle for a matlab array, and populate it. Using mxArray is very awkward: you cannot create an mxArray from existing data, you have to copy data into it, etc...
> My understanding, never having done it, but from reading the docs, is that you can create a hybrid array where you manage the memory. Thus, you can create an mxArray from existing data. However, the docs basically say that this is too hard for most mortals (and they may well be right -- too painful for me, anyway)!

Ok, I have looked at it. It is not hard, it is just totally brain damaged: there is no way to destroy an mxArray without destroying the data it is holding, even after a call to mxSetPr. So the data referenced by the pointer given to mxSetPr is always destroyed by mxDestroyArray; I don't see any way to use this to avoid the copy... They could at least have provided one function which frees the data buffer and another which destroys the rest; as it is, it is totally useless, unless you don't mind memory leaks.

David

-
Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnkkid=120709bid=263057dat=121642
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion
Re: [Numpy-discussion] Passing numpy arrays to matlab
David Cournapeau wrote:
> Andrew Straw wrote:
>> David Cournapeau wrote:
>>> - To send data from the calling process to matlab, you first have to create an mxArray, which is the basic matlab handle for a matlab array, and populate it. Using mxArray is very awkward: you cannot create an mxArray from existing data, you have to copy data into it, etc...
>> My understanding, never having done it, but from reading the docs, is that you can create a hybrid array where you manage the memory. Thus, you can create an mxArray from existing data. However, the docs basically say that this is too hard for most mortals (and they may well be right -- too painful for me, anyway)!
> Ok, I have looked at it. It is not hard, it is just totally brain damaged: there is no way to destroy an mxArray without destroying the data it is holding, even after a call to mxSetPr. So the data referenced by the pointer given to mxSetPr is always destroyed by mxDestroyArray; I don't see any way to use this to avoid the copy... They could at least have provided one function which frees the data buffer and another which destroys the rest; as it is, it is totally useless, unless you don't mind memory leaks.

It does sound brain damaged, I agree. But here's a suggestion: can you keep a pool of unused mxArrays rather than calling mxDestroyArray? I guess without the payload, they're just a few bytes and shouldn't take up that much space.
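The pooling idea suggested above can be sketched in plain Python. This is only an illustration of the bookkeeping, not a binding to the matlab C API: `HandlePool` and `make_handle` are hypothetical names, and a small dict stands in for an empty mxArray header. Whether a real mxArray can be safely detached from its data and reused this way is not established in this thread.

```python
class HandlePool:
    """Recycle small header objects instead of destroying them.

    `create` is a hypothetical callable standing in for whatever
    allocates an empty header (the thread has mxArray in mind).
    """

    def __init__(self, create):
        self._create = create
        self._free = []    # idle headers, detached from any data
        self.created = 0   # number of headers ever allocated

    def acquire(self):
        # Reuse an idle header when one exists; allocate otherwise.
        if self._free:
            return self._free.pop()
        self.created += 1
        return self._create()

    def release(self, handle):
        # Detach the payload so the header no longer references
        # caller-owned memory, then park it instead of destroying it.
        handle["data"] = None
        self._free.append(handle)


def make_handle():
    # Stand-in for an empty header: a small mutable record.
    return {"data": None}


pool = HandlePool(make_handle)
h1 = pool.acquire()
h1["data"] = [1.0, 2.0, 3.0]  # attach caller-owned data
pool.release(h1)              # detach and recycle; nothing is destroyed
h2 = pool.acquire()           # the same header object comes back
```

The trade-off is that headers are never freed, so the pool grows to the peak number of arrays in flight; that is the "few bytes" cost mentioned above.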
Re: [Numpy-discussion] Passing numpy arrays to matlab
Josh Marshall wrote:
> I don't see how you are going to get around doing the copies. Matlab is in a separate process from the Python interpreter, and there is no shared memory. In what way do you want these proxy classes to look like numpy arrays?

I am not talking about the copy in the matlab - python interaction. This is done through a pipe, handled by the OS; I don't know the details, but I know that communication through pipes is quite fast under linux (see below), and is not the bottleneck.

> Note that mlabwrap creates proxy arrays, and only copies the data if you actually request it to. (AFAIRemember) Otherwise you aren't losing any speed, because there aren't going to be any copies.

There may be no copy for returned data you don't need, but that's not the case I am talking about. For all other cases, I don't think this is what's happening: if you take a look at mlabwrap, in the C mlabraw module, the function mlabraw_put always calls numeric2mx for arrays, which itself always calls makeMxFromNumeric, which makes a copy. Same in the other direction once you call mlabwrap_get. I am doing the same in my module, because that's the simplest thing to do.

The problem is that when you use the function engPutVariable of the matlab engine API, you need to give a pointer to an mxArray structure, which is the C representation of a matlab array. You cannot say "build an mxArray from existing data" (this is one of the brain damaged things of the matlab C API I was talking about in another mail): this is the copy I am talking about, and it is an expensive one.

In the best case (real numpy arrays with Fortran storage), you can do a memcpy, but in most cases you need to do something which takes strides into account (because complex matlab arrays are actually not Fortran, or because by default most numpy arrays use C storage, and this makes a difference for rank >= 2), which implies non-contiguous memory access, which is *really* expensive (around 2 cycles/byte at best, on my dual Xeon 3.2 GHz).

Basically, if you want to do something like calling the resample function of matlab on a numpy array and using the result later in numpy, here is what's happening right now:

1 copy the numpy (or numarray in the case of mlabwrap, but this should not matter, I guess) data into an mxArray
2 send the mxArray to the matlab engine: done with a pipe (implies a copy? At least, it is a contiguous array copy)
3 compute the thing in matlab
4 send the result back to python as an mxArray
5 copy the data of the mxArray to a numpy array

A quick profiling shows that if you don't do any processing in matlab, just sending an array and getting it back, steps 1 and 5 take roughly 80-90% of the time in my implementation (which is faster than mlabwrap, but I think this is just caused by the much fancier API of mlabwrap, i.e. the core mechanism to pass arrays should be roughly the same, as mlabwrap uses the C function makeMxFromNumeric, and I am using a similar function myself through ctypes); the remaining 10-20% is used for the communication through the pipe.

I believe that most typical use cases involve 1 and 5. 5 should be avoidable in many cases if I knew how to build a proxy class around the mxArray so that the proxy behaves as a numpy array, with the buffer owned by the mxArray; but I don't know how to do that (in particular, how to handle the destruction of data, as the proxy should destroy the mxArray once the proxy object is garbage collected).

1 would be easy if the C matlab API was sane, which is not the case; they give functions which are impossible to use correctly (mxSetPr and mxSetData).

> What could be possible to do is add an array interface to the mlabwrap proxy classes so they can be used as numpy arrays when required for passing to numpy functions (or PIL, etc). Thus we only copy when we want to use numpy functions. Then we could define the operators on the proxy class to perform their operations on the other side of the bridge.

Yes, that's what I want to do, and in theory, this should be possible without a copy; my initial question at the beginning of the thread was how to build a numpy proxy class from an existing buffer of data, with the proxy becoming the owner of the data (i.e. it should do all the deallocation, including here cleaning up the mxArray structures).

cheers,

David
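One possible shape for the proxy asked about above is sketched below, under stated assumptions. A ctypes array stands in for the foreign buffer, and `foreign_free` is a hypothetical placeholder for the real destructor (something like mxDestroyArray); `ForeignArray` is an illustrative name, not an existing class. The sketch relies on two things that do exist: numpy's `__array_interface__` protocol, which lets numpy view memory it did not allocate, and `weakref.finalize`, which runs a callback when the proxy is garbage collected.

```python
import ctypes
import gc
import weakref

import numpy as np

released = []  # tags of buffers whose stand-in destructor has run


def foreign_free(tag):
    # Hypothetical stand-in for the foreign destructor (e.g. mxDestroyArray).
    released.append(tag)


class ForeignArray:
    """Proxy exposing a foreign column-major buffer of doubles to numpy."""

    def __init__(self, buf, shape, tag):
        self._buf = buf  # keeps the ctypes stand-in buffer alive
        rows = shape[0]
        self._interface = {
            "version": 3,
            "shape": tuple(shape),
            "typestr": np.dtype(np.float64).str,
            "data": (ctypes.addressof(buf), False),  # (address, read_only)
            "strides": (8, 8 * rows),  # column-major layout, 2-D only
        }
        # Calls foreign_free(tag) once the proxy itself is collected.
        weakref.finalize(self, foreign_free, tag)

    @property
    def __array_interface__(self):
        return self._interface


buf = (ctypes.c_double * 6)()           # pretend a foreign library owns this
proxy = ForeignArray(buf, (2, 3), "m1")
view = np.asarray(proxy)                # a view on the foreign memory, no copy
view[0, 1] = 5.0
wrote_through = buf[2] == 5.0           # element (0, 1) sits at offset 0 + 1*2

del proxy
alive_while_viewed = released == []     # the view still references the proxy
del view
gc.collect()                            # now the stand-in destructor has run
```

The design choice here is that the proxy, not the ndarray, owns the foreign data: arrays created from it hold a reference to the proxy, so the destructor only runs after the last view is gone.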
Re: [Numpy-discussion] Passing numpy arrays to matlab
Hi,

Thank you very much, I think this added documentation is pretty recent; I have never seen it before, and I did a lot of mex programming at some point... This whole mxarray nonsense reminds me why I gave up on matlab :), I would be very happy to help with this. It would be great if we could get a standard well-maintained library of some sort towards scipy - we (http://neuroimaging.scipy.org/) have a great deal of matlab integration to do.

Best,

Matthew
Re: [Numpy-discussion] Passing numpy arrays to matlab
Hi all,

On Tue, 2006-11-07 at 11:23 +0900, David Cournapeau wrote:
> I am trying to find a nice way to communicate between matlab and python. I am aware of pymat, which does that, but the code is deprecated, and I think basing the code on ctypes would lead to much more robust code.
>
> http://claymore.engineer.gvsu.edu/%7Esteriana/Software/pymat.html
>
> I have a really simple prototype which can send and get back data from matlab, but I was wondering if it would be possible to use a scheme similar to ctypes instead of having to convert it by hand.

A while ago I wrote a mex extension to embed the Python interpreter inside Matlab:

http://www.iki.fi/pav/pythoncall

I guess it's something like an inverse of pymat :)

But I guess this is not really what you are looking for, since at present it just does a memory copy when passing arrays between Matlab and Python. Though, shared arrays might be just possible to implement if memory management is done carefully.

BR,

Pauli Virtanen
Re: [Numpy-discussion] Passing numpy arrays to matlab
Pauli Virtanen wrote:
> Hi all,
>
> On Tue, 2006-11-07 at 11:23 +0900, David Cournapeau wrote:
>> I am trying to find a nice way to communicate between matlab and python. I am aware of pymat, which does that, but the code is deprecated, and I think basing the code on ctypes would lead to much more robust code.
>>
>> http://claymore.engineer.gvsu.edu/%7Esteriana/Software/pymat.html
>>
>> I have a really simple prototype which can send and get back data from matlab, but I was wondering if it would be possible to use a scheme similar to ctypes instead of having to convert it by hand.
> A while ago I wrote a mex extension to embed the Python interpreter inside Matlab: http://www.iki.fi/pav/pythoncall I guess it's something like an inverse of pymat :)

Yes, but in the end, I think they enable similar things. Thanks for the link!

> But I guess this is not really what you are looking for, since at present it just does a memory copy when passing arrays between Matlab and Python. Though, shared arrays might be just possible to implement if memory management is done carefully.

In my case, it is much worse:

1 first, you have numpy data that you have to copy to an mxArray, the structure representing arrays in the matlab C API.
2 then, when you send data to the matlab engine, this is done automatically through a pipe by the matlab engine API (maybe a pipe does not imply copying; I don't know much about pipes from a programming point of view, actually)
3 the arrays you get back from matlab are in matlab mxArray structures: right now, I copy their data to new numpy arrays.

At first, I just developed a prototype without thinking too much, and the result was much slower than I thought: sending a numpy array of 2e5x10 doubles takes around 100 ms on my quite powerful machine (around 14 cycles per item for the best case).

I suspect it is because I copy memory in a non-contiguous manner (matlab arrays have an internal F storage for real arrays, but complex arrays are really two different arrays, which is different from the Fortran convention I think, making the copy really expensive for complex arrays). To see if I was doing something wrong, I compared with numpy.require(ar, requirements = 'F_CONTIGUOUS'), which is even slower.

There is not much I can do about 2; it looks like there is a way to avoid copying for 1, and my question was more specific to 3 (but reusable in 1, maybe, if I am smart enough). Basically:

* how to create an object which has the same interface as numpy arrays, but owns the data from a foreign structure, whose data are available when building the object (the idea was to create a class which implements the array interface from python, a kind of proxy class, which owns the data from the mxArray; "owns" here is from a memory management point of view).

David
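The layout mismatch described above (column-major storage for real arrays, and complex arrays held as two separate real blocks) can be made concrete with a small numpy sketch. The helper name `to_matlab_layout` is made up for illustration and is not part of numpy, pymat or mlabwrap; it only shows when a copy is unavoidable on the numpy side, and does not talk to matlab at all.

```python
import numpy as np


def to_matlab_layout(a):
    """Return (real, imag) blocks in column-major order.

    `imag` is None for real input. Hypothetical helper: it mirrors the
    storage scheme described in the message above, nothing more.
    """
    a = np.asarray(a)
    if np.iscomplexobj(a):
        # numpy interleaves real and imaginary parts, so each part is a
        # strided view; asfortranarray gathers it into its own block.
        return np.asfortranarray(a.real), np.asfortranarray(a.imag)
    # No copy if `a` is already column-major; a strided copy otherwise.
    return np.asfortranarray(a), None


c_order = np.arange(6, dtype=np.float64).reshape(2, 3)  # numpy default layout
re_block, im_block = to_matlab_layout(c_order)          # forces a strided copy

f_order = np.asfortranarray(c_order)
re_same, _ = to_matlab_layout(f_order)                  # already F: no copy
```

Only the already column-major, real case avoids a copy on the numpy side, which matches the "best case" singled out in this thread.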
Re: [Numpy-discussion] Passing numpy arrays to matlab
Josh Marshall wrote:
> Hi David,
>
> Did you have a look at mlabwrap? It's quite hard to find on the net, which is a shame, since it is a much more up to date version, enhancing pymat with the things that you are trying to do. It allows passing arrays and getting arrays back.
>
> http://mlabwrap.sourceforge.net/

I didn't know that, thanks. Unfortunately, it is not really what I am trying to do: mlabwrap is just a python interface a bit more high level than pymat, with many fancy tricks, but it still makes copies. What I would like is to avoid the copying completely by using proxy classes around data from numpy so that I can automatically pass numpy arrays to the matlab C API, and a proxy class around data from matlab so that they look like numpy arrays.

I don't care that much about the actual API from the python point of view, because I intend to use this mainly to compare matlab vs numpy implementations, not as a way to use matlab inside python regularly. And once the copy problem is solved, adding syntactic sugar using python is easy anyway, I think (it should be easy to do something similar to mlabwrap at that point),

cheers,

David
Re: [Numpy-discussion] Passing numpy arrays to matlab
David Cournapeau wrote:
> - To send data from the calling process to matlab, you first have to create an mxArray, which is the basic matlab handle for a matlab array, and populate it. Using mxArray is very awkward: you cannot create an mxArray from existing data, you have to copy data into it, etc...

My understanding, never having done it, but from reading the docs, is that you can create a hybrid array where you manage the memory. Thus, you can create an mxArray from existing data. However, the docs basically say that this is too hard for most mortals (and they may well be right -- too painful for me, anyway)!
Re: [Numpy-discussion] Passing numpy arrays to matlab
Andrew Straw wrote:
> David Cournapeau wrote:
>> - To send data from the calling process to matlab, you first have to create an mxArray, which is the basic matlab handle for a matlab array, and populate it. Using mxArray is very awkward: you cannot create an mxArray from existing data, you have to copy data into it, etc...
> My understanding, never having done it, but from reading the docs, is that you can create a hybrid array where you manage the memory. Thus, you can create an mxArray from existing data. However, the docs basically say that this is too hard for most mortals (and they may well be right -- too painful for me, anyway)!

Would you mind telling me where you found that information? Because right now, I am wasting a lot of cycles on memory copies in both directions, and it is sometimes slow enough to be annoying,

cheers,

David
Re: [Numpy-discussion] Passing numpy arrays to matlab
David Cournapeau wrote:
> Andrew Straw wrote:
>> David Cournapeau wrote:
>>> - To send data from the calling process to matlab, you first have to create an mxArray, which is the basic matlab handle for a matlab array, and populate it. Using mxArray is very awkward: you cannot create an mxArray from existing data, you have to copy data into it, etc...
>> My understanding, never having done it, but from reading the docs, is that you can create a hybrid array where you manage the memory. Thus, you can create an mxArray from existing data. However, the docs basically say that this is too hard for most mortals (and they may well be right -- too painful for me, anyway)!
> Would you mind telling me where you found that information? Because right now, I am wasting a lot of cycles on memory copies in both directions, and it is sometimes slow enough to be annoying,

I found it reading through the in-program help (the C-API section, whatever it's called) on a Matlab installation at my university. I guess this was Matlab 2006A. A quick Google search turns this up:

http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_external/index.html?/access/helpdesk/help/techdoc/matlab_external/f25255.html

They give the following example, which seems to create a Matlab array pArray with data owned by the C variable data:

    mxArray *pArray = mxCreateDoubleMatrix(0, 0, mxREAL);
    double data[10];
    mxSetPr(pArray, data);
    mxSetM(pArray, 1);
    mxSetN(pArray, 10);