On Thu, Mar 11, 2010 at 02:26:49PM +0100, Francesc Alted wrote:
> > I believe that your above assertion is 'half' right. First, I think that
> > it is not SWAP that the memmapped file uses, but the original disk space,
> > thus you avoid running out of SWAP. Second, if you open the same data
> > several times without memmapping, I believe that it will be duplicated in
> > memory. On the other hand, when you memmap, it is not duplicated, thus
> > if you are running several processing jobs on the same data, you save
> > memory. I am very much in this case.
> Mmh, this is not my experience. During the past month, I was asking the
> students in a course to compare the memory consumption of numpy.memmap and
> tables.Expr (a module for performing out-of-memory computations in PyTables).
> [snip]
> So, in my experience, numpy.memmap is really using that large chunk of memory
> (unless my testbed is badly programmed, in which case I'd be grateful if you
> can point out what's wrong).

OK, so what you are saying is that my assertion #1 was wrong. Fair enough: as
I was writing it, I was thinking that I had no hard facts to back it up.

How about assertion #2? It is the only 'story' I can think of to explain why
parallel computations that run fine when I use memmap blow up in memory when I
don't. Also, could it be that the memmap mode changes things? I use only the
'r' mode, which is read-only. (I sketch the scenario I have in mind in a
postscript below.)

This is all very interesting, and you have many more insights into these
problems than I do. Would you be interested in coming to Euroscipy in Paris to
give a 1- or 2-hour tutorial on memory and I/O problems and how you address
them with PyTables? It would be absolutely thrilling. I must warn you, though,
that I am afraid we won't be able to pay for your trip, as I want to keep the
price of the conference low.

Best,

Gaël
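PS: to make assertion #2 concrete, here is a minimal sketch of the scenario I
have in mind: several worker processes reading the same array through a
read-only memmap. The file name, shape, and worker logic are made up for
illustration; this is not the testbed from the course.

    import numpy as np
    from multiprocessing import Pool

    FNAME = 'data.npy'       # made-up file name
    SHAPE = (10000, 1000)    # made-up shape: ~80 MB of float64

    def make_data():
        # Write the array to disk once, in .npy format.
        mm = np.lib.format.open_memmap(FNAME, mode='w+',
                                       dtype=np.float64, shape=SHAPE)
        mm[:] = 1.0
        mm.flush()

    def worker(i):
        # mmap_mode='r' maps the file read-only: the pages are backed by
        # the file itself, so the OS can share them between the workers
        # and drop them under memory pressure without writing to swap.
        data = np.load(FNAME, mmap_mode='r')
        return data[i].sum()

    if __name__ == '__main__':
        make_data()
        with Pool(4) as pool:
            print(sum(pool.map(worker, range(100))))

Each worker maps the same file, so the memory it touches is page-cache-backed
and reclaimable; replacing np.load(..., mmap_mode='r') with a plain np.load
would give each worker its own private 80 MB copy.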
