Hi,

I've built a system which allocates numpy arrays and processes them in C++ code (I'm building a native-code module using boost.python, and it makes sense to use numpy data storage so that the outputs can be handled in Python without any copying). Everything seems fine except when I parallelise the main loop (OpenMP and TBB give the same results), in which case I see a whole bunch of messages saying
"reference count error detected: an attempt was made to deallocate 12 (d)"

sometimes during the running of the program, and sometimes all at the end (presumably when all the destructors in my program run).

To clarify: the loop I have parallelised takes read-only parameters (enforced by the C++ compiler with 'const'), and as far as I can tell there are no race conditions such as multiple threads writing to the same numpy array at once, or anything else obvious like that.

I recompiled numpy (I'm using 1.6.1 from the official git repository) to print some extra information alongside the reference count message, namely a pointer to the object which is being erroneously deallocated. Surprisingly, it is always the same address for any given run of the program, even though the message is printed hundreds of times. I've looked into this a little with GDB, and as far as I can see the object the message pertains to is an "array descriptor", or at least that's what I conclude from backtraces similar to the following:

  Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
  1501        fprintf(stderr, "*** Reference count error detected: \n" \
  (gdb) bt
  #0  arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501
  #1  0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271
  #2  0x0000000103e592d7 in boost::detail::sp_counted_impl_p<garf::multivariate_normal<double> const>::dispose (this=<value temporarily unavailable, due to optimizations>) at refcount.hpp:36
  #3  ... my code

Obviously I can turn off the parallelism to make this problem go away, but since my underlying algorithm is trivially parallelisable I was counting on achieving linear speedup across cores. Currently I do get that speedup, and as far as I know no actual incorrect results are being produced by the program.
However, in my field (machine learning) it's difficult enough to know whether the numbers calculated are sensible even without these kinds of warnings, so I'd like to get a handle on at least why this is happening, so that I know whether I can safely ignore it.

My guess at what might be happening is that multiple threads are dealing with some object concurrently, and the updates to its reference count are not performed atomically, meaning that too many DECREFs take effect later on. I had presumed that allocating different numpy matrices in different threads, while all of them read from central numpy matrices, would work fine, but apparently there is something I've missed pertaining to descriptors.

Can anyone offer any guidance, or at least tell me this is safe to ignore? I can reproduce the problem reliably, so if you need me to do some digging with GDB at the point the error takes place, I can do that.

Many thanks,
Malcolm

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion