On Wednesday, 06 January 2010, Ying Wai (Daniel) Fan wrote:
> > Now in your situation there's a failure when reactivating the context to
> > detach from it, probably because the runtime is meddling about. The only
> > reason why cuCtxPushCurrent would throw an "invalid value", is, IMO, if
> > that context is already somewhere in the context stack. So it's likely
> > that the runtime reactivated the context. In current git, a failure of
> > the PushCurrent call should not cause a failure any more--it will print a
> > warning instead.
> 
> I believe different contexts can't share variables that are on the GPU.

True.

> I can use GPUArray objects as arguments to my fft functions, and these
> objects still exist after fft. So I think fft is using the same context
> as pycuda.

Right. I take it that if runtime functions execute when a driver context
exists, they'll reuse that context.

> I made the change indicated in the attached diff file, such that
> context.synchronize() and context.detach() would print out the context
> stack size, and detach() would also print out whether current context is
> active. With this I verify that the stack size is 1 before and after
> running fft code and the context does not change.

I should clarify here. CUDA operates one context stack, and PyCUDA has
another one. CUDA's isn't sufficient because it will not let the same
context be activated twice. PyCUDA on the other hand needs exactly this
functionality, to ensure that cleanup can happen whenever it is needed.
Hence PyCUDA maintains its own context stack, and keeps CUDA's stack
at most one deep. You are looking at the PyCUDA stack.
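This two-stack scheme can be modeled in plain Python. The classes and names below are invented for illustration (they are not PyCUDA's actual internals): an outer stack records every activation and allows the same context twice, while the inner "CUDA" stack is kept at most one deep.

```python
class CudaStack:
    """Stand-in for CUDA's own stack: refuses duplicate activations."""
    def __init__(self):
        self._stack = []

    def push(self, ctx):
        assert ctx not in self._stack, "CUDA won't activate a context twice"
        self._stack.append(ctx)

    def pop(self):
        return self._stack.pop()

    def depth(self):
        return len(self._stack)


class ContextStack:
    """PyCUDA-style outer stack: permits re-activation of the same
    context while keeping the inner CUDA stack at most one deep."""
    def __init__(self, cuda):
        self.cuda = cuda
        self._stack = []

    def push(self, ctx):
        if self.cuda.depth():       # deactivate the current context first
            self.cuda.pop()
        self.cuda.push(ctx)
        self._stack.append(ctx)

    def pop(self):
        ctx = self._stack.pop()
        self.cuda.pop()
        if self._stack:             # reactivate the previous context
            self.cuda.push(self._stack[-1])
        return ctx


cuda = CudaStack()
stack = ContextStack(cuda)
c1, c2 = object(), object()
stack.push(c1)
stack.push(c2)
stack.push(c1)              # same context active twice: fine on the outer stack
assert cuda.depth() == 1    # inner stack never exceeds one entry
```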

> My guess is that CUFFT makes some change to the current context, such
> that once this context is popped, it is automatically destroyed.

I disagree. I think CUDA somehow lets us pop the context, does not
report an error, but also does not actually pop the context (since the
runtime is still talking to it). Then, when PyCUDA tries to push the
context back onto CUDA's stack to detach from it, that fails. I've filed
a bug report with Nvidia, we'll see what they say.

> If my
> guess is correct, then calling context.detach() would destroy the
> context, since its usage count drops to 0, and it could circumvent the
> warning message when the context destructor is called.

Pop only removes the context from the context stack (and hence
deactivates it), but retains a reference to it. It should not cause
anything to be destroyed.
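To make the pop-versus-detach distinction concrete, here is a toy model (invented names, not PyCUDA's implementation): popping deactivates but keeps the reference alive, while detaching drops a reference and destroys the context only when the usage count reaches zero.

```python
class Context:
    """Toy context with a usage count, mimicking the semantics above."""
    def __init__(self):
        self.refcount = 1
        self.destroyed = False

    def detach(self):
        # Drop one reference; destroy only when the count reaches zero.
        self.refcount -= 1
        if self.refcount == 0:
            self.destroyed = True


stack = []
ctx = Context()
stack.append(ctx)           # activate

popped = stack.pop()        # pop: deactivates, but the reference survives
assert popped is ctx and not ctx.destroyed

ctx.detach()                # detach: usage count hits 0, context is destroyed
assert ctx.destroyed
```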

> I don't want people using my package to see a warning message when
> Python exits, so I am not using autoinit in my package, but am creating
> a context explicitly.

Sorry for this mess--I hope we can sort it out somehow.

> The following is kind of unrelated. I have done some experiments with
> contexts. I think context.pop() always pops the top context off the
> stack, regardless of whether that context is really at the top of the
> stack. E.g. I create two contexts c1 and c2, and then I can do c1.pop()
> twice without getting an error.

This points to a doc and behavior bug in PyCUDA. Context.pop() should
have been static. It effectively was, but not quite. Fixed in git.
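The observed behavior can be mimicked with a plain list (toy code, not PyCUDA's source): an instance-method pop that ignores its receiver always removes the top entry, which is why a static method is the honest signature.

```python
context_stack = []

class Context:
    """Toy context that registers itself on a global stack on creation."""
    def __init__(self, name):
        self.name = name
        context_stack.append(self)

    def pop(self):
        # The bug being described: the receiver ("self") is ignored
        # entirely, so c1.pop() removes whatever is on top -- here, c2.
        return context_stack.pop()

c1 = Context("c1")
c2 = Context("c2")

assert c1.pop() is c2   # c1.pop() actually popped c2
assert c1.pop() is c1   # second call succeeds, no error raised
```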

> cuComplex.h has existed since CUDA 2.1 and hasn't changed in subsequent
> versions. cuComplex.h is used by cufft.h and cublas.h. I can't find any
> documentation for it. A quick search on Google shows that JCuda seems to
> be using it.
> http://www.jcuda.org/jcuda/jcublas/doc/jcuda/jcublas/cuComplex.html

Hmm. A quick poke comes up with an error message:

8< ----------------------------------------------------------
kernel.cu(7): error: no operator "*" matches these operands
            operand types are: cuComplex * cuComplex
8< ----------------------------------------------------------

Code attached. This might not be what we're looking for.
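For what it's worth, cuComplex.h does not overload operator*; instead it provides plain functions such as cuCmulf() for single-precision complex multiplication. A sketch of the kernel rewritten accordingly (untested here, but cuCmulf is part of cuComplex.h):

```cuda
#include <cuComplex.h>

// cuComplex.h offers cuCmulf() rather than an overloaded "*",
// so the failing line becomes a function call instead.
__global__ void multiply_them(cuComplex *dest, cuComplex *a, cuComplex *b)
{
    const int i = threadIdx.x;
    dest[i] = cuCmulf(a[i], b[i]);
}
```

Still rather clunky compared to proper operator overloading, which is presumably what we'd want for code generation.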

> Maybe we can simply use complex.h from GNU C library. A quick search on
> my Ubuntu machine locates the following files:
> /usr/include/complex.h, which includes
> /usr/include/c++/4.4/complex.h, which then includes
> /usr/include/c++/4.4/ccomplex, which in turn includes
> /usr/include/c++/4.4/complex, which includes overloading of operators
> for complex number.

Two words: Windows portability. :) Aside from that, this is unlikely to
work, as the system-wide complex header depends on I/O being available
and all kinds of other system-dependent funniness.

> Good luck on your PhD.

Likewise!

Andreas
import pycuda.driver as drv
import pycuda.tools
import pycuda.autoinit
import numpy
import numpy.linalg as la
from pycuda.compiler import SourceModule

mod = SourceModule("""
#include <cuComplex.h>
__global__ void multiply_them(cuComplex *dest, cuComplex *a, cuComplex *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.complex64)
b = numpy.random.randn(400).astype(numpy.complex64)

dest = numpy.zeros_like(a)
multiply_them(
        drv.Out(dest), drv.In(a), drv.In(b),
        block=(400,1,1))

print(dest - a*b)


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net
