Re: [PyCUDA] nvcc is not searched in cuda root dir.

2010-05-10 Thread Andreas Klöckner
On Sonntag 09 Mai 2010, Ilya Gluhovsky wrote:
 On Sonntag 04 Oktober 2009, Michal wrote:
  Hi,
  during pycuda configuration it is possible to specify cuda root dir. I
  think that pycuda should add cuda_root_dir/bin to its PATH, so I
  wouldn't get errors like :
  OSError: nvcc was not found (is it on the PATH?) [[Errno 2] No such
  file or directory]

 So what's the work-around?  Thanks so much!  Ilya.

Having nvcc on the PATH should fix things.
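For illustration, a minimal sketch of one way to do that from within Python,
before PyCUDA compiles anything (the /usr/local/cuda location is an
assumption; substitute your actual CUDA root):

    import os
    # hypothetical CUDA root; adjust to wherever the toolkit is installed
    os.environ["PATH"] += os.pathsep + "/usr/local/cuda/bin"

    import pycuda.autoinit                     # initialize a context
    from pycuda.compiler import SourceModule   # nvcc is now found via PATH

Exporting the directory in your shell profile achieves the same thing
permanently.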

Andreas

PS: Please keep the list cc'd. Thanks.





[PyCUDA] Windows binaries

2010-05-09 Thread Andreas Klöckner
Hi team,

I just discovered that Christoph Gohlke at UC Irvine distributes Windows
binaries for PyCUDA, here:

http://www.lfd.uci.edu/~gohlke/pythonlibs/

This looks like a good page to keep bookmarked if you're on Windows,
though obviously I don't know how well the packages on that page actually
work. :) I've also added this link to the Wiki.

Andreas




Re: [PyCUDA] Mailing list move

2010-05-06 Thread Andreas Klöckner
On Donnerstag 06 Mai 2010, Andreas Klöckner wrote:
 Hi all,
 
 just a quick heads-up that I will be moving the PyCUDA list to a
 different server today. There might be a short period where the list is
 unavailable, but I'll try to keep this minimal. All should be back to
 normal by tonight at the latest. If you notice breakage after that,
 please let me know.

The move is done, and everything should be back to working order. DNS
changes might still not have propagated everywhere, but should do so
soon. Let me know if you notice any issues.

Thanks for your patience,
Andreas




Re: [PyCUDA] Why is cumath slower than an ElementwiseKernel? Is data copied back after each operation?

2010-05-06 Thread Andreas Klöckner
On Donnerstag 06 Mai 2010, Ian Ozsvald wrote:
 I've been speed testing some code to understand the complexity/speed
 trade-off of various approaches. I want to offer my colleagues the
 easiest way to use a GPU to get a decent speed-up without forcing
 anyone to write C-like code if possible.

A few comments:

- If you use 100 floats, you only measure launch overhead. The GPU
  processing time will be entirely negligible. You need a couple million
  entries to generate even a small bit of load.

- You might want to warm up each execution path before you take your
  timings, to account for code being compiled or fetched from disk.
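
Illustrating both points, a minimal timing sketch (array size and repetition
count are arbitrary choices):

    import numpy as np
    from time import time
    import pycuda.autoinit
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray
    import pycuda.cumath as cumath

    a_gpu = gpuarray.to_gpu(np.random.randn(4*10**6).astype(np.float32))

    cumath.sin(a_gpu)            # warm-up: compiles/fetches the kernel
    drv.Context.synchronize()

    start = time()
    for i in range(100):
        cumath.sin(a_gpu)
    drv.Context.synchronize()    # wait for the GPU before stopping the clock
    print("per-call time: %g s" % ((time() - start) / 100))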

HTH,
Andreas





Re: [PyCUDA] Are sin/log supported for complex number (0.94rc)? Odd results...

2010-05-06 Thread Andreas Klöckner
On Montag 19 April 2010, Ian Ozsvald wrote:
 I find myself out of my depth again. I'm playing with complex numbers
 using 0.94rc (on Windows XP with CUDA 2.3). I've successfully used
 simple operations (addition, multiplication) on complex numbers, that
 resulted in the Mandelbrot example in the wiki.
 
 Now I'm trying sin and log and I'm getting errors and one positive
 result (see the end). I'm not sure if these functions are supported
 yet? If anyone can point me in the right direction (and perhaps
 suggest what needs implementing) then I'll have a go at fixing the
 problem.
 
 For reference, you can do the following using numpy on the CPU for
 verification: In [117]: numpy.sin(numpy.array(numpy.complex64(1-1j)))
 Out[117]: (1.2984576-0.63496387j)
 
 Here are two pieces of test code. At first I used multiplication
 (rather than sin()) to confirm that the code ran as expected.
 
 Next I tried creating a simple complex64 array, passing it to the GPU
 and then asking for pycuda.cumath.sin() - this results in "Error:
 External calls are not supported". Here's the code and error:

Fixed/added in git, both for cumath and elwise.

 I replaced:
 pycuda.cumath.sin(a_gpu) # should produce (1.29-0.63j)
 with:
 pycuda.cumath.log(a_gpu) # should produce (0.34-0.78j)
 and the code ran without an error...but produced the wrong result. It
 generates (1-1j) which looks like a no-op.

cumath ops aren't in-place--you need to print the returned value.
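
For example (a minimal sketch, assuming a git version with the complex fixes
mentioned above):

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    import pycuda.cumath as cumath

    a_gpu = gpuarray.to_gpu(np.array([1 - 1j], dtype=np.complex64))
    result = cumath.log(a_gpu)   # a_gpu itself is left unchanged
    print(result.get())          # roughly [ 0.3466-0.7854j]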

HTH,
Andreas




Re: [PyCUDA] pycuda and cuda 64 bits on mac SL

2010-05-06 Thread Andreas Klöckner
On Montag 12 April 2010, Alan wrote:
 Hi there,
 
 Finally nVidia released Cuda (partially) 3.0 in 64 bits for Mac (not beta
 version!)

Hi Alan, all,

has there been progress on this? Has anyone gotten 64-bit PyCUDA to work
on Snow Leopard? If not, any idea what might be wrong?

Andreas




Re: [PyCUDA] Debian packaging of PyCUDA

2010-05-05 Thread Andreas Klöckner
On Dienstag 04 Mai 2010, Tomasz Rybak wrote:
 Hello,
 I have begun creating Debian PyCUDA (0.94 from GIT) package.
 It is my first Debian package and I was cheating
 by looking at other Python modules, but it seems to work,
 at least on my machine.
 In few days I should be able to check on another machine
 with Debian and CUDA, and will report if there are some problems.
 
 Now I am wondering whether there is need to integrate Debian
 packaging metadata into PyCUDA (it is directory with few text
 files in it) and if so what is the best way to do it.

This is a great idea, ideally with the outcome that you would just
apt-get install python-pycuda and get a working setup.

Any DDs around? Anyone have spare cycles to make the packages?
There's also an ITP for PyCUDA:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=546919
nvcc is apparently not packaged yet, either. Also the nvidia-glx
packages tend to lag by a version or so. (Currently they have 190.53.)

If we decide to go the home-rolled route:

I remember that a while back there seemed to be a debate by the debian
guys on whether a debian/ directory in upstream source is good or bad.
[1] Does anyone know what the resolution of that is by now? This would
determine whether I'd stick debian/ files into git.

[1] 
http://kitenet.net/~joey/blog/entry/on_debian_directories_in_upstream_tarballs/

Andreas




Re: [PyCUDA] creating a new view of a GPUArray

2010-05-05 Thread Andreas Klöckner
On Donnerstag 29 April 2010, Amir wrote:
 I would like to create different views of a GPUArray. I have no idea how to
 do it.

1D slicing logic already exists. See GPUArray.__getitem__ on how it's
done.
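
A minimal sketch of what already works (contiguous 1D slices only):

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray

    a_gpu = gpuarray.to_gpu(np.arange(16, dtype=np.float32))
    view = a_gpu[4:8]   # a view on the same device memory, nothing is copied
    print(view.get())   # -> [ 4.  5.  6.  7.]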

 Let A be a 2D float32 c-continuous array. How do I create a 1D view of one
 of its rows? I am not sure how to set gpudata in the view. An example would
 be great. Is there a pointer arithmetic trick?

This would have to happen along the lines of how the view is built
above, but is hampered by the fact that our kernels don't yet have the
necessary stride support. (I.e. you could already implement contiguous
slices without touching the kernels, but non-contiguous ones need more
work.)

 Is it possible to create a float32 view of a complex64 array?

Yup, that should be easy. Just follow GPUArray.__getitem__. Is there a
conventional numpy spelling for this?
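
(For reference, numpy itself spells this reinterpretation as ndarray.view,
e.g.:

    import numpy as np
    a = np.array([1 + 2j, 3 + 4j], dtype=np.complex64)
    print(a.view(np.float32))   # -> [ 1.  2.  3.  4.], same buffer, no copy

so a GPUArray analogue would presumably mirror that spelling.)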

 I am trying to use gpuarray.multi_take_put to copy slices between
 float and complex arrays but it requires dtypes to match.

multi_take_put, in addition to being an undocumented feature, needs
index arrays and is therefore likely overkill for what you want.

 Is using gpuarray.multi_take_put the best way to copy a slice of an linear
 array?

No--'array[5:17].copy()' should be way faster, if you need a copy at
all.

Andreas




Re: [PyCUDA] How to manually free GPUarray to avoid leak?

2010-05-05 Thread Andreas Klöckner
On Sonntag 25 April 2010, gerald wrong wrote:
 Can I manually free GPUarray instances? 

In addition to Bogdan's comments (which are more likely to help you with
what you're seeing): If you must free the memory by hand, you can use

ary.gpudata.free()

to do so.
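
In context, a minimal sketch:

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray

    a_gpu = gpuarray.to_gpu(np.zeros(10**7, dtype=np.float32))
    # ... use a_gpu ...
    a_gpu.gpudata.free()   # release the device allocation right away
    del a_gpu              # don't touch the now-freed array afterwards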

HTH,
Andreas




Re: [PyCUDA] 32-bit PyCUDA on Snow Leopard

2010-05-05 Thread Andreas Klöckner
Hi Per, all,

On Dienstag 20 April 2010, Per B. Sederberg wrote:
 Although it is not ideal and it took me many hours to figure out (as
 opposed to the 5 minutes it takes on Debian), I've been able to get
 PyCUDA and CUDA 3.0 working with 32-bit Enthought Python on Snow
 Leopard.

thanks very much for these instructions. I've turned them into a Wiki
page for easier community maintenance:

http://wiki.tiker.net/PyCuda/Installation/Mac/EnthoughtPython

Hope that's ok with you.

Andreas





Re: [PyCUDA] cuBLAS on gpuarray

2010-04-12 Thread Andreas Klöckner
On Montag 12 April 2010, Bryan Catanzaro wrote:
 The only difference here is that at exit, we're detaching from the
 cuda context instead of popping it as pycuda.autoinit does.  That gets
 rid of the error, although it's probably not the correct solution to
 the problem.

detach() is not right if you're still holding references to stuff in the
context you're detaching from.

It's a tiny bit hairy that, now that (I think) we've finally got the
whole context stack thing working, Nvidia prohibits it if you want
CUBLAS. I'll think of a proper workaround--right now I'm thinking about
tainting the current context once any runtime stuff gets called in it,
and then preventing any push/pops.

While we're at it: Team, what's the plan for bringing CUBLAS/CUFFT
wrappers into PyCUDA? Bryan?

Another thing: If I seem slow to respond at the moment, it's because I'm
finishing my thesis and defending my PhD, hopefully I'll be all done by
Apr 28. Wish me luck!

Andreas




Re: [PyCUDA] installed_path incorrect in pycuda/compiler.py

2010-04-01 Thread Andreas Klöckner
On Donnerstag 01 April 2010, MinRK wrote:
 The `installed_path' variable in `_find_pycuda_include_path' in
 pycuda/compiler.py appears to be incorrect, or at least not
 sufficiently general, because it does not find the install location on
 my machines (OSX 10.6/Python 2.6.1 and Ubuntu 9.10/Python 2.6.4).
 
 $ python setup.py install --prefix=$HOME/usr/local
 
 installs pycuda to ~/usr/local/lib/python2.6/site-packages/pycuda
 and the pycuda headers to ~/usr/local/include
 
 but the installed_path variable is defined with:
 
 installed_path = join(pathname, "..", "include", "pycuda")
 
 which points to:
 ~/usr/local/lib/python2.6/site-packages/include/pycuda, which is
 incorrect.
 adding three more .. fixes the location:
 
 installed_path = join(pathname, "..", "..", "..", "..", "include",
  "pycuda")
 
 and everything works once I add that patch.

First of all, thanks for reporting this. I've added your patch to git.
But really--this must be some sick joke, right? Currently PyCUDA looks
in no fewer than *seven* places (on Linux) to figure out where its
headers got installed.  I hope Tarek (Ziadé) gets done cleaning up
distutils rather sooner than later... or is this all the distributors'
doing?

Andreas




Re: [PyCUDA] Trim down boost library?

2010-03-30 Thread Andreas Klöckner
On Dienstag 30 März 2010, reckoner wrote:
 Hi,
 
 I built the boost 1.38 libraries from source following the instructions
 on the wiki, but this generated about 5 GB of material.
 
 Do I need all of it, or can I trim this down?

These here are the boost headers that PyCUDA includes.

boost/cstdint.hpp
boost/thread/thread.hpp
boost/thread/tss.hpp
boost/version.hpp
boost/ptr_container/ptr_map.hpp
boost/format.hpp
boost/shared_ptr.hpp
boost/python.hpp
boost/python/stl_iterator.hpp
boost/foreach.hpp

(This implicitly tells you which Boost libraries are used, and you can
probably infer what you need to keep and what to throw away. Debian's
package management does a pretty good job of modularizing boost.)

You might also want to try
http://www.boost.org/doc/libs/1_42_0/tools/bcp/doc/html/index.html

Disclaimer: I haven't played with this.

I realize boost is a big dependency, but you will find that its
individual parts are actually quite small--it just includes everything
and the kitchen sink [1]. So this is mainly a distribution problem IMO.
Also I stand behind the choice of boost as a supporting library--it's
quality software.

Andreas

[1] http://www.boost.org/doc/libs/1_42_0/libs/libraries.htm




[PyCUDA] [ANN] 0.94rc -- please test

2010-03-28 Thread Andreas Klöckner
Hi all,

PyCUDA's present release version (0.93) is starting to show its age, and
so I've just rolled a release candidate for 0.94, after tying up a few
loose ends--such as complete CUDA 3.0 support.

Please help make sure 0.94 is solid. Go to
http://pypi.python.org/pypi/pycuda/0.94rc
to download the package, see if it works for you, and report back.

The change log for 0.94 is here:
http://documen.tician.de/pycuda/misc.html#version-0-94
but the big-ticket things in this release are:
- Support for CUDA 3.0
- Sparse matrices
- Complex numbers

Let's make this another rockin' release!

Thanks very much for your help,
Andreas




Re: [PyCUDA] E LogicError: cuModuleLoadDataEx failed: invalid image - with test_driver.py

2010-03-27 Thread Andreas Klöckner
On Sonntag 28 März 2010, Catalin Patulea wrote:
 Sorry to butt in..
 
 Reckoner, can you try again after applying the attached patch? It
 should address the invalid image errors.
 
 Catalin
 

Good point! Thanks for the patch, applied to git master.

Andreas




Re: [PyCUDA] Problems with Context stack autoinit

2010-03-26 Thread Andreas Klöckner
On Freitag 26 März 2010, Bryan Catanzaro wrote:
 I've attached the trace.  Lines beginning with --- are added
  instrumentation that I put in autoinit.py and cuda.hpp.  Also, my
  workaround has now failed - with some versions of the code the attempt to
  push a bad context happened in device_allocation::free() - and deleting
  objects manually helped with that.  But other times it fails in ~module(),
  and I'm not sure how to bypass that one.

Do you have some short sample code that I could try on Linux?

Andreas




Re: [PyCUDA] Problems with Context stack autoinit

2010-03-25 Thread Andreas Klöckner
On Donnerstag 25 März 2010, Bryan Catanzaro wrote:
 Hi All -
 I've been getting problems with the following error:
 
 terminate called after throwing an instance of 'cuda::error'
   what():  cuCtxPushCurrent failed: invalid value
 
 After poking around, I discovered that context.pop(), registered using
  atexit in pycuda.autoinit, is being called *before* all the destructors
  for various things created during my program.  

This is by design. Since destructors may be called on out-of-context
objects, they need to make sure that 'their' context is active anyway.
In your case the context looks to have been *destroyed*, not merely
switched. Can you run your code with CUDA tracing and send the log?
(CUDA_TRACE=1 in siteconf.py)

Andreas





Re: [PyCUDA] RuntimeError: cuInit failed: no device

2010-03-18 Thread Andreas Klöckner
On Donnerstag 18 März 2010, jade mackay wrote:
 I get the following error. Can anyone point me in the right direction to
 resolve this?
 
  import pycuda.autoinit
 
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/local/lib/python2.6/dist-packages/pycuda-0.93-py2.6-linux-x86_64.egg/pycuda/autoinit.py", line 4, in <module>
     cuda.init()
 pycuda._driver.RuntimeError: cuInit failed: no device

Permissions on the CUDA devices? /dev/nvidia*

Andreas




Re: [PyCUDA] pyCuda with python 2.4

2010-03-16 Thread Andreas Klöckner
On Mittwoch 17 März 2010, Daniel Chia wrote:
 HI Andreas,
 
  I did; however, I needed to define PY_SSIZE_T_MAX to get it to build.
 However, I can't test the code as I don't have root access, so it seems I
 can't install pytools, because it can't patch setuptools.
 
  I might try installing a copy of Python on my user account, though.

With virtualenv [1] (best with option '--distribute'), you do not need
root rights to install Python packages. (And virtualenv can be used even
if it's not installed systemwide.)

HTH,
Andreas

[1] http://pypi.python.org/pypi/virtualenv






Re: [PyCUDA] complementary error function erfc

2010-03-15 Thread Andreas Klöckner
On Sonntag 14 März 2010, Faisal Moledina wrote:
 Hello PyCUDA list,
 
 I'm just starting out with PyCUDA and have not used much more than
 gpuarray and cumath. In fact, I have yet to program my own CUDA
 kernel. I'm wondering if there is a built-in erfc method for a
 gpuarray. If PyCUDA doesn't have one built-in, how would I implement a
 CUDA kernel for erfc? Prior to PyCUDA, I was using scipy.special.erfc
 on NumPy arrays.

You could use pycuda.elementwise to shield you from any actual CUDA
programming--and use [1] as a (high-quality) implementation guideline.
Maybe you could even tweak that header directly (add a few __device__
specs).

[1]
https://svn.boost.org/trac/boost/browser/trunk/boost/math/special_functions/erf.hpp
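
As an illustration of the elementwise route, here is a minimal sketch that
leans on CUDA's built-in erfcf() device function rather than a hand-rolled
implementation (one along the lines of [1] could be dropped into the
operation string instead):

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    from pycuda.elementwise import ElementwiseKernel

    erfc_kernel = ElementwiseKernel(
            "float *out, float *x",
            "out[i] = erfcf(x[i])",    # erfcf: CUDA's single-precision erfc
            "erfc_kernel")

    x_gpu = gpuarray.to_gpu(np.linspace(-2, 2, 10).astype(np.float32))
    out_gpu = gpuarray.empty_like(x_gpu)
    erfc_kernel(out_gpu, x_gpu)
    print(out_gpu.get())               # compare against scipy.special.erfc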

Andreas




Re: [PyCUDA] Issue running test cases on Windows Vista 64 bit

2010-03-11 Thread Andreas Klöckner
On Donnerstag 11 März 2010, Conway, Nicholas J wrote:
 Installed distribute instead of setuptools and that fixed the quirks during
  installation, but did not fix the problem when running the test.
 
 This made sure pycuda was properly installed in the site-packages
  directory.
 
 I update my paths, and tried restricting the path for cl.exe to
  VC\bin\amd64
 
 But I got this:
 
 Exec:Error: error invoking 'nvcc --cubin -arch sm_11
  -IC:\Python26\lib\site-packages\pycudabeta-py2.6-win-amd64.egg\pycuda\..\i
 nclude\pycuda kernel.cu': status -1 invoking 'nvcc --cubin -arch sm_11
  -IC:\Python26\lib\site-packages\pycuda-0.94beta-py2.6-win-amd64.egg\pycuda
 \..\includde\pycuda kernel.cu': nvcc fatal : nvcc cannot find a supported
  cl version. Only MSVC 8.0 and MSVC 9.0 are supported


Does this here help?
http://forums.nvidia.com/index.php?showtopic=73711&st=0&p=418880#entry418880

Andreas




Re: [PyCUDA] Issue running test cases on Windows Vista 64 bit

2010-03-11 Thread Andreas Klöckner
On Donnerstag 11 März 2010, Conway, Nicholas J wrote:
 Also tried CUDA 3.0 beta with the luck that it ran and crashes python
  during test_driver.py   
 

Did you recompile PyCUDA? Unfortunately, you need to delete the 'build'
directory to be able to rebuild from scratch--distutils is unaware of
dependency changes.

HTH,
Andreas





Re: [PyCUDA] pycuda 0.93 - Snow Leopard - error: invalid command 'bdist_egg'

2010-03-08 Thread Andreas Klöckner
On Montag 08 März 2010, Daniel Kubas wrote:
 Hi Andreas,
 
 yes I built the boost library (1.39) with the recommended flag
 
 'architecture=x86'
 
 and even  omitting
 
 '--with-libraries=signals,thread,python'

If you haven't tried this already:
Try poking at PyCUDA's _driver.so with 'otool -L' and then checking that
all shared libraries depended on are actually 32-bit.

Andreas




Re: [PyCUDA] pycuda 0.93 - Snow Leopard - error: invalid command 'bdist_egg'

2010-03-08 Thread Andreas Klöckner
On Montag 08 März 2010, Daniel Kubas wrote:
 Hi,
 
 It works now! 

Glad to hear that.

 (got this trick from
 http://mail.python.org/pipermail/python-list/2009-October/1222481.html)

Bryan also put that trick on
http://wiki.tiker.net/PyCuda/Installation/Mac#Notes_about_Snow_Leopard
a while ago, I think.

 I just hope the 'UserWarning' is not critical and I can finally start
 using Pycuda for my science and find exoplanets with it ;-)

It isn't. If you'd like to not see it, downgrade to Pytools 9 or upgrade
PyCUDA to git master.

Andreas




Re: [PyCUDA] Patch for error C2143: syntax error : missing '; ' before 'type' on latest master for MSVC

2010-03-05 Thread Andreas Klöckner
On Freitag 05 März 2010, Ian Ozsvald wrote:
 Ok, I stepped back to my last working master copy from a few days
 back. I downloaded the raw blobs of your new changes via:
  http://git.tiker.net/pycuda.git/commitdiff/c3d5f8178f71271b8689915bc2d1122e0f7b1f52
  and then recompiled PyCUDA, deleted the temp kernels directory and ran my
  tests. The new code and includes are in site-packages.

Manual git? I'm impressed. :)

And yes, there was an issue with the public git tree. Fixed now. Sorry
about that.

 My SimpleSpeedTest.py and Mandelbrot.py code compiles cleanly and runs
 without errors, I can see it using nvcc to compile new copies of the
 kernels.
 
 On first blush it looks as though your edits have fixed the problem,
 much obliged :-)

Just played with your Mandelbrot thing--very nice! Thanks for sharing.
Regarding default size: On my 260, 1000x1000 is still below a tenth of a
second. :)

 It looks as though the main changes are removing _STLP_DECLSPEC from
 the .hpp and moving the __device__ declaration closer to the function
 definition (moving it from before the return-type declaration to
 after) in the hpp/cpp?

Yup, that's what I did--as I said in an earlier message, apparently the
position of '__device__' matters. It needs to be *before* the return
type.

Andreas





Re: [PyCUDA] Int detection in function kernel invocation

2010-03-04 Thread Andreas Klöckner
On Donnerstag 04 März 2010, Fabrizio Milo aka misto wrote:
 Hi,
 
 I had to add the patch in attachment to make work a kernel like
 
 void kernel( float* out, int size){
 
 }

Unless you're using prepared invocation, you have to use Numpy's sized
integers/floats:
http://documen.tician.de/pycuda/driver.html#pycuda.driver.Function.param_set

I don't think it's advisable to change that.
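
A minimal sketch of an un-prepared call with a sized scalar (the kernel here
is purely illustrative):

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void scale(float *out, int size)
    {
      int i = threadIdx.x;
      if (i < size)
        out[i] *= 2.0f;
    }
    """)
    scale = mod.get_function("scale")

    out_gpu = gpuarray.to_gpu(np.arange(8, dtype=np.float32))
    # a plain Python int won't do here--wrap the scalar in a sized numpy type
    scale(out_gpu.gpudata, np.int32(out_gpu.size), block=(8, 1, 1))
    print(out_gpu.get())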

Andreas




Re: [PyCUDA] Patch for error C2143: syntax error : missing '; ' before 'type' on latest master for MSVC

2010-03-04 Thread Andreas Klöckner
On Donnerstag 04 März 2010, Ian Ozsvald wrote:
 Scratch the last - the same errors occur with the latest master as
 listed below. In my haste I didn't remove the already-compiled kernels
 (I cleared the wrong cache directory sigh).
 
 The fix is to comment out lines 312 and 457 of pycuda-complex.hpp and
 recompile, I still get a ton of warning messages (the same declspec
 ones as below) but I can run my examples.

Try now. (Believe it or not, the warning messages actually helped. It
appears that the __device__ must appear before the return type to be
meaningful... maybe...)

Andreas




Re: [PyCUDA] More Patches, and cuda.init() elimination

2010-03-04 Thread Andreas Klöckner
On Donnerstag 04 März 2010, Imran Haque wrote:
 Fabrizio Milo aka misto wrote:
  does anyone have an example of a program that doesn't use
  autoinit before using any of the pycuda.* modules?
 
 Yes, my library (shameless plug: https://simtk.org/home/siml)

Cool--I've added that to 
http://wiki.tiker.net/PyCuda/ShowCase

Hope you don't mind.

Also: Everyone, if you have an application of PyCUDA that the world
should know about, please click that link up there and hit edit! :)


 manually
 handles CUDA initialization, because just getting some context from
 autoinit is not sufficient - I want to be able to select which device I
 get a context on, and potentially have multiple contexts on multiple
 devices (the library supports context-switching to simultaneously use
 multiple GPUs from a single thread).

Same here--my PDE solver also needs to handle init itself. For example,
forking can become strangely dangerous after cuInit(). Therefore my code
needs to control carefully when CUDA is initialized.

Andreas




Re: [PyCUDA] More Patches, and cuda.init() elimination

2010-03-04 Thread Andreas Klöckner
On Donnerstag 04 März 2010, Fabrizio Milo aka misto wrote:
 The work-around should simply be to import pycuda after the fork.
 Importing it before would be useless, because you certainly can't
 call cuInit and thus can't use any cu* function.
 
 Or am I missing something?

Imports might happen at module scope, not just where PyCUDA is actively
used. Yes, it can all be worked around.  But no, the behavior is not
going to change.

Andreas





Re: [PyCUDA] Attempts of patches

2010-03-03 Thread Andreas Klöckner
On Mittwoch 03 März 2010, Fabrizio Milo aka misto wrote:
 Wouldn't it benefit performance-wise?

No.

 What about creating a Device that is just a proxy for the real
 _driver.Device?
 
 class Device(object):
     def __init__(self, flags):
         _driver.init()
         self._device = _driver.Device(flags)
 

That would be slower *and* cumbersome.

Andreas




Re: [PyCUDA] OpenGl interop example

2010-03-03 Thread Andreas Klöckner
On Mittwoch 03 März 2010, Fabrizio Milo aka misto wrote:
 Correction:
 
 Seems it can be simply None, but not 0
 
 glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA,
  w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, None)
 
 Fabrizio

The wiki is the 'official' version of the examples, so you are able to
fix this yourself. Please go ahead!

Thanks,
Andreas




Re: [PyCUDA] Attempts of patches

2010-03-03 Thread Andreas Klöckner
On Mittwoch 03 März 2010, Fabrizio Milo aka misto wrote:
 I found the real problem on Mac for OpenGL; the patch is attached.
 You can remove the previous setup.py logic.

Done, thanks.

 I think the design will benefit a lot from having a Device or Context class
 that manages all the resources on the device, plus alignments and other
 device-related tricks.

Why?

Andreas




Re: [PyCUDA] Patch for error C2143: syntax error : missing '; ' before 'type' on latest master for MSVC

2010-03-03 Thread Andreas Klöckner
On Mittwoch 03 März 2010, Ian Ozsvald wrote:
 This error is described here:
 http://andre.stechert.org/urwhatu/2006/01/error_c2143_syn.html
 MSVC doesn't like C99-style variable declarations in the middle of the
 function and wants C89 declarations at the start of the function (or
 so the author states).
 

Thx, applied.

Andreas




Re: [PyCUDA] Patch for error C2143: syntax error : missing '; ' before 'type' on latest m aster for MSVC

2010-03-03 Thread Andreas Klöckner
On Mittwoch 03 März 2010, Ian Ozsvald wrote:
 lude/pycuda\pycuda-complex.hpp(299): error: calling a __device__ function from a __host__ function is not allowed

I've added a few more fixes to git master. Can you please try it and
report back? If it doesn't work, please post the entire error message.

Andreas





Re: [PyCUDA] weird bug with exp

2010-03-03 Thread Andreas Klöckner
On Mittwoch 03 März 2010, Dan Goodman wrote:
 Could it be a 32/64 bit issue? I have a 64 bit Win7 machine, but my
 Python, numpy, etc. are 32 bit and so I had to compile PyCUDA using 32
 bits (but the NVIDIA driver is 64 bit). Probably this shouldn't work at
 all, but it seems to work fine for everything except exp.

Not sure what could be wrong. For the record, your code works on my
machine (260, 64-bit Debian Linux)

Andreas





Re: [PyCUDA] Other Small patches

2010-03-03 Thread Andreas Klöckner
On Mittwoch 03 März 2010, Fabrizio Milo aka misto wrote:
 I think it would be nice to alert the user if they are trying to pass a
 numpy.array directly to the kernel.

Regarding the 'yet' in the error message: This works when using
In/Out/InOut, but direct passing will otherwise never be supported.
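
For illustration, a minimal sketch of the In/Out/InOut route:

    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void double_it(float *a) { a[threadIdx.x] *= 2.0f; }
    """)
    double_it = mod.get_function("double_it")

    a = np.random.randn(128).astype(np.float32)
    # drv.InOut copies the numpy array in before the launch and back out after
    double_it(drv.InOut(a), block=(128, 1, 1))
    print(a[:4])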

Further, I'd prefer if the test ended up in the existing error path to
make sure it's performance-neutral.

Andreas




Re: [PyCUDA] test_gpuarray.py is failing

2010-03-03 Thread Andreas Klöckner
On Mittwoch 03 März 2010, Fabrizio Milo aka misto wrote:
 Hi folks
 
 test_gpuarray.py is failing on my Mac OS X box; attached are a small patch
 that fixes a bug in one of the tests, and my gzipped output from running it.

Patch: applied, thanks.

I can't reproduce your issue, though--this works for me. What GPU,
driver, compiler?

Andreas





Re: [PyCUDA] More Patches, and cuda.init() elimination

2010-03-03 Thread Andreas Klöckner
On Mittwoch 03 März 2010, Fabrizio Milo aka misto wrote:
 In attachment more patches.
 
 The big one is 006, which eliminates the need to call cuda.init()
 explicitly: cuInit gets called upon _driver import, in the init_driver
 module function.

a) This would break a public interface and would need to happen with
deprecation, warnings, etc.
b) Imports with side effects are bad. (pycuda.autoinit is the only
module with side-effects in PyCUDA--and the side effect is its only
purpose.)

 I am still working through the cuda_context stack and its implementation. I
 am not 100% sure about the 0002 patch, but that code looks smelly to me.
 I think there is a way to eliminate the need for autoinit completely; I am
 still investigating.

1: applied
2: applied, that's indeed a bug--surprisingly, that line was a function
declaration.
3: already in 
4: applied
5: me no like
6: see above

Thanks for your work,
Andreas




Re: [PyCUDA] Installing on Windows XP 64 bit/Microsoft Visual Studio 2008

2010-03-02 Thread Andreas Klöckner
On Dienstag 02 März 2010, reckoner wrote:
 I ran test_driver.py  and it looked like it was working okay, until it 
 caused my screen to pixelate so much that I couldn't read it.
 
 Thanks in advance.

This shouldn't happen--or rather, the driver should prevent this from
happening. AFAIK, GPUs have some memory protection, so CUDA code can't
really overwrite display state unless something's seriously wedged. I
agree with Ian that you might be having driver or thermal issues. What
version of the driver are you using? Can you try and upgrade to the
latest-and-greatest?

Andreas





Re: [PyCUDA] FFT for PyCuda

2010-03-02 Thread Andreas Klöckner
On Dienstag 02 März 2010, Bogdan Opanchuk wrote:
  If you'd like pycudafft to be part of PyCUDA itself, we can
  discuss how that could happen.
 
 I am not sure it is necessary. There is the *nix philosophy, which favors
 separate functionality. And you would have to add the mako templating
 engine as a dependency for pycuda. But the final decision about the
 architecture of your package is yours, of course; it is not a problem
 for me to compose a corresponding patch. I already changed the plan
 interface a little in order to make it use shape/dtype parameters in the
 same way as numpy arrays and pycuda.gpuarrays.

It's a double-edged sword, IMO. The simple-small-modular approach has
obvious maintainability advantages. On the other hand, an integrated
package is more convenient to install and depend on. I'll leave this up
to you to decide.

For now, I've added a link to pycudafft to the docs:
http://documen.tician.de/pycuda/array.html#fast-fourier-transforms

Andreas





Re: [PyCUDA] possible bug ?

2010-03-01 Thread Andreas Klöckner
On Montag 01 März 2010, Fabrizio Milo aka misto wrote:
 I get strange errors on my MacBook Pro with CUDA 3.0.
 I think there is an error: invoking get_version should be get_version().
 
 diff --git a/pycuda/compiler.py b/pycuda/compiler.py
 index 140a098..0c13cf2 100644
 --- a/pycuda/compiler.py
 +++ b/pycuda/compiler.py
 @@ -195,7 +195,8 @@ class SourceModule(object):
  arch, code, cache_dir, include_dirs)
 
  from pycuda.driver import get_version
 -if get_version < (2,2,0):
 +
 +if get_version() < (2,2,0):
  # FIXME This is wrong--these are per-function attributes.
  # Remove this in 0.94.
  def failsafe_extract(key, cubin):

Good catch. What kind of error are you getting?

My favorite way of fixing this would be releasing 0.94 soon because it
simply eliminates this code. If you need this sooner and/or on the 0.93
branch, please let me know.

Andreas




Re: [PyCUDA] questions on example

2010-03-01 Thread Andreas Klöckner
On Samstag 27 Februar 2010, Xueyu Zhu wrote:
  11   const int i = threadIdx.x;

I'd suggest you check this line here. :)

Andreas





Re: [PyCUDA] Installing on Windows XP 64 bit/Microsoft Visual Studio 2008

2010-03-01 Thread Andreas Klöckner
On Montag 01 März 2010, reckoner wrote:
 The problem I'm having with the above mentioned Using Visual Studio 
 2008 (alternative on January 2010) instructions is that I cannot get 
 the examples in pycuda to work. It seems to fail at the stage of linking 
 the nvcc-compiled code and I'm not sure why this happens.

Can you post your error message?

Thanks,
Andreas





Re: [PyCUDA] possible bug ?

2010-03-01 Thread Andreas Klöckner
On Montag 01 März 2010, Fabrizio Milo aka misto wrote:
 pycuda._driver.LogicError: cuMemcpyHtoDAsync failed: invalid value

Weird. I can't reproduce this on Linux. Anyone on Mac?

 Btw what is the best way to send you patches?

Use a git checkout, commit your changes, then use 'git format-patch'.

Andreas

PS: Please make sure the list stays cc'd. Thanks.




Re: [PyCUDA] Garbage after copying to and from shared memory

2010-02-28 Thread Andreas Klöckner
On Dienstag 09 Februar 2010, Bogdan Opanchuk wrote:
 Hello,
 
 Yet another stupid question. Most probably, I missed something
 obvious, but anyway - can someone explain why I get some NaN's in
 output for the program (listed below)? Surprisingly, the bug disappears if
 I pass '1' instead of '-1' as the third parameter to the function (or remove
 the 'int' parameters completely and leave only the two pointers). The same
 kernel in pure CUDA works fine. Looks like memory corruption, but I can't
 figure out where it happens...

This looks like a compiler bug to me. I've attached the PTX that the 3.0
compiler generates--apparently all your loops get unrolled, and then
something gets confused, though I wasn't able to track down what
exactly.

Couple more data points:
- Even in the first case (that you report as being ok), I get floating
  point garbage in the first 32 entries of b_gpu.
- Adding an index bounds check to the second for loop also appears to
  fix things.

Have you reported this to Nvidia? (If not, you should.)

Andreas

PS: Sorry for the long absence everybody. I was at a workshop, and then
had lots to do on my return. Plus I have a thesis coming up, so please
bear with me. :)

.version 1.4
.target sm_13
// compiled with /home/andreas/pool/cuda-3.0/open64/lib//be
// nvopencc 3.0 built on 2009-10-26

//---
// Compiling kernel.cpp3.i (/tmp/ccBI#.vZIo77)
//---

//---
// Options:
//---
//  Target:ptx, ISA:sm_13, Endian:little, Pointer Size:64
//  -O3 (Optimization level)
//  -g0 (Debug level)
//  -m2 (Report advisories)
//---

.file   1   command-line
.file   2   kernel.cudafe2.gpu
.file   3   /usr/lib/gcc/x86_64-linux-gnu/4.4.3/include/stddef.h
.file   4   
/home/andreas/pool/cuda/bin/../include/crt/device_runtime.h
.file   5   /home/andreas/pool/cuda/bin/../include/host_defines.h
.file   6   /home/andreas/pool/cuda/bin/../include/builtin_types.h
.file   7   /home/andreas/pool/cuda/bin/../include/device_types.h
.file   8   /home/andreas/pool/cuda/bin/../include/driver_types.h
.file   9   /home/andreas/pool/cuda/bin/../include/surface_types.h
.file   10  /home/andreas/pool/cuda/bin/../include/texture_types.h
.file   11  /home/andreas/pool/cuda/bin/../include/vector_types.h
.file   12  
/home/andreas/pool/cuda/bin/../include/device_launch_parameters.h
.file   13  
/home/andreas/pool/cuda/bin/../include/crt/storage_class.h
.file   14  /usr/include/bits/types.h
.file   15  /usr/include/time.h
.file   16  
/home/andreas/pool/cuda/bin/../include/texture_fetch_functions.h
.file   17  
/home/andreas/pool/cuda/bin/../include/common_functions.h
.file   18  
/home/andreas/pool/cuda/bin/../include/crt/func_macro.h
.file   19  
/home/andreas/pool/cuda/bin/../include/math_functions.h
.file   20  
/home/andreas/pool/cuda/bin/../include/device_functions.h
.file   21  
/home/andreas/pool/cuda/bin/../include/math_constants.h
.file   22  
/home/andreas/pool/cuda/bin/../include/sm_11_atomic_functions.h
.file   23  
/home/andreas/pool/cuda/bin/../include/sm_12_atomic_functions.h
.file   24  
/home/andreas/pool/cuda/bin/../include/sm_13_double_functions.h
.file   25  
/home/andreas/pool/cuda/bin/../include/sm_20_atomic_functions.h
.file   26  
/home/andreas/pool/cuda/bin/../include/sm_20_intrinsics.h
.file   27  
/home/andreas/pool/cuda/bin/../include/surface_functions.h
.file   28  
/home/andreas/pool/cuda/bin/../include/math_functions_dbl_ptx3.h
.file   29  kernel.cu


.entry test (
.param .u64 __cudaparm_test_in,
.param .u64 __cudaparm_test_out,
.param .s32 __cudaparm_test_dir,
.param .s32 __cudaparm_test_S)
{
.reg .u32 %r17;
.reg .u64 %rd8;
.reg .f32 %f81;
.shared .align 4 .b8 __cuda_sMem24[8192];
.loc29  3   0
$LBB1_test:
.loc29  8   0
mov.f32 %f1, 0f;// 0
mov.f32 %f2, %f1;
mov.f32 %f3, 0f;// 0
mov.f32 %f4, %f3;
mov.f32 %f5, 0f;// 0
mov.f32 %f6, %f5;
mov.f32 %f7, 0f;// 0
mov.f32 %f8, %f7;
mov.f32 %f9, 0f;// 0
mov.f32 

Re: [PyCUDA] Incorrect shared memory size for kernel

2010-02-07 Thread Andreas Klöckner
On Sonntag 07 Februar 2010, Bogdan Opanchuk wrote:
.entry test (
.param .u32 __cudaparm_test_out)
{
.reg .u32 %r3;
.reg .f32 %f4;
.loc15  192 0
 $LBB1_test:
.loc15  198 0
ld.param.u32%r1, [__cudaparm_test_out];
mov.f32 %f1, 0f3f80;// 1
st.global.f32   [%r1+0], %f1;
.loc15  199 0
mov.f32 %f2, 0f4000;// 2
st.global.f32   [%r1+4], %f2;
.loc15  200 0
exit;
 $LDWend_test:
} // test
 
 But when I'm trying to compile this kernel with PyCuda, for some
 reason this function has attribute shared_size_bytes==20. Can anyone
 please explain why is the size of shared memory non-zero? I am
 completely at a loss here.

I think kernel parameters (and apparently a bunch of other stuff) map to
shared memory.
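
A minimal sketch of how to observe this (the attribute name is taken from the
report above):

    import pycuda.autoinit
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void test(float *out) { out[0] = 1.0f; out[1] = 2.0f; }
    """)
    func = mod.get_function("test")
    # nonzero even without any __shared__ declarations: the parameter block
    # (and, it seems, some per-launch bookkeeping) is counted here
    print(func.shared_size_bytes)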

Andreas




Re: [PyCUDA] Installing PyCUDA windows vista x32

2010-02-07 Thread Andreas Klöckner
On Sonntag 07 Februar 2010, Marco André Argenta wrote:
 C:\Python26\lib\distutils\dist.py:266: UserWarning: Unknown
 distribution option: 'install_requires'
   warnings.warn(msg)

I can't say much about the error message, but the warning above makes me
suspect that something relating to the setuptools-vs-distribute disaster
may be at work here.

See http://wiki.tiker.net/DistributeVsSetuptools, and let me (and the
list) know if it helped.

Andreas





Re: [PyCUDA] Sharing is caring

2010-01-31 Thread Andreas Klöckner
On Sonntag 31 Januar 2010, Per B. Sederberg wrote:
 Perhaps, similar to the showcase on the wiki, we could add an examples
 page:
 
 http://wiki.tiker.net/PyCuda/ShowCase
 
 Andreas, what do you think?

Good idea. See

http://wiki.tiker.net/PyCuda/Examples

The examples/ subdirectory now has a link to that web page and only
contains the most basic examples to get people started.

In addition, the examples directory now also contains a script that
automatically downloads all the Wiki examples to keep them as easily
runnable as before.

Thanks for the suggestion!

Andreas




Re: [PyCUDA] PyCUDA academic citation

2010-01-31 Thread Andreas Klöckner
Hi Imran,

On Samstag 30 Januar 2010, Imran Haque wrote:
 Is there a particular paper or conference presentation that you'd like
 cited for PyCUDA in academic papers? It's the least we can do for your
 efforts!

http://arxiv.org/abs/0911.3456

We've also submitted this to Parallel Computing (Elsevier), but
haven't heard back yet.

Thanks for asking--much appreciated!

Andreas




Re: [PyCUDA] Complex number support?

2010-01-27 Thread Andreas Klöckner
On Mittwoch 27 Januar 2010, Ian Ozsvald wrote:
 Hi Andreas/Ying Wai, I see a discussion you've had about complex number
 support:
 http://www.mail-archive.com/pycuda@tiker.net/msg00788.html
 
 I also see the 'complex' tag:
  http://git.tiker.net/pycuda.git/commit/296810d8c57f7620cfcc959f73f6aefbb0215133
  and I've merged the code with mine.

(It's a branch, actually.) The right way to get it is like so:

If you already have a complex branch (see 'git branch')
$ git checkout complex # switches you to your complex branch
$ git pull http://git.tiker.net/pycuda.git/ complex # fetch+merge changes

If you don't already have a complex branch
$ git fetch http://git.tiker.net/pycuda.git/ complex:complex

I'd advise against messing with raw commit SHAs.

 When I try to run demo_complex.py I get an error (below) - should the demo
 work without an error? 

Works for me, prints some small number. I just updated the complex
branch to current master--pull and try again.

 Given the discussion you were both having I'm not
 clear whether the complex support is finished or not?

It's not quite finished, the current hangup is to figure out how complex
scalars are to be passed to kernels. My current preference is to rely on
numpy's buffer interface, like so:

>>> import numpy
>>> x = numpy.complex64(123+456j)
>>> str(buffer(x))
'\x00\x00\xf6B\x00\x00\xe4C'

Let me know how things go.

Andreas




Re: [PyCUDA] Windows runtime error ImportError: DLL load failed: The specified module could not be found.

2010-01-26 Thread Andreas Klöckner
On Dienstag 26 Januar 2010, Ian Ozsvald wrote:
 All done:
 http://wiki.tiker.net/PyCuda/Installation/Windows#Using_Visual_Studio_2008_.28alternative_on_January_2010.29

Cool. Thanks very much.

 i.
 ps. Andreas I still get the REPLY field configured as Andreas Klöckner 
 li...@informa.tiker.net when I hit reply to any of your messages rather
 than pycuda@tiker.net

That's fine--the list is configured as reply-to-poster rather than
reply-to-list. As long as my address doesn't show as something involving
monster, everything is working as designed.

Andreas




Re: [PyCUDA] Get PyCuda 0.93 working on Snow Leopard

2010-01-23 Thread Andreas Klöckner
Hi Krunal,

first of all, welcome, and thanks for writing up your experience. I'm
sorry that your install was as troublesome as it seems to have been.

To make things better for everyone, I would like to ask two favors of you:

1) If there is anything that we can do by default in PyCUDA to make a
default Mac install less painful, please let me know. For example,
should -m32 be the default for Mac installs?

2) Next, while many people might eventually find this information on the
mailing list, the Wiki (that you refer to) is designed to be the main
place for up-to-date installation information. It's a wiki for a
reason--to be edited! Don't be shy, and change that information where
it's wrong, or augment it where additional info is needed.

If I can be of any assistance, please let me know.

Thanks again,
Andreas

On Samstag 23 Januar 2010, Krunal Patel wrote:
 Hi there,
 my name is Krunal and I'm a new user to PyCUDA.  I'm helping Nicolas out on
  his research.  My first order of business was to get PyCUDA installed on
  my iMac Snow Leopard.  Here were my experiences:
 
 1) stick with the version of python installed by Mac, don't upgrade to the
  latest.  The default version is 2.6.1.  The main reason for this is
  because I couldn't figure out how to run a newly upgraded version of
  python in 32-bit mode.
 2) I tried the 1.38-1.41 versions of Boost.  At the end, the one that worked
  for me was 1.39.
 
 Here is how I got things working:
 1) first follow: http://wiki.tiker.net/BoostInstallationHowto
 but do this:
 ./bootstrap.sh --prefix=$HOME/pool --libdir=$HOME/pool/lib --with-libraries=signals,thread,python
 ./bjam address-model=32_64 architecture=x86 variant=release link=shared install
 (the architecture stuff comes from
 http://old.nabble.com/Boost-on-Snow-Leopard-failing-to-build-32-bit-target-td25218466.html
 and is the key to not getting "the architecture is invalid" when running test_driver.py)
 
 2) then follow: http://wiki.tiker.net/PyCuda/Installation/Mac
 but do this:
 python configure.py --boost-inc-dir=/Users/k5patel/pool/include/boost-1_39/
  --boost-lib-dir=/Users/k5patel/pool/lib/
  --boost-python-libname=boost_python --cuda-root=/usr/local/cuda/
 
 siteconf.py should look like:
 BOOST_INC_DIR = ['/Users/k5patel/pool/include/boost-1_39/']
 BOOST_LIB_DIR = ['/Users/k5patel/pool/lib/']
 BOOST_COMPILER = 'gcc42'
 BOOST_PYTHON_LIBNAME = ['boost_python-xgcc42-mt']
 BOOST_THREAD_LIBNAME = ['boost_thread-xgcc42-mt']
 CUDA_TRACE = False
 CUDA_ROOT = '/usr/local/cuda/'
 CUDA_ENABLE_GL = False
 CUDADRV_LIB_DIR = []
 CUDADRV_LIBNAME = ['cuda']
 CXXFLAGS = []
 LDFLAGS = []
 
 change setup.py so it has this:
 
 if 'darwin' in sys.platform:
     # prevent building for ppc, since CUDA on OS X is not compiled for ppc
     if "-arch" not in conf["CXXFLAGS"]:
         conf["CXXFLAGS"].extend(['-arch', 'i386', '-m32'])
     if "-arch" not in conf["LDFLAGS"]:
         conf["LDFLAGS"].extend(['-arch', 'i386', '-m32'])
 
 the key is -m32 as indicated by
  http://article.gmane.org/gmane.comp.python.cuda/1089
 
 3) do sudo make install.  Ensure that there are no warnings along the lines
  of "different architecture".
 4) once built, go to the test directory and type in:
 python test_driver.py
 hopefully it passes
 
 don't type:
 sudo python test_driver.py as indicated elsewhere
 
 I hope this could be of help to those who couldn't get PyCUDA 0.93 working
  on their system.  The main problem has to do with SL's 64-bitness and
  CUDA's lack of it (and hence PyCUDA)
 
 krunal




Re: [PyCUDA] Get PyCuda 0.93 working on Snow Leopard

2010-01-23 Thread Andreas Klöckner
On Samstag 23 Januar 2010, Krunal Patel wrote:
 Yes I think the default should be -m32.

Done in git. Thanks for your advice.

 I have done the needful on the wiki pages.

Thank you very much for your work!

Andreas





Re: [PyCUDA] Get PyCuda 0.93 working on Snow Leopard

2010-01-23 Thread Andreas Klöckner
On Samstag 23 Januar 2010, Andreas Klöckner wrote:
  I have done the needful on the wiki pages.
 
 Thank you very much for your work!

I've hacked the Wiki a little bit--can you please take a quick look?

Thanks!
Andreas




Re: [PyCUDA] How do I make a multichannel 1D texture?

2010-01-15 Thread Andreas Klöckner
On Dienstag 12 Januar 2010, Dan Piponi wrote:
 I'm having trouble figuring out how to make a 4 channel 1D texture for
 use with tex1D.
 
 I can easily make a 2D 4 channel texture, from an MxNx4 numpy 3D
 array, using make_multichannel_2d_array and bind_array_to_texref. The
 third axis of the array has size 4 and becomes the 4 channels in a
 float4 texture. Works fine with tex2D.
 
 But I can't find a sequence of calls that takes an Nx4 numpy 2D array,
 with size 4 in the second axis, and turns it into a 4 channel 1D
 texture suitable for use with tex1D. I can't find something
 corresponding to make_multichannel_1d_array.
 
 So starting with an Nx4 2D numpy array, what sequence of calls do I
 need to make to get 1D texture with float4 elements?

Do you really want tex1D and not tex1Dfetch?

If the latter, http://is.gd/6kv0P

Andreas





Re: [PyCUDA] Compile problems for pycuda on Karmic 64

2010-01-15 Thread Andreas Klöckner
On Freitag 15 Januar 2010, John Zbesko wrote:
 KeyError: '_driver'

http://is.gd/6kvv9

HTH,
Andreas




Re: [PyCUDA] PyCUDA on Snow Leopard

2010-01-11 Thread Andreas Klöckner
Hi Bryan, all,

On Sonntag 10 Januar 2010, Bryan Catanzaro wrote:
 I also had this problem.  Python on Snow Leopard defaults to a 64-bit
  executable.  You can check this by typing:
 import sys
 print sys.maxint
 
 If it's ~2 billion, you're running Python in 32-bit mode.  If it's a huge
  number, you've got a 64-bit Python, which conflicts with your 32-bit
  CUDA/PyCUDA drivers and can cause your error message.
 
 You can force Python on Snow Leopard to run in 32-bit by setting a special
  environment variable: VERSIONER_PYTHON_PREFER_32_BIT=yes
 
 Or you can make this change permanent and global by writing this to the
  defaults: defaults write com.apple.versioner.python Prefer-32-Bit 1

thanks for the information.  I've added this to the Wiki. In the
interest of knowledge retention, I'd like to request that if you run
into unforeseen trouble, please write about it on the wiki. In
particular, please include the error message so the issue becomes
googlable. You do not need a user name to edit the Wiki.

Andreas




Re: [PyCUDA] Windows runtime error ImportError: DLL load failed: The specified module could not be found.

2010-01-08 Thread Andreas Klöckner
On Freitag 08 Januar 2010, Ian Ozsvald wrote:
 Can anyone suggest any reasons why boost is looking for python25.dll rather
 than the 2.6 equivalent?

Check boost's project-config.jam.

Andreas



signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] FFT code gets error only when Python exit

2010-01-07 Thread Andreas Klöckner
On Donnerstag 07 Januar 2010, Ying Wai (Daniel) Fan wrote:
 I changed cuComplex_mod.h a bit to force the use of complex.h. Looks
 like the route of using GNU C library does not work. Complex arithmetic
 operations are regarded as host functions by CUDA and host functions
 cannot be called from device. I got the following errors:

So I guess shipping the hacked STLport header is our only option then,
right? (Which is not bad--it's decent code.)

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] RuntimeError: could not find path to PyCUDA's C header files on Mac OS X Leopard, CUDA 2.3, pyCuda 0.94beta, python 2.5

2010-01-07 Thread Andreas Klöckner
On Donnerstag 07 Januar 2010, Ian Ozsvald wrote:
 I'm having a devil of a time getting pyCUDA to work on my MacBook and I
 can't get past this error:
 host47:examples ian$ python demo.py
 Traceback (most recent call last):
   File demo.py, line 22, in module
 )
   File /Library/Python/2.5/site-packages/pycuda/compiler.py, line 203, in
 __init__
 arch, code, cache_dir, include_dirs)
   File /Library/Python/2.5/site-packages/pycuda/compiler.py, line 188, in
 compile
 include_dirs = include_dirs + [_find_pycuda_include_path()]
   File /Library/Python/2.5/site-packages/pycuda/compiler.py, line 149, in
 _find_pycuda_include_path
 raise RuntimeError(could not find path to PyCUDA's C header files)
 RuntimeError: could not find path to PyCUDA's C header files
 
 Below I have version info and the full build process leading up to the
 error...Any pointers would be *hugely* appreciated.  If someone could
 explain what's happening here and what pyCUDA is looking for, that might
 point me in the right direction.
 
 Am I missing something silly?  I spent yesterday banging my head against
  the 'make' process until I found a spurious '=' in the ./configuration.py
  arguments (entirely my fault), maybe I've missed something silly here too?

See if you can find pycuda-helpers.hpp under /Library/Python/2.5/site-
packages/ 
somewhere, we may need to adapt _find_pycuda_include_path(). It's quite
interesting to see where all this stuff can end up...

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] 0.94 woes

2010-01-07 Thread Andreas Klöckner
On Donnerstag 07 Januar 2010, Nicholas S-A wrote:
 Well, if somebody writes code that uses arch=sm_13 for some reason,
 then somebody who doesn't have a 200 series card tries to use it, then
 the error message which comes up is pretty cryptic. Just trying to make
 it more understandable. It is an error that they should not have made in
 the first place -- so perhaps it should be a LogicError?

Downgraded to a warning, merged.

Thanks for the patch,
Andreas

PS: Please do keep the list cc'd.


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] My modification to PyCUDA

2010-01-07 Thread Andreas Klöckner
On Mittwoch 30 Dezember 2009, Ying Wai (Daniel) Fan wrote:
 Andreas,
 
 I have done some changes to make arithmetic operation works with complex
 GPUArray objects. The patch is attached.

I don't quite agree with your treatment of the complex scalars.
Couple possibilities:

1) We ship a fixed version of the struct module that can serialize
complex numbers.

2) struct can be fooled into serializing 2f and get the real and imag
components separately.

3) We serialize numpy complex byte-for-byte via
buffer(numpy.complex(3j)), circumventing the need for
struct to understand complex.

What do you think?
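For illustration, options 2 and 3 might look roughly like this (an untested Python 2 sketch, assuming native little-endian layout):

import struct
import numpy

z = numpy.complex64(3+4j)

# option 2: fool struct into serializing the real and imaginary float32s
packed = struct.pack("ff", z.real, z.imag)

# option 3: grab the numpy scalar's bytes directly via the buffer protocol
raw = str(buffer(z))

assert raw == packed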

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] FFT code gets error only when Python exit

2010-01-06 Thread Andreas Klöckner
On Mittwoch 06 Januar 2010, Ying Wai (Daniel) Fan wrote:
  Now in your situation there's a failure when reactivating the context to
  detach from it, probably because the runtime is meddling about. The only
  reason why cuCtxPushCurrent would throw an invalid value, is, IMO, if
  that context is already somewhere in the context stack. So it's likely
  that the runtime reactivated the context. In current git, a failure of
  the PushCurrent call should not cause a failure any more--it will print a
  warning instead.
 
 I believe different contexts can't share variables that is on the GPUs.

True.

 I can use GPUArray objects as arguments to my fft functions, and these
 objects still exist after fft. So I think fft is using the same context
 as pycuda.

Right. I take it that if runtime functions execute when a driver context
exists, they'll reuse that context.

 I made the change indicated in the attached diff file, such that
 context.synchronize() and context.detach() would print out the context
 stack size, and detach() would also print out whether current context is
 active. With this I verify that the stack size is 1 before and after
 running fft code and the context does not change.

I should clarify here. CUDA operates one context stack, and PyCUDA has
another one. CUDA's isn't sufficient because it will not let the same
context be activated twice. PyCUDA on the other hand needs exactly this
functionality, to ensure that cleanup can happen whenever it is needed.
Hence PyCUDA maintains its own context stack, and keeps CUDA's stack
at most one deep. You are looking at the PyCUDA stack.

 My guess is that CUFFT make some change to the current context, such
 that once this context is poped, it is automatically destroyed.

I disagree. I think CUDA somehow lets us pop the context, does not
report an error, but also does not actually pop the context (since the
runtime is still talking to it). Then, when PyCUDA tries to push the
context back onto CUDA's stack to detach from it, that fails. I've filed
a bug report with Nvidia, we'll see what they say.

 If my
 guess is correct, then calling context.detach() would destroy the
 context, since its usage count drops to 0, and it could circumvent the
 warning message when the context destructor is called.

Pop only removes the context from the context stack (and hence
deactivates it), but retains a reference to it. It should not cause
anything to be destroyed.

 I don't want
 people using my package to see warning message when Python exit, so I am
 not using autoinit in my package, but to create a context explicitly.

Sorry for this mess--I hope we can sort it out somehow.
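In case it helps others reading along, the explicit-context pattern (roughly what pycuda.autoinit does for you) looks about like this--a sketch, not a drop-in recipe:

import pycuda.driver as drv

drv.init()
dev = drv.Device(0)
ctx = dev.make_context()      # creates the context and makes it current
try:
    # ... allocate memory, build SourceModules, launch kernels ...
    pass
finally:
    ctx.pop()                 # deactivate it again on the way out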

 The following is kind of unrelated. I have done some experiments with
 contexts. I think context.pop() always pops up the top context from the
 stack, disregarding whether context is really at the top of the stack.
 E.g. I create two contexts c1 and c2 and then I can do c1.pop() twice
 without getting error.

This points to a doc and behavior bug in PyCUDA. Context.pop() should
have been static. It effectively was, but not quite. Fixed in git.

 cuComplex.h exists since CUDA 2.1 and it hasn't changed in subsequent
 version. cuComplex.h is used by cufft.h and cublas.h. I can't find any
 documentation to it. A quick search on google shows that JCuda seems to
 be using it.
 http://www.jcuda.org/jcuda/jcublas/doc/jcuda/jcublas/cuComplex.html

Hmm. A quick poke comes up with an error message:

8< --
kernel.cu(7): error: no operator * matches these operands
operand types are: cuComplex * cuComplex
8< --

Code attached. This might not be what we're looking for.

 Maybe we can simply use complex.h from GNU C library. A quick seach on
 my Ubuntu machine locates the following files:
 /usr/include/complex.h, which includes
 /usr/include/c++/4.4/complex.h, which then includes
 /usr/include/c++/4.4/ccomplex, which in turn includes
 /usr/include/c++/4.4/complex, which includes overloading of operators
 for complex number.

Two words: Windows portability. :) Aside from that, this is unlikely to
work, as the system-wide complex header depends on I/O being available
and all kinds of other system-dependent funniness.

 Good luck on your PhD.

Likewise!

Andreas
import pycuda.driver as drv
import pycuda.tools
import pycuda.autoinit
import numpy
import numpy.linalg as la
from pycuda.compiler import SourceModule

mod = SourceModule("""
#include <cuComplex.h>
__global__ void multiply_them(cuComplex *dest, cuComplex *a, cuComplex *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.complex64)
b = numpy.random.randn(400).astype(numpy.complex64)

dest = numpy.zeros_like(a)
multiply_them(
drv.Out(dest), drv.In(a), drv.In(b),
block=(400,1,1))

print dest-a*b


signature.asc
Description: This 

Re: [PyCUDA] pycuda.driver.Context.synchronize() delay time is a function of the count and kind of sram accesses?

2010-01-02 Thread Andreas Klöckner
Couple points:

* bytewise smem write may be slow?
* sync before and after timed operation, otherwise you time who knows what
* or, even better, use events.

HTH,
Andreas


On Samstag 02 Januar 2010, Hampton G. Miller wrote:
 I have noticed something which seems odd and which I hope you will look at
 and then tell me if it is something unique to PyCUDA or else is something
 which should be brought to the attention of Nvidia.  (Or, that I am just a
 simpleton!)
 
 Looking at my test results, below, and referring to my attached Python
 program with comments, it seems to me that the amount of time taken by
 pycuda.driver.Context.
 synchronize() is strongly a function of the count and type of sram
 accesses.  This seems odd to me.  Do you agree?
 
 For example, it takes over 13 seconds to sync after doing nothing more than
 writing zeros to (almost) all of the sram bytes for a 512x512 grid!
 
 Regards, Hampton
 
 
 PyCUDA 0.93 running on Mint 7 Linux
 
 Using device GeForce 9800 GT
gridDim_x   gridDim_y  blockDim_x  blockDim_y  blockDim_z
 A B C D E F G H
  0:1   1   1   1   1
 0.001050  0.000120  0.000442  0.72  0.000257  0.69  0.68
 0.69
  1:1   1 512   1   1
 0.000828  0.72  0.000441  0.73  0.000257  0.70  0.69
 0.69
  2:1 100 512   1   1
 0.007309  0.000167  0.003026  0.000106  0.001546  0.72  0.72
 0.72
  3:  100   1 512   1   1
 0.005985  0.77  0.003016  0.71  0.001543  0.73  0.72
 0.71
  4:  100 100 512   1   1
 0.526857  0.000303  0.263423  0.000302  0.131828  0.000304  0.000311
 0.000210
  5:1 256 512   1   1
 0.014104  0.000167  0.007073  0.75  0.003572  0.76  0.76
 0.73
  6:  256   1 512   1   1
 0.014087  0.81  0.007069  0.77  0.003570  0.93  0.77
 0.73
  7:  256 256 512   1   1
 3.447902  0.001038  1.724391  0.001039  0.862664  0.001041  0.001586
 0.000957
  8:1 512 512   1   1
 0.027301  0.61  0.013667  0.46  0.006857  0.45  0.50
 0.44
  9:  512   1 512   1   1
 0.027314  0.000125  0.013669  0.47  0.006855  0.45  0.49
 0.44
 10:  512 512 512   1   1
 13.789054  0.003796  6.896283  0.003800  3.449923  0.003794  0.006229
 0.003898
 31.298553 secs total
 
 #!/usr/bin/env python
 
 # nvidia_example.py -
 
 import sys
 import os
 import time
 import numpy
 
 import pycuda.autoinit
 import pycuda.driver as cuda
 from   pycuda.compiler import SourceModule
 
 
 gridDim_x  = 1
 gridDim_y  = 1
 
 blockDim_x = 1
 blockDim_y = 1
 blockDim_z = 1
 
 gridBlockList = [
 (1,  1,   1,1,1), (  1,1, 512,1,1),
 (1,100, 512,1,1), (100,1, 512,1,1), (100,100, 512,1,1),
 (1,256, 512,1,1), (256,1, 512,1,1), (256,256, 512,1,1),
 (1,512, 512,1,1), (512,1, 512,1,1), (512,512, 512,1,1) ]
 
 #
 ===
 ===
 
 cuda.init()
 device = pycuda.tools.get_default_device()
 print Using device, device.name()
 
 dev_dataRecords = cuda.mem_alloc( 1024 * 15 )
 
 #
 ---
 ---
 
 krnl = SourceModule("""
 __global__ void worker_0 ( char * src )
 {
 __shared__ char dst[ (1024 * 15) ];
 int i;
 
 if ( threadIdx.x == 0 )
 {// Case A:
 for( i=0; i<sizeof(dst); ++i )// Count = sizeof(dst)
 dst[ i ] = 0;// Type = indexed by i
 
 dst[ 0 ] = dst[ 1 ];// (Gag set but never
  used warning message from compiler)
 };
 }
 
 __global__ void worker_1 ( char * src )
 {
 __shared__ char dst[ (1024 * 15) ];
 int i;
 
 if ( threadIdx.x == 0 )
 {// Case B:
 for( i=0; i<sizeof(dst); ++i )// Count = sizeof(dst)
 dst[ 0 ] = 0;// Type = always the same
 element, 0
 
 dst[ 0 ] = dst[ 1 ];
 };
 }
 
 __global__ void worker_2 ( char * src )
 {
 __shared__ char dst[ (1024 * 15) ];
 int i;
 
 if ( threadIdx.x == 0 )
 {// Case C:
 for( i=0; i<(sizeof(dst)/2); ++i )// Count = sizeof(dst)/2
 dst[ i ] = 0;// Type = indexed by i
 
 dst[ 0 ] = dst[ 1 ];
 };
 }
 
 __global__ void worker_3 ( char * 

Re: [PyCUDA] PyCuda installation instructions for Gentoo Linux

2009-12-22 Thread Andreas Klöckner
On Dienstag 22 Dezember 2009, Justin Riley wrote:
 Hi All,
 
 I've added a page for installing PyCuda on Gentoo Linux to the PyCuda wiki.

Sweet, thanks!

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] CompileError

2009-12-21 Thread Andreas Klöckner
On Montag 21 Dezember 2009, Ewan Maxwell wrote:
 test_driver.py works (all 16 tests), and the example code I mentioned works too;
 however, other tests still fail with the same reason (nvcc fatal~)

That's strange. If all of them have the same path, why would some fail and 
some succeed?

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] PyCUDA Digest, Vol 18, Issue 12

2009-12-21 Thread Andreas Klöckner
On Montag 21 Dezember 2009, oli...@olivernowak.com wrote:
 like i said.
 
 a week.

Can you comment on what made this difficult? Can we do anything to make this 
easier, e.g. by catching common errors and providing more helpful messages?

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Passing struct into kernel

2009-12-19 Thread Andreas Klöckner
On Freitag 18 Dezember 2009, Dan Piponi wrote:
 go.prepare("s", ...)   # tell PyCuda we're passing in a string buffer
 go.prepared_call(grid, struct.pack("i", 12345))# pack integer into
 string buffer

struct_typestr = "i"
sz = struct.calcsize(struct_typestr)
go.prepare("%ds" % sz, ...)   # tell PyCuda we're passing in a string buffer
go.prepared_call(grid, struct.pack(struct_typestr, 12345))

codepy.cgen.GenerableStruct can help you generate type strings and C source 
code from a single source and will also help you make packed instances.
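A slightly more complete (untested) sketch along these lines, with a made-up params struct and kernel name:

import struct
import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

mod = SourceModule("""
struct params { int n; float scale; };

__global__ void scale_it(float *x, struct params p)
{
  const int i = threadIdx.x;
  if (i < p.n)
    x[i] *= p.scale;
}
""")
scale_it = mod.get_function("scale_it")

x_gpu = gpuarray.to_gpu(numpy.arange(8, dtype=numpy.float32))

fmt = "if"   # int, float -- must match the C struct's layout
scale_it.prepare("P%ds" % struct.calcsize(fmt), block=(8, 1, 1))
scale_it.prepared_call((1, 1), x_gpu.gpudata, struct.pack(fmt, 8, 2.0))
print x_gpu.get()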

HTH,
Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Passing struct into kernel

2009-12-18 Thread Andreas Klöckner
On Freitag 18 Dezember 2009, Dan Piponi wrote:
 I have no problem making a struct in global memory and passing in a
 pointer to it. But arguments to kernels get stored in faster memory
 than global memory don't they?

Right. You can pack a struct into a string using Python's struct module, and 
since string support the buffer protocol, they can be passed to PyCUDA 
routines.

HTH,
Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] cuModuleGetFunction failed: not found

2009-12-14 Thread Andreas Klöckner
On Montag 14 Dezember 2009, Robert Cloud wrote:
 Traceback (most recent call last):
   File stdin, line 1, in module
 pycuda._driver.LogicError: cuModuleGetFunction failed: not found

The problem is that nvcc compiles code as C++ by default, which means it uses 
name mangling [1].

If you don't want to use PyCUDA's just-in-time compilation facilities [2], 
then just add an extern "C" to your declarations.
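For instance, if you compile a cubin yourself and load it through the driver API, the (hypothetical) source and loading code might look like this:

# mykernels.cu, compiled by hand with e.g. nvcc --cubin mykernels.cu:
#
#   extern "C" __global__ void doublify(float *a)
#   { a[threadIdx.x] *= 2.f; }

import pycuda.autoinit
import pycuda.driver as drv

mod = drv.module_from_file("mykernels.cubin")
doublify = mod.get_function("doublify")   # found, since the name is unmangled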

Andreas

[1] http://en.wikipedia.org/wiki/Name_mangling
[2] http://documen.tician.de/pycuda/driver.html#module-pycuda.compiler


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Runtime problem: cuDeviceGetAttribute failed: not found

2009-12-08 Thread Andreas Klöckner
On Dienstag 08 Dezember 2009, you wrote:
 It's the first time I've installed the drivers, so I don't have multiple
 versions. Anyway how can I know the versions of the headers used in the
 compilation?

pycuda.driver.get_version()
pycuda.driver.get_driver_version()

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Question about gpuarray

2009-12-04 Thread Andreas Klöckner
On Freitag 04 Dezember 2009, Bryan Catanzaro wrote:
 Thanks for the explanation.  In that case, do you have objections to
  removing the assertion that if a GPUArray is created and given a
  preexisting buffer, that the new array must be a view of another array? 
  In my situation, I don't think this assertion is true:  I would like to
  transfer ownership of a gpu buffer (created outside of PyCUDA by some host
  code) to a particular GPUArray.  This means I instantiate a GPUArray with
  gpudata=the pointer created by the host code, but base should still be
  None, since this new GPUArray is not a view of any other array, and so
  this GPUArray should have sole ownership of the buffer being given at
  initialization.

If I understand you correctly, then whatever you assign to .gpudata already 
establishes the lifetime dependency, right? In that case, yes, the assert 
should go away.

Andreas

PS: Please keep the PyCUDA list cc'ed, unless there's good reason not to.


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] bad argument to internal function

2009-11-25 Thread Andreas Klöckner
On Mittwoch 25 November 2009, Ken Seehart wrote:
 Something like this came up for someone else in May:
 http://www.mail-archive.com/pycuda@tiker.net/msg00361.html
 
 *SystemError: ../Objects/longobject.c:336: bad argument to internal
 function*
   buf = struct.pack(format, *arg_data)
 
 I fixed it by hacking *_add_functionality* in *driver.py*.

As before when Bryan reported the bug, I can't seem to reproduce it. 
(Likewise, I can't reproduce the issue described in the corresponding numpy 
ticket [1].)

[1] http://projects.scipy.org/numpy/ticket/1110

What architecture are you running on? What version of Python? What version of 
numpy? (Can't reproduce on x86_64+2.5.4+1.3.0 and x86_64+2.6.1+{1.2.1 and 
1,3.0}.)

In principle, I'm not opposed to merging this fix, but I'd like some more 
information first.

Bryan: are you still encountering this? Any further information?

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] CUDA 3.0 64-bit host code

2009-11-23 Thread Andreas Klöckner
Hey Bryan,

On Montag 23 November 2009, Bryan Catanzaro wrote:
 I built 64-bit versions of Boost and PyCUDA on Mac OS X Snow Leopard, as
  well as the 64-bit Python interpreter supplied by Apple, as well as the
  CUDA 3.0 beta.  Everything built fine, but when I ran pycuda.autoinit, I
  got an interesting CUDA error, which PyCUDA reported as pointer is
  64-bit. I'm wondering - is it impossible to use a 64-bit host program
  with a 32-bit GPU program under CUDA 3.0?

First, I'm not sure I fully understand what's going on. You can indeed compile 
GPU code to match a 32-bit ABI on a 64-bit machine (nvcc --machine 32 ...). Is 
that what you're doing? If so, why? (Normally, nvcc will default to your 
host's ABI. By and large, this changes struct alignment rules and pointer 
widths.)

If you're not doing anything special to get 32-bit GPU code, then your GPU 
code should end up matching your host ABI. Or maybe nvcc draws the wrong 
conclusions or is a fat binary or something and we need to actually specify 
the --machine flag.

I also remember wondering what the error message referred to when I added it. 
I'm totally not sure. Which routine throws it?

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Install issue with Ubuntu 9.10

2009-11-22 Thread Andreas Klöckner
On Sonntag 22 November 2009, you wrote:
 Now this error:
 
 python test_driver.py
 Traceback (most recent call last):
   File test_driver.py, line 481, in module
 from py.test.cmdline import main
 ImportError: No module named cmdline

Again, py.test should have been installed automatically for you.

easy_install -U py

should do that for you. (Also see http://is.gd/4VBoW)

Andreas

PS: Please keep your replies on-list.


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Build Problems on SuSE 11.0, x86_64 - cannot find -llibboost_python-mt

2009-11-21 Thread Andreas Klöckner
On Samstag 21 November 2009, Wolfgang Rosner wrote:
 Hello, Andreas Klöckner,
 
 I can't get pycuda to build on my box.
 I tried at least 10 times, try to reproduce details now.
 In the archive, there was a similar thread with a SuSE 11.1 64 bit box ,
 posted by Jonghwan Rhee
 http://article.gmane.org/gmane.comp.python.cuda/955
 but that did not really provide a solution.
 
 cannot find -llibboost_python-mt
 although the file is in place

First, all that configure.py does is edit siteconf.py--no need to rerun it 
once siteconf.py is in place.

Second, -lsomething implicitly looks for libsomething.so. No need to specify 
the lib prefix.
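In other words, the relevant siteconf.py lines would look something like this (library names without the lib prefix; the exact paths and -mt suffixes depend on your Boost build):

BOOST_INC_DIR = ['/usr/include']
BOOST_LIB_DIR = ['/usr/lib64']
BOOST_PYTHON_LIBNAME = ['boost_python-mt']
BOOST_THREAD_LIBNAME = ['boost_thread-mt']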

HTH,
Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Build Problems on SuSE 11.0, x86_64 - cannot find -llibboost_python-mt

2009-11-21 Thread Andreas Klöckner
On Samstag 21 November 2009, Wolfgang Rosner wrote:
 OK, for me it works now, but people might be even more (and earlier) happy
  if the pytools issue had been mentioned in the setup wiki.

Pytools should be installed automatically along with 'python setup.py 
install'. If it didn't: do you have any idea why?

 If you like and give me a wiki account, I'd be glad to share my experience.

No account required. (though you can create one yourself) Please do share.

Andreas



signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Build Problems on SuSE 11.0, now Pytools not found

2009-11-21 Thread Andreas Klöckner
On Samstag 21 November 2009, Wolfgang Rosner wrote:
  Pytools should be installed automatically along with 'python setup.py
  install'. If it didn't: do you have any idea why?
 
 not sure.
 
 could it be that I ran make install
 instead of python setup.py install ?

make install invokes python setup.py install. That shouldn't have been it.

 (sorry, I'm just getting used to Python, preferred perl in earlier live)
 
 first I thought it was due to the different python path structures.
 
 standard on SuSE is
 /usr/lib64/python2.5/site-packages/
 
 but the egg-laying machine seems to put stuff to
 /usr/local/lib64/python2.5/site-packages
 instead
 (maybe I could have reconfigured this, anyway)

Weird. Curious about the reasoning behind this.

 However, gl_interop.py did not run until I did
 export PYTHONPATH=/usr/local/lib64/python2.5/site-packages/
 (was PYTHONPATH= before)
 
 maybe this is since there is still the old python-opengl-2.0.1.09-224.1
 /usr/lib64/python2.5/site-packages/OpenGL/GL/ARB/
  ...with-no-vertex-buffer-in-there in the way which is caught before.
 
 But to figure it out I'm definitely lacking sufficient python experience.

There is an easy trick to find out what file path actually gets imported:

>>> import pytools
>>> pytools.__file__
'/home/andreas/research/software/pytools/pytools/__init__.py'

 hm, might give it a try.
 I think best I could offer was be to prepare an own SuSE page with my
 experience.

Sure--just add a subpage under
http://wiki.tiker.net/PyCuda/Installation/Linux
(like the one for Ubuntu)

 It all comes down to different ways and places where stuff is stored.
 But I think my approach is not the best one; in retrospect it would have been
  better to configure new stuff so that it meets the SuSE structure. Maybe.
 Well, but this might break other dependencies?
 Smells like big 'Baustelle'...
 
 So if your expectation of quality on your wiki is not too high, I'll post
  my experience there.

That's the whole point of a Wiki: information of questionable quality that 
people improve as they use it. It's a knowledge retention tool.

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] autotuning

2009-11-20 Thread Andreas Klöckner
On Freitag 20 November 2009, James Bergstra wrote:
 Now that we're taking more advantage of PyCUDA's and CodePy's ability
 to generate really precise special-case code... I'm finding that we
 wind up with a lot of ambiguities about *which* generator should
 handle a given special case.  The right choice for a particular input
 structure is platform-dependent--a function of cache sizes, access
 latencies, transfer bandwidth, register counts, number of processors,
 etc, etc.  The wrong choice can carry a big performance penalty.
 
 FFTW and ATLAS get around this by self-tuning algorithms, which I
 don't understand in detail, but which generally work by trying a lot
 of generators on a lot of special cases, and then using the database
 of timings to make good choices quickly at runtime.

What has worked well for me is to try a big bunch of kernels right before 
their intended use and cache which one was fast for this special case only. 
The main delay is the compilation of all these kernels, the trial runs are all 
very quick, thanks to the GPU. There's just enough caching at each level to 
make this efficient.
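A bare-bones sketch of that benchmark-and-cache idea (illustrative only; the candidate list and cache key are whatever your code generator produces):

import pycuda.driver as drv

_fastest = {}

def pick_fastest(key, candidates, args):
    # candidates: list of (kernel, block, grid) tuples, all doing the same job
    if key not in _fastest:
        timings = []
        for knl, block, grid in candidates:
            start, stop = drv.Event(), drv.Event()
            start.record()
            knl(*args, block=block, grid=grid)
            stop.record()
            stop.synchronize()
            timings.append((stop.time_since(start), (knl, block, grid)))
        _fastest[key] = min(timings)[1]
    return _fastest[key]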

 It seems like this automatic-tuning is even more important for GPU
 implementations than for CPU ones.  

That certainly echoes one claim from the PyCUDA article. :) 

 Are there libraries to help with this?

First of all, since it's a thorny (and unsolved) problem, PyCUDA doesn't try 
to get involved in it. Support it--yes, involved--no. That said, I'm not aware 
of libraries that make autotuning significantly easier. Nicolas mentioned that 
he's eyeing some machine learning techniques like the ones in Milepost gcc. 
Nicolas, care to comment? Aside from that, Cray's grouped, attributed 
orthogonal search [1] sounds useful.

[1] http://iwapt.org/2009/slides/Adrian_Tate.pdf

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] trouble with pycuda-0.93.1rc2 - test_driver.py, etc.

2009-11-20 Thread Andreas Klöckner
On Freitag 20 November 2009, Janet Jacobsen wrote:
 Hi, Andreas.  I ran
 
 ldd /usr/common/usg/python/2.6.4/lib/python2.6/site-packages
 /pycuda-0.93.1rc2-py2.6-linux-x86_64.egg/pycuda/_driver.so
 
 and got
 
 libboost_python.so.1.40.0 =>
 /usr/common/usg/boost/1_40_0/pool/lib/libboost_python.so.1.40.0
 (0x2b259cac6000)
 libboost_thread.so.1.40.0 =>
 /usr/common/usg/boost/1_40_0/pool/lib/libboost_thread.so.1.40.0
 (0x2b259cd15000)
 libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x2b259cf43000)
 libstdc++.so.6 => /usr/common/usg/gcc/4.4.2/lib64/libstdc++.so.6
 (0x2b259d3df000)
 libm.so.6 => /lib64/libm.so.6 (0x2b259d6eb000)
 libgcc_s.so.1 => /usr/common/usg/gcc/4.4.2/lib64/libgcc_s.so.1
 (0x2b259d96e000)
 libpthread.so.0 => /lib64/libpthread.so.0 (0x2b259db85000)
 libc.so.6 => /lib64/libc.so.6 (0x2b259dda)
 libutil.so.1 => /lib64/libutil.so.1 (0x2b259e0f6000)
 libdl.so.2 => /lib64/libdl.so.2 (0x2b259e2fa000)
 librt.so.1 => /lib64/librt.so.1 (0x2b259e4fe000)
 libz.so.1 => /usr/lib64/libz.so.1 (0x2b259e707000)
 /lib64/ld-linux-x86-64.so.2 (0x00316ec0)
 
 Does this help?

Hmm, yes and no. I'm starting to believe that Boost built itself thinking that 
you have a UCS4 Python, while your actual build is UCS2. To confirm that 
latter point, run the equivalent of

objdump -T /users/kloeckner/mach/x86_64/pool/lib/libpython2.6.so.1.0|grep UCS

That should tell you what the UCS'iness of your custom Python is. Then run

objdump -T /usr/common/usg/boost/1_40_0/pool/lib/libboost_python.so.1.40.0 | 
grep UCS

to establish Boost's UCS'iness. As I said, I'm suspecting that the two might 
disagree. You might want to try that against your system Python 2.4, too. 
Maybe Boost cleverly found that one and picked it up. In any case, the switch 
to look for is Py_UNICODE_SIZE in pyconfig.h.

 P.S. Sorry if this should be off-list, but email sent to
 li...@monster.tiker.net is returned to me.

On-list is the right place IMO--this creates a searchable record of problems 
and solutions. Thanks for asking though. (Btw: where'd you get that email 
address? While tiker.net is my domain, I don't recall using or having created 
that address.)

HTH,
Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Install PyCUDA on opensuse 11.1(x86_64)

2009-11-13 Thread Andreas Klöckner
On Donnerstag 12 November 2009, Jonghwan Rhee wrote:
 Hi there,
 
 I have tried to install pycuda on opensuse 11.1. However, when I did
 build, the following error occurred.

What version of Boost do you have, how and where was it installed?

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Install PyCUDA on opensuse 11.1(x86_64)

2009-11-13 Thread Andreas Klöckner
Check out virtualenv.

Andreas

PS: *Please* try to keep the replies on-list. Thanks.

On Freitag 13 November 2009, you wrote:
 Hi Andreas,
 
 Thanks for your help. It worked well. But another problem occurred when I
  do make install as follows.
 
 ctags -R src || true
 /usr/bin/python setup.py install
 Extracting in /tmp/tmpKkbJcX
 Now working in /tmp/tmpKkbJcX/distribute-0.6.4
 Building a Distribute egg in /home/jrhee/pycuda
 /home/jrhee/pycuda/setuptools-0.6c9-py2.6.egg-info already exists
 /home/jrhee/pool/include/boost-1_36 /boost/  python .hpp
 *** Cannot find Boost headers. Checked locations:
/home/jrhee/pool/include/boost-1_36/boost/python.hpp
 /home/jrhee/pool/lib / lib boost_python .so
 /home/jrhee/pool/lib / lib boost_python .dylib
 /home/jrhee/pool/lib / lib boost_python .lib
 /home/jrhee/pool/lib /  boost_python .so
 /home/jrhee/pool/lib /  boost_python .dylib
 /home/jrhee/pool/lib /  boost_python .lib
 *** Cannot find Boost Python library. Checked locations:
/home/jrhee/pool/lib/libboost_python.so
/home/jrhee/pool/lib/libboost_python.dylib
/home/jrhee/pool/lib/libboost_python.lib
/home/jrhee/pool/lib/boost_python.so
/home/jrhee/pool/lib/boost_python.dylib
/home/jrhee/pool/lib/boost_python.lib
 /home/jrhee/pool/lib / lib boost_thread .so
 /home/jrhee/pool/lib / lib boost_thread .dylib
 /home/jrhee/pool/lib / lib boost_thread .lib
 /usr/local/cuda /bin/  nvcc
 /usr/local/cuda/include /  cuda .h
 /usr/local/cuda/lib / lib cuda .so
 /usr/local/cuda/lib / lib cuda .dylib
 /usr/local/cuda/lib / lib cuda .lib
 /usr/local/cuda/lib /  cuda .so
 /usr/local/cuda/lib /  cuda .dylib
 /usr/local/cuda/lib /  cuda .lib
 /usr/local/cuda/lib /  cuda .so
 /usr/local/cuda/lib /  cuda .dylib
 /usr/local/cuda/lib /  cuda .lib
 /usr/local/cuda/lib /  cuda .so
 /usr/local/cuda/lib /  cuda .dylib
 /usr/local/cuda/lib /  cuda .lib
 *** Cannot find CUDA driver library. Checked locations:
/usr/local/cuda/lib/libcuda.so
/usr/local/cuda/lib/libcuda.dylib
/usr/local/cuda/lib/libcuda.lib
/usr/local/cuda/lib/cuda.so
/usr/local/cuda/lib/cuda.dylib
/usr/local/cuda/lib/cuda.lib
/usr/local/cuda/lib/cuda.so
/usr/local/cuda/lib/cuda.dylib
/usr/local/cuda/lib/cuda.lib
/usr/local/cuda/lib/cuda.so
/usr/local/cuda/lib/cuda.dylib
/usr/local/cuda/lib/cuda.lib
 *** Note that this may not be a problem as this component is often
  installed system-wide.
 running install
 Checking .pth file support in /usr/local/lib64/python2.6/site-packages/
 error: can't create or remove files in install directory
 
 The following error occurred while trying to add or remove files in the
 installation directory:
 
 [Errno 2] No such file or directory:
 '/usr/local/lib64/python2.6/site-packages/test-easy-install-24814.pth'
 
 The installation directory you specified (via --install-dir, --prefix, or
 the distutils default setting) was:
 
 /usr/local/lib64/python2.6/site-packages/
 
 This directory does not currently exist.  Please create it and try again,
  or choose a different installation directory (using the -d or
  --install-dir option).
 
 make: *** [install] Error 1
 
 
 Jong
 
 On Sat, Nov 14, 2009 at 12:51 PM, Andreas Klöckner
 
  li...@informa.tiker.net wrote:
  On Freitag 13 November 2009, Jonghwan Rhee wrote:
   Hi Andreas,
  
   Its version is boost 1.36 and it was installed at /usr/include/boost/
   by YaST package repositories.
 
  According to http://packages.opensuse-community.org, it appears that
  opensuse
  uses non-standard names for the Boost libraries. Stick
 
  BOOST_PYTHON_LIBNAME=boost_python
  BOOST_THREAD_LIBNAME=boost_thread
 
  into your siteconf.py. That should get you a step further.
 
  HTH,
  Andreas
 
  PS: Please take care to keep replies on the mailing list.
 
  ___
  PyCUDA mailing list
  PyCUDA@tiker.net
  http://tiker.net/mailman/listinfo/pycuda_tiker.net
 



signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] pycuda patch for 'flat' eggs

2009-11-04 Thread Andreas Klöckner
Hi Maarten,

On Dienstag 03 November 2009, you wrote:
 i've been using your pycuda package to play with, and I really like
 it! much more productive than compiling etc..
 I have pycuda installed with --single-version-externally-managed and a
 different prefix. This causes pycuda not to find the header files.
 I've attached the diff and new compiler.py file to fix this.

Merged in release-0.93 and master.

Thanks for the patch,
Andreas

PS: Please direct stuff like this to the mailing list next time.


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Pitch Linear Memory Textures

2009-10-26 Thread Andreas Klöckner
On Montag 26 Oktober 2009, Robert Manning wrote:
 PyCUDA users,
I've been trying to find how to use pitch 2D linear memory textures
 in pyCUDA and have been unsuccessful.  I've seen C code that uses
 cudaBindTexture2D and similar functions but it is not accessible (to
 my knowledge) by pyCUDA.  Any suggestions?
 
 Thanks,
 Bob Manning

See test_2d_texture() in test/test_driver.py.
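Roughly, that test does the following (paraphrased from memory, so treat it as a sketch rather than the verbatim test). Note that it binds the matrix through a CUDA array via matrix_to_texref rather than through pitched linear memory, but often that is all you need:

import numpy
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
texture<float, 2, cudaReadModeElementType> mtx_tex;

__global__ void copy_texture(float *dest)
{
  int row = threadIdx.x;
  int col = threadIdx.y;
  int w = blockDim.y;
  dest[row*w+col] = tex2D(mtx_tex, row, col);
}
""")
copy_texture = mod.get_function("copy_texture")
mtx_tex = mod.get_texref("mtx_tex")

shape = (3, 4)
a = numpy.random.randn(*shape).astype(numpy.float32)
drv.matrix_to_texref(a, mtx_tex, order="F")

dest = numpy.zeros(shape, dtype=numpy.float32)
copy_texture(drv.Out(dest), block=shape+(1,), texrefs=[mtx_tex])
assert numpy.allclose(dest, a)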

HTH,
Andreas




signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Has anyone made a working parallel prefix sum/scan with pycuda?

2009-10-20 Thread Andreas Klöckner
On Montag 19 Oktober 2009, Michael Rule wrote:
 I'm convinced that I need a prefix scan that gives me access to the
 resultant prefix scanned array.
 
 so, for example, using addition, I would like a function that takes :
 1 1 1 1 1 1 1 1 1 1
 to
 0 1 2 3 4 5 6 7 8 9
 
 It seems like this data should be generated as an intermediate step in
 executing a ReductionKernel. I have not been able to figure out how this
 data is accessed by browsing the GPUArray documentation. Am I missing
 something obvious ?

Parallel Prefix Scan is presently not implemented in PyCUDA. While reduction 
is related, the scan is actually a somewhat different animal. PScan would be a 
most welcome addition to PyCUDA, however. Mark Harris has written a good 
introduction on how to implement it:

http://is.gd/4rXq0

If you decide to follow Mark's guide, almost half your work is already done 
for you--reduction occurs as part of the prefix scan, so you'll be able to 
recycle a fair bit of code.
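To give a flavor, the naive single-block exclusive scan from that article translates to PyCUDA roughly as follows (a sketch for power-of-two sizes that fit into one block, not a production scan):

import numpy
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void scan_block(float *d_out, const float *d_in, int n)
{
  extern __shared__ float temp[];  // 2*n floats, double-buffered
  int tid = threadIdx.x;
  int pout = 0, pin = 1;

  // exclusive scan: shift input right by one, seed the first element with 0
  temp[pout*n + tid] = (tid > 0) ? d_in[tid-1] : 0;
  __syncthreads();

  for (int offset = 1; offset < n; offset *= 2)
  {
    pout = 1 - pout;
    pin = 1 - pout;
    if (tid >= offset)
      temp[pout*n + tid] = temp[pin*n + tid] + temp[pin*n + tid - offset];
    else
      temp[pout*n + tid] = temp[pin*n + tid];
    __syncthreads();
  }
  d_out[tid] = temp[pout*n + tid];
}
""")
scan_block = mod.get_function("scan_block")

n = 16
a = numpy.ones(n, dtype=numpy.float32)
result = numpy.empty_like(a)
scan_block(drv.Out(result), drv.In(a), numpy.int32(n),
    block=(n, 1, 1), shared=2*n*4)
print result   # 0 1 2 ... 15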

HTH,
Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


[PyCUDA] On Python 2.6.3 / distribute / setuptools

2009-10-15 Thread Andreas Klöckner
Hi all,

--
This is relevant to you if you are using Python 2.6.3 and you are getting 
errors of the sort:

/usr/local/lib/python2.6/dist-packages/setuptools-0.6c9-
py2.6.egg/setuptools/command/build_ext.py,
line 85, in get_ext_filename
KeyError: '_cl'
--

It seems Python 2.6.3 broke every C/C++ extension on the planet that was 
shipped using setuptools (which includes PyCUDA, PyOpenCL, and many more of my 
packages.) Thanks for your patience as I've worked through this mess, and to 
both Allan and Christine, I'm sorry you've had to deal with this, and thanks 
to Allan for pointing me in the right direction.

To make a long story short, I've switched my packages (including PyCUDA) to 
use distribute instead of setuptools. All these changes are now in git. I'm 
not sure this will help if a 2.6.3 user already has setuptools installed, but 
I hope it will at least not make any other case worse. All in all, this seems 
like the least bad option given that I expect distribute to be the way of the 
future.

Before I unleash this change full-scale, I would like it to get some testing. 
For this purpose, I've created a PyCUDA release candidate package, here:

http://pypi.python.org/pypi/pycuda/0.93.1rc1

PLEASE TEST THIS, and speak up if you do--both positive and negative comments 
are much appreciated.

Andreas

PS: Once I have reasonable confirmation that this works for PyCUDA, I'll also 
release updated versions of PyOpenCL, meshpy, boostmpi, pyublas,  The 
relevant changes are *already in git* if you'd like to try them now.

PPS: Deciding in favor of distribute and against the promised setuptools 
update was based on two factors:
- Primarily, distribute makes a fix for the 2.6.3 issue available *now*.
- Secondarily, I personally disliked the behavior of PJE, the author of 
setuptools, in response to the current mess.


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] [Hedge] On Python 2.6.3 / distribute / setuptools

2009-10-15 Thread Andreas Klöckner
On Donnerstag 15 Oktober 2009, Andreas Klöckner wrote:
 Hi all,
 
 --
 This is relevant to you if you are using Python 2.6.3 and you are getting
 errors of the sort:
 
 /usr/local/lib/python2.6/dist-packages/setuptools-0.6c9-
 py2.6.egg/setuptools/command/build_ext.py,
 line 85, in get_ext_filename
 KeyError: '_cl'
 --

A quick addition: If you are already encountering this error, you need to 
*remove* setuptools before the fix will work for you.

That means that if you do "import setuptools" on the Python shell and it 
succeeds, type setuptools.__file__ to see where it is installed and get rid 
of it, then start over. (After the fix has worked, it will say something with 
"distribute" in the path for setuptools.__file__. That's fine.)

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] [Hedge] On Python 2.6.3 / distribute / setuptools

2009-10-15 Thread Andreas Klöckner
On Donnerstag 15 Oktober 2009, Darcoux Christine wrote:
 Hi Andreas
 
  0.93.1rc1 seems to works for me, except that I had to download a
  distribute_setup.py file. Could you include that file in the source
  tarball ? I think this is the usual way to work with Distribute, since
  the documentation say :

Whoops. Good point. This should be fixed in 0.93.1rc2, here:
http://pypi.python.org/pypi/pycuda/0.93.1rc2

  I ran the examples/demo.py with success, but the transpose demo
  crashed. I assume this is not related to the use of Distribute, but
  here is the trace :
 
   File transpose.py, line 205, in module
 run_benchmark()
   File transpose.py, line 165, in run_benchmark
 target = gpuarray.empty((size, size), dtype=source.dtype)
   File /usr/lib/python2.6/site-packages/pycuda/gpuarray.py, line 81,
  in __init__
 self.gpudata = self.allocator(self.size * self.dtype.itemsize)
  pycuda._driver.MemoryError: cuMemAlloc failed: out of memory

Well, I'm not sure transpose.py adapts to the amount of memory you have. It 
works on my ~900M card. If you'd like to prepare a fix, I'd certainly merge 
it.

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] 2d scattered data gridding

2009-10-06 Thread Andreas Klöckner
Hi Roberto,

On Freitag 02 Oktober 2009, Roberto Vidmar wrote:
   I wonder if it is possible to use PyCUDA to grid on a two dimensional
 regular grid xyz scattered data. Our datasets are usually quite large
 (some millions of points) .
 
 Many thanks for any help in this topic.

As usual, if something is possible with CUDA in general, it's also possible 
with PyCUDA. In this specific case, I'm not sure what you mean by gridding--
making a grid-based histogram, binning, or perhaps something entirely 
different? Nonetheless, it seems likely that what you want can be (and likely 
has been) done with CUDA.

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


[PyCUDA] Nvidia GTC: PyCUDA talk, Meetup

2009-09-23 Thread Andreas Klöckner
Hi all,

If you are attending Nvidia's GPU Technology Conference next week, there are 
two things I'd like to point out:

- I'll be giving a talk about PyCUDA on Friday, October 2 at 2pm, where I'll 
both introduce PyCUDA and talk about some exciting new developments. The talk 
will be 50 minutes in length, and I'd be happy to see you there.

- Also, I'd like to propose a PyCUDA meetup on Thursday, October 1 at noon. 
(ie. lunchtime) I'll be hanging out by the Terrace seminar room around that 
time. I'm looking forward to meeting some of you in person.

See you next week,
Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Nvidia GTC: PyCUDA talk, Meetup

2009-09-23 Thread Andreas Klöckner
On Mittwoch 23 September 2009, Andreas Klöckner wrote:
 Hi all,
 
 If you are attending Nvidia's GPU Technology Conference next week, there
  are two things I'd like to point out:
 
 - I'll be giving a talk about PyCUDA on Friday, October 2 at 2pm, where
  I'll both introduce PyCUDA and talk about some exciting new developments.
  The talk will be 50 minutes in length, and I'd be happy to see you there.

Whoops--that's wrong. I just realized the talk is at 1pm on Friday. Sorry for 
the noise.

 - Also, I'd like to propose a PyCUDA meetup on Thursday, October 1 at noon.
 (ie. lunchtime) I'll be hanging out by the Terrace seminar room around
  that time. I'm looking forward to meeting some of you in person.

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] problem with pycuda._driver.pyd

2009-09-17 Thread Andreas Klöckner
On Donnerstag 17 September 2009, mailboxalpha wrote:
  I looked for DLL files used in _driver.pyd and one of them is named
  nvcuda.dll.  There is no such file on my machine.  Perhaps that is the DLL
  file that could not be found. The required boost dlls have been copied to
  windows\system32 directory and the boost lib  directory has been added to
  the system path.

Can you please check which DLL the CUDA examples require? There will be one 
that has the runtime interface, which will likely be called something with 
cudart. You don't care about that one. Instead, that DLL in turn requires 
the driver interface, and that's the DLL name we need.

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Installing PyCuda 0.93 on CentOs 5.3

2009-08-17 Thread Andreas Klöckner
On Montag 17 August 2009, Christian Quaia wrote:
 Hi.
  I've been trying to install pycuda on my centos 5.3 box, but I haven't had
 much success. I managed to install boost (1.39) as per instructions, but
 when I build pycuda I get the following error:

 building '_driver' extension
 gcc -pthread -fno-strict-aliasing -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
 -mtune=generic -D_GNU_SOURCE -fPIC -O3 -DNDEBUG -fPIC -Isrc/cpp
 -I/usr/include/boost-1_39 -I/usr/local/cuda/include
 -I/usr/lib64/python2.4/site-packages/numpy/core/include
 -I/usr/include/python2.4 -c src/wrapper/wrap_cudadrv.cpp -o
 build/temp.linux-x86_64-2.4/src/wrapper/wrap_cudadrv.o

 src/wrapper/wrap_cudadrv.cpp: In function
 ‘int <unnamed>::function_get_lmem(const cuda::function&)’:
 src/wrapper/wrap_cudadrv.cpp:165: error: ‘PyErr_WarnEx’ was not
 declared in this scope
 ...
 error: command 'gcc' failed with exit status 1


 I looked around for a solution, but I couldn't find any. I'm using
 Python 2.4.3,
 and my default gcc version is 4.1.2 (although I had to compile boost
 using gcc43)

Try the git version, PyErr_WarnEx is not referenced any more.

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] Installing PyCuda 0.93 on CentOs 5.3

2009-08-17 Thread Andreas Klöckner
That says that you're linking against the boost you built with gcc 4.3--
rebuild boost with 4.1; that should get you a step further.

Andreas

On Dienstag 18 August 2009, Christian Quaia wrote:
 Thanks Andreas.

 Sorry, I should have tried that before... Now the build and install
 work. However, when I run the tests I get another error:

 Traceback (most recent call last):
   File ./test/test_driver.py, line 475, in ?
 import pycuda.autoinit
   File
 /usr/lib64/python2.4/site-packages/pycuda-0.94beta-py2.4-linux-x86_64.egg/
pycuda/autoinit.py, line 1, in ?
 import pycuda.driver as cuda
   File
 /usr/lib64/python2.4/site-packages/pycuda-0.94beta-py2.4-linux-x86_64.egg/
pycuda/driver.py, line 1, in ?
 from _driver import *
 ImportError: /usr/lib64/libboost_python-gcc43-mt-1_39.so.1.39.0:
 undefined symbol:
 _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3
_l


 Thanks again for your help,
 Christian


 On Mon, Aug 17, 2009 at 5:41 PM, Andreas Klöckner li...@informa.tiker.net
 wrote:
  On Montag 17 August 2009, Christian Quaia wrote:
  Thanks Andreas.
 I got the git version, and things went a bit better, but I still have
  a problem.
 
  When I build pycuda now I get this error:
 
  /usr/include/boost-1_39/boost/type_traits/detail/cv_traits_impl.hpp:37:
  internal compiler error: in make_rtl_for_nonlocal_decl, at
  cp/decl.c:5067
 
 
  This is due to a bug in gcc 4.1.2, which was fixed in later versions.
  For this very reason I
  had to compile boost using gcc43, which is also installed on my
  machine in parallel to gcc.
  Is there a simple way, like for boost, to force pycuda to be built
  using gcc43 as a compiler?
 
  Not easily, as Python should be built with a matching compiler.
 
  But see this here for gcc 4.1 help:
  http://is.gd/2lCd7
 
  Andreas
 
  ___
  PyCUDA mailing list
  PyCUDA@tiker.net
  http://tiker.net/mailman/listinfo/pycuda_tiker.net

 ___
 PyCUDA mailing list
 PyCUDA@tiker.net
 http://tiker.net/mailman/listinfo/pycuda_tiker.net



signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] a way to probe what globals, especially constant arrays and texture refs are defined in a kernel?

2009-08-12 Thread Andreas Klöckner
On Mittwoch 12 August 2009, Andrew Wagner wrote:
 Hi-

 Is there a way to get a list of the global variables, especially
 constant arrays and texture refs that are defined in a kernel?

 I'm generating a pycuda.driver.Module from a template, and the storage
 of various kernel inputs depends on the template parameters.

 It would be convenient for code using a kernel generated this way to
 have some way of figuring out what global variables are defined in the
 kernel, and whether they are globals, constants, or texrefs.

 Maybe in a future version of pycuda it would be nice to replace (or
 provide an alternative to) the accessor functions:

 pycuda.driver.Module.get_global
 pycuda.driver.Module.get_function
 pycuda.driver.Module.get_texref

 that consists of having member variables:

 pycuda.driver.Module.globals
 pycuda.driver.Module.functions
 pycuda.driver.Module.constants
 pycuda.driver.Module.texrefs

 that are already initialized to dictionaries with the name of the
 variable as the key, and the handle  (or maybe a (handle, size) tuple)
 as the value.

 or maybe have a single member variable pycuda.driver.Module.globals
 that is a dictionary with variable names as keys, and a (type, handle,
 size) tuple or something similar.

 If I at least have the name of the variable I think I can deduce if
 the variable is defined as a __constant__ array  by wrapping
 pycuda.driver.Module.get_global in a try: statement, but that's rather
 un-pythonic

 Or perhaps I'm misunderstanding something and the Module.get_*
 functions are forced on us by the CUDA  API?

Sorry, no way to do that by the CUDA API. One could potentially parse the 
CUBIN file, but that's rather fragile and not something that PyCUDA engages 
in, so far.

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] [PyCuda] Installation wiki updated with Windows Vista 64 bit install with Visual Studio 2008

2009-08-12 Thread Andreas Klöckner
On Mittwoch 12 August 2009, wtftc wrote:
 I've updated the wiki with my configuration of Windows Vista 64 bit with
 Visual Studio 2008. To use it, your entire python stack must also be 64
 bit. The build was x64, also known as amd64.

Thanks!

 I also have a question: is it possible to statically compile all embedded
 kernels in my code with pycuda? Deploying a program with pycuda widely with
 because it requires the cuda and c++ build tools, which are heavy. It would
 be nice to have an option to generate a library at build time that could
 then be packaged and installed without having to do the heavy c lifting.

That's very possible--this would amount to preseeding the PyCUDA compiler 
cache. I'd certainly merge a patch that implements this. The PyCUDA compiler 
cache logic is ~200 lines, so this should be easy to add.

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] non-egg install broken with 0.93

2009-08-12 Thread Andreas Klöckner
On Mittwoch 12 August 2009, Ryan May wrote:
 Hi,

 I was trying to install pycuda today from source (in advance of the Scipy
 tutorial!), and have noticed a problem if I don't use eggs to install.  If
 I use:

 python setup.py install --single-version-externally-managed --root=/

 I get the pycuda header file installed under usr/include/cuda.  This breaks
 the logic in compiler._find_pycuda_include_path().

 I personally avoid eggs, so this creates a problem.  Also, many linux
 package managers (including Gentoo, my own distro) avoid eggs, and I know
 for a fact that Gentoo uses this same method to install packages.  I've
 hacked my own compiler.py to work, but I'm not sure what a good solution
 really would be.  Gentoo has 0.92 packaged, but I don't think the header
 was used in that version and thus didn't present any problems.

I've committed a (somewhat hacky) fix: Automatically check /usr and /usr/local 
on Linux. Let me know if that works for you. 

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] NVIDIA CUDA Visual Profiler?

2009-07-31 Thread Andreas Klöckner
On Freitag 31 Juli 2009, Ahmed Fasih wrote:
 Hi, I'm very surprised that google isn't turning up something about
 this topic because I thought it's been previously discussed, so my
 apologies if it has.

 I'm trying the NVIDIA CUDA Visual Profiler (v 2.2.05) in Windows XP
 with a fairly recent PyCUDA git, on CUDA 2.2
 (pycuda.driver.get_driver_version() returns 2020).

 I provide the Visual Profiler with a Windows batch file that calls
 python my_pycuda_script.py -some -flags, but the Visual Profiler
 (after running the script 4 times) only reports two methods,
 memcopy. All other counters are zero (so it doesn't display them in
 the table). Manipulating the counters enabled doesn't change this.

 Any assistance would be much appreciated. My application runs only
 ~10% faster on a Tesla C1060 than a G80 Quadro (despite having twice
 as many MPs) so I'm hoping the profiler will help me understand why.

On Linux, I've had good success with just using the profiler from the command 
line:

http://webapp.dam.brown.edu/wiki/SciComp/CudaProfiling

Every one of my attempts to achieve the same thing using the visual profiler 
has ended in tears so far. I'm not sure if the command line way of doing 
things works in Windows, but I'd imagine so.
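On the off chance it helps, the command-line profiler is driven by environment variables that must be set before the context is created; from a PyCUDA script that might look like this (variable names as used by the CUDA 2.x profiler):

import os
os.environ["CUDA_PROFILE"] = "1"
os.environ["CUDA_PROFILE_LOG"] = "cuda_profile.log"
os.environ["CUDA_PROFILE_CSV"] = "1"
os.environ["CUDA_PROFILE_CONFIG"] = "profile_config.txt"  # one counter name per line

import pycuda.autoinit   # profiling starts when the context is created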

Once you figure out what's up, please add an FAQ entry!

Andreas


signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] casting arguments to memset to unsigned int ?

2009-07-03 Thread Andreas Klöckner
On Mittwoch 01 Juli 2009, Andreas Klöckner wrote:
 Would you mind adding FAQ items for these two?
 http://wiki.tiker.net/PyCuda/FrequentlyAskedQuestions

Thanks for writing the FAQ item! FYI--I've slightly reworked and expanded it.

http://is.gd/1mMDg

Andreas



signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net


Re: [PyCUDA] [PyCuda] pyCUDA Windows Installation

2009-07-02 Thread Andreas Klöckner
On Donnerstag 02 Juli 2009, faberx wrote:
 Dear all! You can install pycuda on windows xp! Please look at:
 http://wiki.tiker.net/PyCuda/Installation/Windows
 http://wiki.tiker.net/PyCuda/Installation/Windows

Thanks for writing this up!

Andreas



signature.asc
Description: This is a digitally signed message part.
___
PyCUDA mailing list
PyCUDA@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net

