Hi Imran,

Thank you for the info, I'll fix the code - Python 2.5 is still widely used. As for the ATI drivers, I thought the latest release version of Stream (2.01) supports OpenCL. I wonder if the terrible performance (these tests run faster on my GF9600) and this deadlock issue are really caused by the drivers you use... I was actually going to order a server with an ATI GPU for my simulations (because of their advertised Gflops numbers for both single and double precision), but I am starting to reconsider that decision now.
Best regards,
Bogdan

On Thu, Mar 25, 2010 at 12:13 PM, Imran Haque <iha...@stanford.edu> wrote:
> Hi Bogdan,
>
> I also had to do the following to get the test to run:
>
> - kernel.py:45: change "except AssertionError as e:" to "except AssertionError:"
> - plan.py:4: add getRadixArray to the import list from .kernel_helpers
>
> I was able to get the following pair of results, but then the test hung.
> The machine has prerelease ATI drivers installed, so that might be the
> issue. However, I've also encountered cases in my own work with code that
> is formally incorrect (e.g., barriers that are not uniformly executed) on
> which the Nvidia runtime does not deadlock but the ATI runtime does, so it
> might be worth checking whether you have any situations like that.
>
> $ python test_performance.py
> Running performance tests...
> * cl, (16,), batch 131072: 1.85770988464 ms, 22.5778203296 GFLOPS
> * cl, (1024,), batch 2048: 13.0976915359 ms, 8.00580771903 GFLOPS
>
> Cheers,
>
> Imran
>
> Bogdan Opanchuk wrote:
>> Hello Imran,
>>
>> kernel.py requires patching too:
>> - from .kernel_helpers import *
>> + from .kernel_helpers import log2, getRadixArray, getGlobalRadixInfo,
>> getPadding, getSharedMemorySize
>>
>> I hope this will be enough. Sorry for the inconvenience; I'm going to
>> commit it to the repository. I need to add a version check too, because
>> there will definitely be other bugs on Python 2.4, which is still used
>> by some Linux distros )
>>
>> Best regards,
>> Bogdan
>>
>> On Thu, Mar 25, 2010 at 11:36 AM, Bogdan Opanchuk <manti...@gmail.com>
>> wrote:
>>> Hello Imran,
>>>
>>> I tested it only on 2.6, so that may be the case. Thanks for the bug
>>> report though; this sort of compatibility is easy to add. Could you
>>> please just put "from .kernel import GlobalFFTKernel, LocalFFTKernel,
>>> X_DIRECTION, Y_DIRECTION, Z_DIRECTION" instead of that line?
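[Editor's note: the `except AssertionError as e:` syntax mentioned above is Python 2.6+; on 2.5 the portable way to capture the exception object is `sys.exc_info()`. A minimal sketch of the pattern - the function names here are illustrative, not pyfft's actual code:]

```python
import sys

def check_positive(x):
    # Raises AssertionError for non-positive input.
    assert x > 0, "x must be positive"

def describe(x):
    try:
        check_positive(x)
        return "ok"
    except AssertionError:
        # "except AssertionError as e:" is 2.6+ syntax; on 2.5,
        # fetch the in-flight exception via sys.exc_info().
        e = sys.exc_info()[1]
        return "failed: %s" % e
```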
>>> Best regards,
>>> Bogdan
>>>
>>> On Thu, Mar 25, 2010 at 11:19 AM, Imran Haque <iha...@stanford.edu>
>>> wrote:
>>>> Didn't work - does it require something newer than Python 2.5?
>>>>
>>>> $ python test_performance.py
>>>> Running performance tests...
>>>> Traceback (most recent call last):
>>>>   File "test_performance.py", line 57, in <module>
>>>>     run(isCudaAvailable(), isCLAvailable(), DEFAULT_BUFFER_SIZE)
>>>>   File "test_performance.py", line 52, in run
>>>>     testPerformance(ctx, shape, buffer_size)
>>>>   File "test_performance.py", line 22, in testPerformance
>>>>     plan = ctx.getPlan(shape, context=ctx.context, wait_for_finish=True)
>>>>   File "/home/ihaque/pyfft-0.3/pyfft_test/helpers.py", line 116, in getPlan
>>>>     import pyfft.cl
>>>>   File "/usr/lib/python2.5/site-packages/pyfft-0.3-py2.5.egg/pyfft/cl.py", line 9, in <module>
>>>>     from .plan import FFTPlan
>>>>   File "/usr/lib/python2.5/site-packages/pyfft-0.3-py2.5.egg/pyfft/plan.py", line 3
>>>>     from .kernel import *
>>>> SyntaxError: 'import *' not allowed with 'from .'
>>>>
>>>> Bogdan Opanchuk wrote:
>>>>> Hello Imran,
>>>>>
>>>>> (sorry, forgot to add the mailing list to CC)
>>>>>
>>>>> Thank you for the prompt reply; results from a 5870 are interesting
>>>>> too. If you have pyopencl installed, just run test_performance.py from
>>>>> the pyfft_test folder, located in the pyfft package. It will print the
>>>>> results to stdout.
>>>>>
>>>>> Best regards,
>>>>> Bogdan.
>>>>>
>>>>> On Thu, Mar 25, 2010 at 11:11 AM, Imran Haque <iha...@stanford.edu>
>>>>> wrote:
>>>>>> Hi Bogdan,
>>>>>>
>>>>>> I have access to a Radeon 5870, but it's installed in a slow host
>>>>>> machine (2.8GHz dual-core Pentium 4). If this is still useful, I
>>>>>> could run a test for you if you can send along a quick test case.
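[Editor's note: since `from . import *` is a SyntaxError before Python 2.6 and the thread also mentions likely breakage on 2.4, one way to implement the version check Bogdan refers to is an explicit interpreter gate that fails with a clear message instead of a late SyntaxError. A sketch under that assumption - this is not pyfft's actual code:]

```python
import sys

MIN_VERSION = (2, 5)

def check_python_version(version=None):
    # Refuse to run on interpreters older than MIN_VERSION,
    # producing a clearer error than a SyntaxError deep inside
    # the package. "version" is overridable for testing.
    if version is None:
        version = tuple(sys.version_info[:2])
    if version < MIN_VERSION:
        raise ImportError(
            "this package requires Python %d.%d or newer" % MIN_VERSION)
    return True
```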
>>>>>> Cheers,
>>>>>>
>>>>>> Imran
>>>>>>
>>>>>> Bogdan Opanchuk wrote:
>>>>>>> By the way, if it is not too much to ask: if anybody has access to
>>>>>>> an ATI 59** series card and/or a GTX 295, could you please run the
>>>>>>> performance tests from the module (pyfft_test/test_performance.py)
>>>>>>> and post the results here? I suspect that the poor performance in
>>>>>>> the OpenCL case may be (partially) caused by nVidia drivers.
>>>>>>>
>>>>>>> Thank you in advance.
>>>>>>>
>>>>>>> On Sat, Mar 20, 2010 at 10:36 PM, Bogdan Opanchuk
>>>>>>> <manti...@gmail.com> wrote:
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I fixed some bugs in my pycudafft module and added PyOpenCL
>>>>>>>> support, so it is called just pyfft now (which sort of resolves the
>>>>>>>> question about including it in the PyCuda distribution).
>>>>>>>>
>>>>>>>> At the moment, the most annoying (to me, at least) things are:
>>>>>>>> 1. OpenCL performance tests show up to 6 times slower speed
>>>>>>>> compared to Cuda. Unfortunately, I still can't find the reason.
>>>>>>>> (The interesting thing is that PyOpenCL is still noticeably faster
>>>>>>>> than the original Apple C program with the same FFT algorithm.)
>>>>>>>> 2. I tried to support different ways of using plans, including
>>>>>>>> pre-created contexts, streams/queues and asynchronous execution.
>>>>>>>> This resulted in a quite messy interface. Any suggestions for
>>>>>>>> making it clearer are welcome.
>>>>>>>> 3. Currently, the only criterion for the kernels' block sizes is
>>>>>>>> the maximum allowed by the number of registers used. The resulting
>>>>>>>> occupancy of the Cuda kernels is 0.25 - 0.33 most of the time. But
>>>>>>>> when I try to recompile kernels with different block sizes in order
>>>>>>>> to find the maximum occupancy, the kernels get even slower.
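[Editor's note: for anyone comparing numbers across cards, the GFLOPS figures printed by test_performance.py earlier in this thread appear consistent with the conventional complex-FFT cost model of 5 N log2(N) floating-point operations per transform. A sketch of that calculation - the function name is illustrative, and the 5 N log2 N model is an assumption, not pyfft's documented formula:]

```python
import math

def fft_gflops(size, batch, time_ms):
    # Conventional complex-FFT cost model: 5 * N * log2(N)
    # floating-point operations per transform, times the batch,
    # divided by the elapsed time in seconds, in units of 1e9.
    flops = 5.0 * size * math.log(size, 2) * batch
    return flops / (time_ms * 1e-3) / 1e9
```

Plugging in the two results quoted above (size 16, batch 131072, 1.8577 ms and size 1024, batch 2048, 13.0977 ms) reproduces the reported 22.58 and 8.01 GFLOPS.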
>>>>>>>> Best regards,
>>>>>>>> Bogdan
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> PyCUDA mailing list
>>>>>>> pyc...@host304.hostmonster.com
>>>>>>> http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net