Hi Imran,

Thank you for the info; I'll fix the code - Python 2.5 is still widely
used. As for the ATI drivers, I thought the latest release version of
Stream (2.01) supports OpenCL. I wonder whether the terrible performance
(these tests run faster on my GF9600) and this deadlock issue are
really caused by the drivers you use... I was actually going to order a
server with an ATI GPU for my simulations (because of their advertised
GFLOPS numbers for both single and double precision), but I am
starting to reconsider that decision now.

Best regards,
Bogdan

On Thu, Mar 25, 2010 at 12:13 PM, Imran Haque <iha...@stanford.edu> wrote:
> Hi Bogdan,
>
> I also had to do the following to get the test to run:
>
>   - kernel.py:45: change "except AssertionError as e:" to "except
> AssertionError:"
>   - plan.py:4: add getRadixArray to import list from .kernel_helpers
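The first fix is needed because the `except E as e` syntax only appeared in Python 2.6 and is a SyntaxError on 2.5. A minimal sketch of a form that is portable across 2.4-2.6 (the `check_radix` helper is hypothetical, not from pyfft):

```python
import sys

def check_radix(radix):
    # "except AssertionError as e" is Python 2.6+ syntax; "except
    # AssertionError, e" is Python-2-only. Catching without binding and
    # then using sys.exc_info() works on every version.
    try:
        assert radix > 0, "radix must be positive"
        return radix
    except AssertionError:
        e = sys.exc_info()[1]  # portably retrieve the active exception
        raise ValueError(str(e))
```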
>
> I was able to get the following pair of results, but then the test hung. The
> machine has prerelease ATI drivers installed, so that might be the issue.
> However, I've also encountered cases in my own work with code that is
> formally incorrect (e.g., barriers that are not uniformly executed) on which
> the Nvidia runtime does not deadlock but the ATI runtime does, so it might
> be worth checking to see if you have any situations like that.
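The barrier pitfall described above can be sketched as follows (hypothetical kernels, not taken from pyfft). The OpenCL spec requires that a barrier be reached either by every work-item in a work-group or by none; a runtime is free to deadlock when that is violated, even if another vendor's runtime happens to tolerate it:

```python
# Non-uniform barrier: work-items with i >= n skip the barrier, which is
# formally incorrect. One runtime may tolerate this while another deadlocks.
BROKEN_KERNEL = """
__kernel void scale_bad(__global float *data, int n)
{
    int i = get_global_id(0);
    if (i < n) {
        data[i] *= 2.0f;
        barrier(CLK_GLOBAL_MEM_FENCE);  /* divergent: not all work-items get here */
    }
}
"""

# Uniform barrier: the branch ends before the barrier, so the whole
# work-group reaches it.
FIXED_KERNEL = """
__kernel void scale_ok(__global float *data, int n)
{
    int i = get_global_id(0);
    if (i < n)
        data[i] *= 2.0f;
    barrier(CLK_GLOBAL_MEM_FENCE);  /* executed by every work-item */
}
"""
```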
>
> $ python test_performance.py
> Running performance tests...
> * cl, (16,), batch 131072: 1.85770988464 ms, 22.5778203296 GFLOPS
> * cl, (1024,), batch 2048: 13.0976915359 ms, 8.00580771903 GFLOPS
>
> Cheers,
>
> Imran
>
> Bogdan Opanchuk wrote:
>>
>> Hello Imran,
>>
>> kernel.py requires patching too:
>> - from .kernel_helpers import *
>> + from .kernel_helpers import log2, getRadixArray, getGlobalRadixInfo,
>> getPadding, getSharedMemorySize
>>
>> I hope this will be enough. Sorry for the inconvenience; I'm going to
>> commit the fix to the repository. I need to add a version check too,
>> because there will definitely be other bugs on Python 2.4, which is
>> still used by some Linux distros.
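Such a guard might look like the following (a minimal sketch assuming Python 2.5 as the floor; the name `require_python` is hypothetical):

```python
import sys

MIN_VERSION = (2, 5)  # assumed floor; 2.4 is expected to trip other bugs

def require_python(version_info=None):
    # Compare only (major, minor) and fail early with a clear message,
    # instead of failing later with an obscure SyntaxError on import.
    current = tuple(version_info or sys.version_info)[:2]
    if current < MIN_VERSION:
        raise ImportError("pyfft requires Python %d.%d or newer"
                          % MIN_VERSION)
    return current
```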
>>
>> Best regards,
>> Bogdan
>>
>> On Thu, Mar 25, 2010 at 11:36 AM, Bogdan Opanchuk <manti...@gmail.com>
>> wrote:
>>
>>>
>>> Hello Imran,
>>>
>>> I tested it only on 2.6, so that may be the case. Thanks for the bug
>>> report, though - this sort of compatibility is easy to add. Could you
>>> please put "from .kernel import GlobalFFTKernel, LocalFFTKernel,
>>> X_DIRECTION, Y_DIRECTION, Z_DIRECTION" instead of that line?
>>>
>>> Best regards,
>>> Bogdan
>>>
>>> On Thu, Mar 25, 2010 at 11:19 AM, Imran Haque <iha...@stanford.edu>
>>> wrote:
>>>
>>>>
>>>> Didn't work - does it require something newer than Python 2.5?
>>>>
>>>> $ python test_performance.py
>>>> Running performance tests...
>>>> Traceback (most recent call last):
>>>>  File "test_performance.py", line 57, in <module>
>>>>  run(isCudaAvailable(), isCLAvailable(), DEFAULT_BUFFER_SIZE)
>>>>  File "test_performance.py", line 52, in run
>>>>  testPerformance(ctx, shape, buffer_size)
>>>>  File "test_performance.py", line 22, in testPerformance
>>>>  plan = ctx.getPlan(shape, context=ctx.context, wait_for_finish=True)
>>>>  File "/home/ihaque/pyfft-0.3/pyfft_test/helpers.py", line 116, in
>>>> getPlan
>>>>  import pyfft.cl
>>>>  File
>>>> "/usr/lib/python2.5/site-packages/pyfft-0.3-py2.5.egg/pyfft/cl.py",
>>>> line 9, in <module>
>>>>  from .plan import FFTPlan
>>>>  File
>>>> "/usr/lib/python2.5/site-packages/pyfft-0.3-py2.5.egg/pyfft/plan.py",
>>>> line 3
>>>>  from .kernel import *
>>>> SyntaxError: 'import *' not allowed with 'from .'
>>>>
>>>>
>>>> Bogdan Opanchuk wrote:
>>>>
>>>>>
>>>>> Hello Imran,
>>>>>
>>>>> (sorry, forgot to add maillist to CC)
>>>>>
>>>>> Thank you for the prompt reply - results from a 5870 are interesting
>>>>> too. If you have pyopencl installed, just run test_performance.py from
>>>>> the pyfft_test folder located in the pyfft package. It will print the
>>>>> results to stdout.
>>>>>
>>>>> Best regards,
>>>>> Bogdan.
>>>>>
>>>>> On Thu, Mar 25, 2010 at 11:11 AM, Imran Haque <iha...@stanford.edu>
>>>>> wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> Hi Bogdan,
>>>>>>
>>>>>> I have access to a Radeon 5870, but it's installed in a slow host
>>>>>> machine
>>>>>> (2.8GHz dual core Pentium 4). If this is still useful, I could run a
>>>>>> test
>>>>>> for you if you can send along a quick test case.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Imran
>>>>>>
>>>>>> Bogdan Opanchuk wrote:
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> By the way, if it is not too much to ask: if anybody has access to an
>>>>>>> ATI 59** series card and/or a GTX 295, could you please run the
>>>>>>> performance tests from the module (pyfft_test/test_performance.py) and
>>>>>>> post the results here? I suspect that the poor performance in the
>>>>>>> OpenCL case may be (partially) caused by the nVidia drivers.
>>>>>>>
>>>>>>> Thank you in advance.
>>>>>>>
>>>>>>> On Sat, Mar 20, 2010 at 10:36 PM, Bogdan Opanchuk
>>>>>>> <manti...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I fixed some bugs in my pycudafft module and added PyOpenCL support,
>>>>>>>> so it is called just pyfft now (and it sort of resolves the question
>>>>>>>> about including it to PyCuda distribution).
>>>>>>>>
>>>>>>>> At the moment, the most annoying things (to me, at least) are:
>>>>>>>> 1. OpenCL performance tests show speeds up to 6 times slower than
>>>>>>>> CUDA. Unfortunately, I still can't find the reason.
>>>>>>>> (Interestingly, the PyOpenCL version is still noticeably faster
>>>>>>>> than Apple's original C program with the same FFT algorithm.)
>>>>>>>> 2. I tried to support different ways of using plans, including
>>>>>>>> pre-created contexts, streams/queues, and asynchronous execution.
>>>>>>>> This resulted in a rather messy interface. Any suggestions for
>>>>>>>> making it clearer are welcome.
>>>>>>>> 3. Currently, the only criterion for a kernel's block size is the
>>>>>>>> maximum allowed by the number of registers used. The resulting
>>>>>>>> occupancy of the CUDA kernels is 0.25 - 0.33 most of the time, but
>>>>>>>> when I try to recompile kernels with different block sizes to
>>>>>>>> maximize occupancy, they become even slower.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Bogdan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>

_______________________________________________
PyCUDA mailing list
pyc...@host304.hostmonster.com
http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
