Thanks for the support! Something I want to stress one more time is that at
the moment I am interested in a small prototype to understand design
tradeoffs and not something to immediately propose for inclusion in the
main branch "tomorrow".

Examples of the questions I would like to answer are:

How big does the region need to be to achieve a speedup?
How does the speedup vary with CLA parameters (number of columns,
dendrites, etc.)?
Can irregularity in control flow and memory access patterns completely
disrupt performance on GPUs?
Can we propose hardware changes to current GPUs to better support CLA?
If both TP and SP run completely on the GPU, do we see a speedup of 2x,
10x, 20x, or 50x?
If only SP runs on the GPU, do we see any speedup at all?

Once we have simple C/C++ code, we can move across OpenMP, OpenCL, CUDA,
and OpenACC to see how things change across different GPUs (accelerators)
and different multi-core CPUs.
I am not sure whether these kinds of studies have already been done. Numenta?

BTW, are there any volunteers who would like to help?



On Fri, Aug 23, 2013 at 2:04 PM, Francisco Webber <[email protected]> wrote:

> Hi all,
> A possible way to get around portability and other issues: you could pack
> just the computationally expensive code into a "slim" library that
> implements serial code, or GPU code if the hardware is available.
> In past projects I have noticed that multi-core teams and GPU teams both
> produce good solutions if they work competitively and cooperatively on the
> same algorithms.
>
> Francisco
>
> On 23.08.2013, at 22:57, Tim McNamara wrote:
>
> On 24 August 2013 08:25, Oreste Villa <[email protected]> wrote:
>
>> Tim,
>>
>> I partially agree with you, in the sense that if you are not careful,
>> portability can be an issue. Having said that, I have the following comments.
>>
>> 1) We are talking about a small prototype of the CLA algorithm on a GPU
>> (used to understand the potential benefits of the approach), not moving
>> NuPIC to GPU X tomorrow.
>> 2) I plan to make several tests (starting with CUDA) but ultimately using
>> OpenCL or OpenACC, which support multiple architectures (multi-core CPUs
>> and GPUs from different vendors).
>> 3) I would really like to work on MPI for parallelism at a larger scale,
>> but I have very limited bandwidth and unfortunately can't work on both
>> right now (I am glad to help and advise if there are people who want to
>> try immediately).
>>
>> Oreste
>>
>
> I don't want to get in the way of good work! CUDA & OpenCL are very
> effective technologies, and I am actually very supportive. If you are happy
> to invest your time, then that seems like a great way forward. OpenACC
> directives could be very nice to include, as they really are
> device-independent.
>
> My concerns are:
>
>  - decreased portability, as mentioned earlier
>  - an increased barrier to entry to get up and running, e.g. building the
> system; users will have more documentation to read and compiler flags to
> learn
>  - reduced code readability: more tokens make code harder to grok
>
> It sounds like you have a lot of enthusiasm though. Your work is likely to
> benefit the whole community. Good luck :)
>  _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>
>
>