Ah, got you. It is also worth noting that in certain circumstances you can avoid the latency of a PCI bus transfer if your GPU sits on the same memory bus as your CPU, as described here:
http://forums.arm.com/index.php?/topic/16594-mali-t604-opencl-shared-memory-size/
That transfer is usually the largest source of latency in a hybrid application, so it will be interesting to compare some portable OpenCL codes in this context (if we can find any!). Sorry for going a bit OT... :)

At 07:48 PM 3/13/2013, Shankar Viswanathan wrote:
On Wed, Mar 13, 2013 at 2:58 PM, Kurt Keville <[email protected]> wrote:
> one thing I wasn't sure of in Anand's article... he said the ARM
> Cortex-A15 was capable of addressing 16 GB but the A9 was only capable of
> addressing 4GB... they are both ARMv7 so I don't think that is correct...

The Cortex-A15 is based on the ARMv7-A architecture and introduced the
Large Physical Address Extension (LPAE):
http://www.arm.com/products/processors/cortex-a/cortex-a15.php
http://www.arm.com/files/pdf/at-exploring_the_design_of_the_cortex-a15.pdf

Each process is still limited to a 32-bit virtual address space, but
the system as a whole can address far more physical memory (up to
40-bit physical addresses with LPAE), which reduces page swapping to
disk, particularly in virtualized environments.
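
To put rough numbers on the virtual versus physical distinction, here is a
minimal Python sketch of the arithmetic. The 32-bit virtual width comes from
the paragraph above; the 40-bit physical width is my assumption based on
ARM's LPAE documentation, and the helper names are made up for illustration.

```python
# Illustrative arithmetic only; the helper names are hypothetical.
VIRTUAL_BITS = 32         # per-process virtual address width on ARMv7-A
LPAE_PHYSICAL_BITS = 40   # assumed physical address width with LPAE


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


def format_binary_size(num_bytes: int) -> str:
    """Format a byte count using binary units (exact for powers of two)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    index = 0
    while num_bytes >= 1024 and index < len(units) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes} {units[index]}"


virtual_space = addressable_bytes(VIRTUAL_BITS)
physical_space = addressable_bytes(LPAE_PHYSICAL_BITS)

print(format_binary_size(virtual_space))    # 4 GiB
print(format_binary_size(physical_space))   # 1 TiB
print(physical_space // virtual_space)      # 256
```

So under these assumptions a single process still sees at most 4 GiB, while
the machine could in principle back 256 such address spaces with distinct
physical memory.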

Regarding Federico's original question, I agree that the choice
completely comes down to the workload (my personal bias
notwithstanding).

-Shankar

_______________________________________________
Hardwarehacking mailing list
[email protected]
http://lists.blu.org/mailman/listinfo/hardwarehacking
